1
Tajeddin N, Arabfard M, Alizadeh S, Salesi M, Khamse S, Delbari A, Ohadi M. Novel islands of GGC and GCC repeats coincide with human evolution. Gene 2024; 902:148194. [PMID: 38262548 DOI: 10.1016/j.gene.2024.148194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/29/2023] [Accepted: 01/18/2024] [Indexed: 01/25/2024]
Abstract
BACKGROUND Because of high mutation rate, overrepresentation in genic regions, and link with various neurological, neurodegenerative, and movement disorders, GGC and GCC short tandem repeats (STRs) are prone to natural selection. Among a number of lacking data, the 3-repeats of these STRs remain widely unexplored. RESULTS In a genome-wide search in human, here we mapped GGC and GCC STRs of ≥3-repeats, and found novel islands of up to 45 of those STRs, populating spans of 1 to 2 kb of genomic DNA. RGPD4 and NOC4L harbored the densest (GGC)3 (probability 3.09061E-71) and (GCC)3 (probability 1.72376E-61) islands, respectively, and were human-specific. We also found prime instances of directional incremented density of STRs at specific loci in human versus other species, including the FOXK2 and SKI GGC islands. The genes containing those islands significantly diverged in expression in human versus other species, and the proteins encoded by those genes interact closely in a physical interaction network, consequence of which may be human-specific characteristics such as higher order brain functions. CONCLUSION We report novel islands of GGC and GCC STRs of evolutionary relevance to human. The density, and in some instances, periodicity of these islands support them as a novel genomic entity, which need to be further explored in evolutionary, mechanistic, and functional platforms.
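The genome-wide mapping step described in this abstract can be illustrated with a minimal sketch. It assumes a plain regular-expression scan for uninterrupted tandem runs of a motif on one strand; the function name `find_str_runs` and the toy sequence are hypothetical and are not taken from the paper, whose actual pipeline is not described here.

```python
import re

def find_str_runs(sequence, motif, min_repeats=3):
    """Return (start, end, copies) for each maximal uninterrupted tandem run of `motif`.

    Illustrative helper only: forward strand, exact motif matches, 0-based
    half-open coordinates.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_repeats},}}")
    runs = []
    for match in pattern.finditer(sequence.upper()):
        copies = (match.end() - match.start()) // len(motif)
        runs.append((match.start(), match.end(), copies))
    return runs

# Toy sequence (made up): one (GGC)3 run, one (GCC)4 run, and a (GGC)2 run
# that falls below the 3-repeat threshold.
toy = "ATGGCGGCGGCTTTGCCGCCGCCGCCAAGGCGGC"
ggc_runs = find_str_runs(toy, "GGC")
gcc_runs = find_str_runs(toy, "GCC")
print(ggc_runs)  # [(2, 11, 3)]
print(gcc_runs)  # [(14, 26, 4)]
```

Run starts collected this way could then be grouped by genomic distance to look for dense islands; the density thresholds and probability calculations reported in the abstract are not reproduced by this sketch.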
Affiliation(s)
- N Tajeddin
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Arabfard
  - Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- S Alizadeh
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Salesi
  - Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- S Khamse
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- A Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
2
Behringer MG, Ho WC, Miller SF, Worthan SB, Cen Z, Stikeleather R, Lynch M. Trade-offs, trade-ups, and high mutational parallelism underlie microbial adaptation during extreme cycles of feast and famine. Curr Biol 2024; 34:1403-1413.e5. [PMID: 38460514 PMCID: PMC11066936 DOI: 10.1016/j.cub.2024.02.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 12/12/2023] [Accepted: 02/16/2024] [Indexed: 03/11/2024]
Abstract
Microbes are evolutionarily robust organisms capable of rapid adaptation to complex stress, which enables them to colonize harsh environments. In nature, microbes are regularly challenged by starvation, which is a particularly complex stress because resource limitation often co-occurs with changes in pH, osmolarity, and toxin accumulation created by metabolic waste. Often overlooked are the additional complications introduced by eventual resource replenishment, as successful microbes must withstand rapid environmental shifts before swiftly capitalizing on replenished resources to avoid invasion by competing species. To understand how microbes navigate trade-offs between growth and survival, ultimately adapting to thrive in environments with extreme fluctuations, we experimentally evolved 16 Escherichia coli populations for 900 days in repeated feast/famine conditions with cycles of 100-day starvation before resource replenishment. Using longitudinal population-genomic analysis, we found that evolution in response to extreme feast/famine is characterized by narrow adaptive trajectories with high mutational parallelism and notable mutational order. Genetic reconstructions reveal that early mutations result in trade-offs for biofilm and motility but trade-ups for growth and survival, as these mutations conferred positively correlated advantages during both short-term and long-term culture. Our results demonstrate how microbes can navigate the adaptive landscapes of regularly fluctuating conditions and ultimately follow mutational trajectories that confer benefits across diverse environments.
Affiliation(s)
- Megan G Behringer
  - Department of Biological Sciences, Vanderbilt University, 21st Avenue S, Nashville, TN 37232, USA; Department of Pathology Microbiology and Immunology, Vanderbilt University Medical Center, 21st Avenue S, Nashville, TN 37232, USA
- Wei-Chin Ho
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, S McAllister Ave., Tempe, AZ 85281, USA; Department of Biology, University of Texas at Tyler, University Blvd., Tyler, TX 75799, USA
- Samuel F Miller
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, S McAllister Ave., Tempe, AZ 85281, USA
- Sarah B Worthan
  - Department of Biological Sciences, Vanderbilt University, 21st Avenue S, Nashville, TN 37232, USA
- Zeer Cen
  - Department of Biological Sciences, Vanderbilt University, 21st Avenue S, Nashville, TN 37232, USA
- Ryan Stikeleather
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, S McAllister Ave., Tempe, AZ 85281, USA
- Michael Lynch
  - Biodesign Center for Mechanisms of Evolution, Arizona State University, S McAllister Ave., Tempe, AZ 85281, USA
3
Khamse S, Alizadeh S, Khorshid HRK, Delbari A, Tajeddin N, Ohadi M. A Hypermutable Region in the DISP2 Gene Links to Natural Selection and Late-Onset Neurocognitive Disorders in Humans. Mol Neurobiol 2024:10.1007/s12035-024-04155-y. [PMID: 38565786 DOI: 10.1007/s12035-024-04155-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2023] [Accepted: 03/25/2024] [Indexed: 04/04/2024]
Abstract
(CCG) short tandem repeats (STRs) are predominantly enriched in genic regions, mutation hotspots for C to T truncating substitutions, and involved in various neurological and neurodevelopmental disorders. However, intact blocks of this class of STRs are widely overlooked with respect to their link with natural selection. The human neuron-specific gene, DISP2 (dispatched RND transporter family member 2), contains a (CCG) repeat in its 5' untranslated region. Here, we sequenced this STR in a sample of 448 Iranian individuals, consisting of late-onset neurocognitive disorder (NCD) (N = 203) and controls (N = 245). We found that the region spanning the (CCG) repeat was highly mutated, resulting in several flanking (CCG) residues. However, an 8-repeat of the (CCG) repeat was predominantly abundant (frequency = 0.92) across the two groups. While the overall distribution of genotypes was not different between the two groups (p > 0.05), we detected four genotypes in the NCD group only (2% of the NCD genotypes, Mid-p = 0.02), consisting of extreme short alleles, 5- and 6-repeats, that were not detected in the control group. The patients harboring those genotypes received the diagnoses of probable Alzheimer's disease and vascular dementia. We also found six genotypes in the control group only (2.5% of the control genotypes, Mid-p = 0.01) that consisted of the 8-repeat and extreme long alleles, 9- and 10-repeats, of which the 10-repeat was not detected in the NCD group. The (CCG) repeat specifically expanded in primates. In conclusion, we report an indication of natural selection at a novel hypermutable region in the human genome and divergent alleles and genotypes in late-onset NCDs and controls. These findings reinforce the hypothesis that a collection of rare alleles and genotypes in a number of genes may unambiguously contribute to the cognition impairment component of late-onset NCDs.
Affiliation(s)
- S Khamse
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- S Alizadeh
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
  - Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
- A Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- N Tajeddin
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
  - Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- M Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
4
Arabfard M, Tajeddin N, Alizadeh S, Salesi M, Bayat H, Khorram Khorshid HR, Khamse S, Delbari A, Ohadi M. Dyads of GGC and GCC form hotspot colonies that coincide with the evolution of human and other great apes. BMC Genom Data 2024; 25:21. [PMID: 38383300 PMCID: PMC10880355 DOI: 10.1186/s12863-024-01207-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 02/11/2024] [Indexed: 02/23/2024] Open
Abstract
BACKGROUND GGC and GCC short tandem repeats (STRs) are of various evolutionary, biological, and pathological implications. However, the fundamental two-repeats (dyads) of these STRs are widely unexplored. RESULTS On a genome-wide scale, we mapped (GGC)2 and (GCC)2 dyads in human, and found monumental colonies (distance between each dyad < 500 bp) of extraordinary density, and in some instances periodicity. The largest (GCC)2 and (GGC)2 colonies were intergenic, homogeneous, and human-specific, consisting of 219 (GCC)2 on chromosome 2 (probability < 1.545E-219) and 70 (GGC)2 on chromosome 9 (probability = 1.809E-148). We also found that several colonies were shared in other great apes, and directionally increased in density and complexity in human, such as a colony of 99 (GCC)2 on chromosome 20, that specifically expanded in great apes, and reached maximum complexity in human (probability 1.545E-220). Numerous other colonies of evolutionary relevance in human were detected in other largely overlooked regions of the genome, such as chromosome Y and pseudogenes. Several of the genes containing or nearest to those colonies were divergently expressed in human. CONCLUSION In conclusion, (GCC)2 and (GGC)2 form unprecedented genomic colonies that coincide with the evolution of human and other great apes. The extent of the genomic rearrangements leading to those colonies support overlooked recombination hotspots, shared across great apes. The identified colonies deserve to be studied in mechanistic, evolutionary, and functional platforms.
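The colony criterion quoted in this abstract (distance between each dyad < 500 bp) can be sketched as a simple single-pass grouping of sorted positions. This is an illustrative reading only: it assumes the distance is measured between the start coordinates of consecutive dyads, and the function name `group_into_colonies`, the `min_size` parameter, and the coordinates are hypothetical rather than taken from the paper.

```python
def group_into_colonies(positions, max_gap=500, min_size=2):
    """Group dyad start positions into colonies.

    Consecutive positions stay in the same colony while the distance to the
    previous position is strictly below `max_gap` (assumed interpretation).
    Groups with fewer than `min_size` members are discarded.
    """
    colonies = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[-1] >= max_gap:
            # Gap too large: close the current group and start a new one.
            if len(current) >= min_size:
                colonies.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        colonies.append(current)
    return colonies

# Made-up start coordinates on a single chromosome.
dyad_starts = [100, 350, 700, 5000, 5100, 9000]
colonies = group_into_colonies(dyad_starts)
print(colonies)  # [[100, 350, 700], [5000, 5100]]
```

The isolated position at 9000 forms no colony because it has no neighbour within 500 bp. The probabilities reported in the abstract would require a separate statistical model that this sketch does not attempt.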
Affiliation(s)
- M Arabfard
  - Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
- N Tajeddin
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
  - Department of Biology, Central Tehran Branch, Islamic Azad University, Tehran, Iran
- S Alizadeh
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Salesi
  - Chemical Injuries Research Center, Systems Biology and Poisonings Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
  - Research Center for Prevention of Oral and Dental Diseases, Baqiyatallah University of Medical Sciences, Tehran, Iran
- H Bayat
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
  - Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
- S Khamse
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- A Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
5
Alizadeh S, Khamse S, Tajeddin N, Khorram Khorshid HR, Delbari A, Ohadi M. A GCC repeat in RAB26 undergoes natural selection in human and harbors divergent genotypes in late-onset Alzheimer's disease. Gene 2024; 893:147968. [PMID: 37931854 DOI: 10.1016/j.gene.2023.147968] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Revised: 10/28/2023] [Accepted: 11/03/2023] [Indexed: 11/08/2023]
Abstract
Although mainly located in genic regions and being mutation hotspots, intact blocks of CG-rich trinucleotide short tandem repeats (STRs) are largely overlooked with respect to their link with natural selection. The human RAB26 (member RAS oncogene family) directs synaptic and secretory vesicles into preautophagosomal structures, inhibition of which specifically disrupts axonal transport of degradative organelles and leads to an axonal dystrophy, resembling Alzheimer's disease (AD). Human RAB26 contains a GCC repeat in the top 1st percent in respect of length. Here we sequenced this STR in 441 Iranian individuals, consisting of late-onset neurocognitive disorder (NCD) (N = 216) and controls (N = 225). In both groups, the 12-repeat allele and the 12/12 genotype were predominantly abundant. We found excess of homozygosity for non-12 alleles in the NCD group (Mid-P exact = 0.027). Furthermore, divergent genotypes were detected that were specific to the NCD group (2.8% of genotypes) (Mid-P exact = 0.006) or controls (3.1% of genotypes) (Mid-P exact = 0.004). The patients harboring divergent genotypes received the diagnosis of AD. Based on the predominant abundance of the 12-repeat and 12/12 genotype in both groups, excess of non-12 homozygosity in the NCD group, and divergent genotypes across the NCD and control groups, we propose natural selection at this locus and link with late-onset AD. Our findings strengthen the hypothesis that a collection of rare genotypes unambiguously contribute to the pathogenesis of late-onset NCDs, such as AD.
Affiliation(s)
- S Alizadeh
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- S Khamse
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- N Tajeddin
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
  - Personalized Medicine and Genometabolomics Research Center, Hope Generation Foundation, Tehran, Iran
- A Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
6
Ding Y, Liao Y, He J, Ma J, Wei X, Liu X, Zhang G, Wang J. Enhancing genomic mutation data storage optimization based on the compression of asymmetry of sparsity. Front Genet 2023; 14:1213907. [PMID: 37323665 PMCID: PMC10267386 DOI: 10.3389/fgene.2023.1213907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 05/24/2023] [Indexed: 06/17/2023] Open
Abstract
Background: With the rapid development of high-throughput sequencing technology and the explosive growth of genomic data, storing, transmitting and processing massive amounts of data has become a new challenge. How to achieve fast lossless compression and decompression according to the characteristics of the data to speed up data transmission and processing requires research on relevant compression algorithms. Methods: In this paper, a compression algorithm for sparse asymmetric gene mutations (CA_SAGM) based on the characteristics of sparse genomic mutation data was proposed. The data was first sorted on a row-first basis so that neighboring non-zero elements were as close as possible to each other. The data were then renumbered using the reverse Cuthill-Mckee sorting technique. Finally the data were compressed into sparse row format (CSR) and stored. We had analyzed and compared the results of the CA_SAGM, coordinate format (COO) and compressed sparse column format (CSC) algorithms for sparse asymmetric genomic data. Nine types of single-nucleotide variation (SNV) data and six types of copy number variation (CNV) data from the TCGA database were used as the subjects of this study. Compression and decompression time, compression and decompression rate, compression memory and compression ratio were used as evaluation metrics. The correlation between each metric and the basic characteristics of the original data was further investigated. Results: The experimental results showed that the COO method had the shortest compression time, the fastest compression rate and the largest compression ratio, and had the best compression performance. CSC compression performance was the worst, and CA_SAGM compression performance was between the two. When decompressing the data, CA_SAGM performed the best, with the shortest decompression time and the fastest decompression rate. COO decompression performance was the worst. 
With increasing sparsity, the COO, CSC and CA_SAGM algorithms all exhibited longer compression and decompression times, lower compression and decompression rates, larger compression memory and lower compression ratios. When the sparsity was large, the compression memory and compression ratio of the three algorithms showed no difference characteristics, but the rest of the indexes were still different. Conclusion: CA_SAGM was an efficient compression algorithm that combines compression and decompression performance for sparse genomic mutation data.
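The storage formats compared in this abstract can be illustrated with a small, dependency-free sketch that converts a toy sample-by-gene mutation matrix from coordinate (COO) triplets into compressed sparse row (CSR) arrays and back. This is not the CA_SAGM algorithm: the reverse Cuthill-McKee renumbering step is omitted, and all function names and the toy matrix are hypothetical.

```python
def dense_to_coo(matrix):
    """Collect (row, col, value) triplets of non-zero cells in row-first order."""
    rows, cols, vals = [], [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                rows.append(r)
                cols.append(c)
                vals.append(v)
    return rows, cols, vals

def coo_to_csr(rows, cols, vals, n_rows):
    """Build CSR arrays from COO triplets that are already sorted by row."""
    indptr = [0] * (n_rows + 1)
    for r in rows:
        indptr[r + 1] += 1              # count non-zeros per row
    for i in range(n_rows):
        indptr[i + 1] += indptr[i]      # prefix sum gives row start offsets
    return indptr, list(cols), list(vals)

def csr_to_dense(indptr, indices, data, n_cols):
    """Decompress CSR arrays back into a dense list of rows."""
    dense = []
    for r in range(len(indptr) - 1):
        row = [0] * n_cols
        for k in range(indptr[r], indptr[r + 1]):
            row[indices[k]] = data[k]
        dense.append(row)
    return dense

# Toy matrix (made up): rows are samples, columns are genes, 1 marks a mutation.
mutations = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
rows, cols, vals = dense_to_coo(mutations)
indptr, indices, data = coo_to_csr(rows, cols, vals, n_rows=len(mutations))
print(indptr, indices, data)  # [0, 1, 1, 3] [1, 0, 3] [1, 1, 1]
```

CSR replaces the per-entry row index of COO with one offset per row, which is why the two formats trade off differently between compression and decompression cost. The round trip through `csr_to_dense` restores the original matrix, which is the lossless property the abstract requires.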
Affiliation(s)
- Youde Ding
  - The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan People’s Hospital, Qingyuan, China
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Yuan Liao
  - The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan People’s Hospital, Qingyuan, China
- Ji He
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Jianfeng Ma
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Xu Wei
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Xuemei Liu
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Guiying Zhang
  - The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan People’s Hospital, Qingyuan, China
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
- Jing Wang
  - The Sixth Affiliated Hospital of Guangzhou Medical University, Qingyuan People’s Hospital, Qingyuan, China
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou, China
7
Shi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun 2023; 14:2092. [PMID: 37045857 PMCID: PMC10097659 DOI: 10.1038/s41467-023-37690-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/04/2022] [Accepted: 03/27/2023] [Indexed: 04/14/2023] Open
Abstract
Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also implemented population analysis, investigated population differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.
Affiliation(s)
- Yirong Shi
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yiwei Niu
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Peng Zhang
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Huaxia Luo
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Shuai Liu
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Sijia Zhang
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jiajia Wang
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Yanyan Li
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Xinyue Liu
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Tingrui Song
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Tao Xu
  - National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, 250117, Shandong, China
- Shunmin He
  - Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
8
Kindelay SM, Maggert KA. Under the magnifying glass: The ups and downs of rDNA copy number. Semin Cell Dev Biol 2023; 136:38-48. [PMID: 35595601 PMCID: PMC9976841 DOI: 10.1016/j.semcdb.2022.05.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/04/2022] [Revised: 04/27/2022] [Accepted: 05/09/2022] [Indexed: 11/22/2022]
Abstract
The ribosomal DNA (rDNA) in Drosophila is found as two additive clusters of individual 35S cistrons. The multiplicity of rDNA is essential to assure proper translational demands, but the nature of the tandem arrays exposes them to copy number variation within and between populations. Here, we discuss means by which a cell responds to insufficient rDNA copy number, including a historical view of rDNA magnification whose mechanism was inferred some 35 years ago. Recent work has revealed that multiple conditions may also result in rDNA loss, in response to which rDNA magnification may have evolved. We discuss potential models for the mechanism of magnification, and evaluate possible consequences of rDNA copy number variation.
Affiliation(s)
- Selina M Kindelay
  - Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, AZ 85724, USA
- Keith A Maggert
  - Genetics Graduate Interdisciplinary Program, The University of Arizona, Tucson, AZ 85724, USA; Department of Cellular and Molecular Medicine, The University of Arizona, Tucson, AZ 85724, USA
9
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Affiliation(s)
- Max Verbiest
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
  - Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Mikhail Maksimov
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Ye Jin
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
  - Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Maria Anisimova
  - Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Melissa Gymrek
  - Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
  - Department of Medicine, University of California San Diego, La Jolla, California, USA
- Tugce Bilgin Sonay
  - Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
10
Global abundance of short tandem repeats is non-random in rodents and primates. BMC Genom Data 2022; 23:77. [PMID: 36329409 PMCID: PMC9635179 DOI: 10.1186/s12863-022-01092-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/29/2022] [Accepted: 10/18/2022] [Indexed: 11/06/2022] Open
Abstract
Background While of predominant abundance across vertebrate genomes and significant biological implications, the relevance of short tandem repeats (STRs) (also known as microsatellites) to speciation remains largely elusive and attributed to random coincidence for the most part. Here we collected data on the whole-genome abundance of mono-, di-, and trinucleotide STRs in nine species, encompassing rodents and primates, including rat, mouse, olive baboon, gelada, macaque, gorilla, chimpanzee, bonobo, and human. The collected data were used to analyze hierarchical clustering of the STR abundances in the selected species. Results We found massive differential STR abundances between the rodent and primate orders. In addition, while numerous STRs had random abundance across the nine selected species, the global abundance conformed to three consistent clusters, as follows: <rat, mouse>, <gelada, macaque, olive baboon>, and <gorilla, chimpanzee, bonobo, human>, which coincided with the phylogenetic distances of the selected species (p < 4E-05). Exceptionally, in the trinucleotide STR compartment, human was significantly distant from all other species. Conclusion Based on hierarchical clustering, we propose that the global abundance of STRs is non-random in rodents and primates, and probably had a determining impact on the speciation of the two orders. We also propose the STRs and STR lengths, which predominantly conformed to the phylogeny of the selected species, exemplified by (t)10, (ct)6, and (taa)4. Phylogenetic and experimental platforms are warranted to further examine the observed patterns and the biological mechanisms associated with those STRs.
11
A (GCC) repeat in SBF1 reveals a novel biological phenomenon in human and links to late onset neurocognitive disorder. Sci Rep 2022; 12:15480. [PMID: 36104480 PMCID: PMC9474449 DOI: 10.1038/s41598-022-19878-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/08/2022] [Accepted: 09/06/2022] [Indexed: 12/22/2022] Open
Abstract
The human SBF1 (SET binding factor 1) gene, alternatively known as MTMR5, is predominantly expressed in the brain, and its epigenetic dysregulation is linked to late-onset neurocognitive disorders (NCDs), such as Alzheimer’s disease. This gene contains a (GCC)-repeat at the interval between + 1 and + 60 of the transcription start site (SBF1-202 ENST00000380817.8). We sequenced the SBF1 (GCC)-repeat in a sample of 542 Iranian individuals, consisting of late-onset NCDs (N = 260) and controls (N = 282). While multiple alleles were detected at this locus, the 8 and 9 repeats were predominantly abundant, forming > 95% of the allele pool across the two groups. Among a number of anomalies, the allele distribution was significantly different in the NCD group versus controls (Fisher’s exact p = 0.006), primarily as a result of enrichment of the 8-repeat in the former. The genotype distribution departed from the Hardy–Weinberg principle in both groups (p < 0.001), and was significantly different between the two groups (Fisher’s exact p = 0.001). We detected significantly low frequency of the 8/9 genotype in both groups, higher frequency of this genotype in the NCD group, and reverse order of 8/8 versus 9/9 genotypes in the NCD group versus controls. Biased heterozygous/heterozygous ratios were also detected for the 6/8 versus 6/9 genotypes (in favor of 6/8) across the human samples studied (Fisher’s exact p = 0.0001). Bioinformatics studies revealed that the number of (GCC)-repeats may change the RNA secondary structure and interaction sites at least across human exon 1. This STR was specifically expanded beyond 2-repeats in primates. In conclusion, we report indication of a novel biological phenomenon, in which there is selection against certain heterozygous genotypes at a STR locus in human. We also report different allele and genotype distribution at this STR locus in late-onset NCD versus controls. 
In view of the location of this STR in the 5′ untranslated region, RNA/RNA or RNA/DNA heterodimer formation of the involved genotypes and alternative RNA processing and/or translation should be considered.
12
Lepais O, Aissi A, Véla E, Beghami Y. Joint analysis of microsatellites and flanking sequences enlightens complex demographic history of interspecific gene flow and vicariance in rear-edge oak populations. Heredity (Edinb) 2022; 129:169-182. [PMID: 35725763 PMCID: PMC9411615 DOI: 10.1038/s41437-022-00550-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Revised: 06/10/2022] [Accepted: 06/10/2022] [Indexed: 12/25/2022] Open
Abstract
Inference of recent population divergence requires fast-evolving markers and requires distinguishing shared genetic variation caused by ancestral polymorphism from that caused by gene flow. Theoretical research shows that the use of compound marker systems integrating linked polymorphisms with different mutational dynamics, such as a microsatellite and its flanking sequences, can improve estimation of population structure and inference of demographic history, especially in the case of complex population dynamics. However, empirical application in natural populations has so far been limited by the lack of suitable methods for data collection. A solution comes from the development of sequence-based microsatellite genotyping, which we used to study molecular variation at 36 sequenced nuclear microsatellites in seven Quercus canariensis and four Q. faginea rear-edge populations across Algeria. We aim to decipher their taxonomic relationship, past evolutionary history and recent demographic trajectory. First, we compare the estimation of population genetics parameters and simulation-based inference of demographic history from microsatellite sequence alone, flanking sequence alone or the combination of linked microsatellite and flanking sequence variation. Second, we apply random forest approximate Bayesian computation to identify which of these sequence types is most informative. Whereas analysing microsatellite variation alone indicates recent interspecific gene flow, the additional information gained by integrating nucleotide variation in flanking sequences reduces homoplasy and suggests instead ancient interspecific gene flow followed by drift in isolation. The weight of each polymorphism in the inference also demonstrates the value of linked variations with contrasting mutational dynamics for improving estimation of both demographic and mutational parameters.
Affiliation(s)
- Olivier Lepais
- Univ. Bordeaux, INRAE, BIOGECO, F-33610, Cestas, France.
- Errol Véla
- AMAP, Université de Montpellier/CIRAD/CNRS/INRA/IRD, Montpellier, France
- Yassine Beghami
- LAPAPEZA, Université Batna 1 Hadj Lakhdar, ISVSA, Batna, Algeria
13
Kitano J, Ishikawa A, Ravinet M, Courtier-Orgogozo V. Genetic basis of speciation and adaptation: from loci to causative mutations. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200503. [PMID: 35634921 PMCID: PMC9149796 DOI: 10.1098/rstb.2020.0503] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022] Open
Abstract
Does evolution proceed in small steps or large leaps? How repeatable is evolution? How constrained is the evolutionary process? Answering these long-standing questions in evolutionary biology is indispensable for both understanding how extant biodiversity has evolved and predicting how organisms and ecosystems will respond to changing environments in the future. Understanding the genetic basis of phenotypic diversification and speciation in natural populations is key to properly answering these questions. The leap forward in genome sequencing technologies has made it increasingly easy not only to investigate the genetic architecture but also to identify the variant sites underlying adaptation and speciation in natural populations. Furthermore, recent advances in genome editing technologies are making it possible to investigate the functions of each candidate gene in organisms from natural populations. In this article, we discuss how these recent technological advances enable the analysis of causative genes and mutations and how such analysis can help answer long-standing evolutionary biology questions. This article is part of the theme issue ‘Genetic basis of adaptation and speciation: from loci to causative mutations’.
Affiliation(s)
- Jun Kitano
- Ecological Genetics Laboratory, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan
- Asano Ishikawa
- Ecological Genetics Laboratory, National Institute of Genetics, Yata 1111, Mishima, Shizuoka 411-8540, Japan
- Laboratory of Molecular Ecological Genetics, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha 5-1-5, Chiba 277-8562, Japan
- Mark Ravinet
- School of Life Sciences, University of Nottingham, Nottingham NG7 2RD, UK
14
Ho EKH, Schaack S. Intraspecific Variation in the Rates of Mutations Causing Structural Variation in Daphnia magna. Genome Biol Evol 2021; 13:6444992. [PMID: 34849778 PMCID: PMC8691059 DOI: 10.1093/gbe/evab241] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 10/21/2021] [Indexed: 12/17/2022] Open
Abstract
Mutations that cause structural variation are important sources of genetic variation upon which other evolutionary forces can act; however, they are difficult to observe, and therefore few direct estimates of their rate and spectrum are available. Understanding mutation rate evolution, however, requires adding to the limited number of species for which direct estimates are available, quantifying levels of intraspecific variation in mutation rates, and assessing whether rate estimates co-vary across types of mutation. Here, we report structural variation-causing mutation rates (svcMRs) for six categories of mutations (short insertions and deletions, long deletions and duplications, and deletions and duplications at copy number variable sites) from nine genotypes of Daphnia magna collected from three populations in Finland, Germany, and Israel using a mutation accumulation approach. Based on whole-genome sequence data and validated using simulations, we find svcMRs are high (two orders of magnitude higher than base substitution mutation rates measured in the same lineages), highly variable among populations, and uncorrelated across categories of mutation. Furthermore, to assess the impact of svcMRs on the genome, we calculated rates while adjusting for the lengths of events and ran simulations to determine if the mutations occur in genic regions more or less frequently than expected by chance. Our results pose a challenge to most prevailing theories aimed at explaining the evolution of the mutation rate, underscoring the importance of obtaining additional mutation rate estimates in more genotypes, for more types of mutation, in more species, in order to improve our future understanding of mutation rates, their variation, and their evolution.
Affiliation(s)
- Eddie K H Ho
- Department of Biology, Reed College, Portland, Oregon, USA
- Sarah Schaack
- Department of Biology, Reed College, Portland, Oregon, USA
15
Xu X, Wang BS, Yu H. Intraspecies Genomic Divergence of a Fig Wasp Species Is Due to Geographical Barrier and Adaptation. Front Ecol Evol 2021. [DOI: 10.3389/fevo.2021.764828] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/27/2022] Open
Abstract
Understanding how intraspecies divergence results in speciation is of great importance for evolutionary biology. Here we applied population genomics approaches to a fig wasp species (Valisia javana complex sp 1) to reveal its intraspecies differentiation and the underlying evolutionary dynamics. Using re-sequencing data, we show that the Hainan Island population (DA) of sp1 differs genetically from the continental populations, and we then characterize its distinct divergence pattern. DA has reduced SNP diversity but a higher proportion of population-specific structural variations (SVs), implying restricted gene exchange. Based on SNPs, 32 differentiated islands containing 204 genes were detected, along with 1,532 population-specific SVs of DA overlapping 4,141 genes. The gene ontology (GO) enrichment analysis performed on differentiated islands identified three significant GO terms related to a basic metabolism process, with most of the genes failing to enrich. In contrast, population-specific SVs contributed more to the adaptation than the SNPs by linking to 59 terms that are crucial for wasp speciation, such as host reorganization and development regulation. In addition, generalized dissimilarity modeling confirms the importance of environmental differences for the genetic divergence within sp1. Hence, we assume that the genetic divergence between DA and the continental populations is due not only to the strait acting as a geographic barrier, but also to adaptation. We reconstruct the demographic history within sp1. DA shares a similar population history with the nearby continental population, suggesting an incomplete divergence. In summary, our results reveal how geographic barriers and adaptation both influence genetic divergence at the population level, thereby increasing our knowledge of the potential speciation of non-model organisms.
16
Ho EKH, Bellis ES, Calkins J, Adrion JR, Latta IV LC, Schaack S. Engines of change: Transposable element mutation rates are high and variable within Daphnia magna. PLoS Genet 2021; 17:e1009827. [PMID: 34723969 PMCID: PMC8594854 DOI: 10.1371/journal.pgen.1009827] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/27/2021] [Revised: 11/16/2021] [Accepted: 09/16/2021] [Indexed: 12/22/2022] Open
Abstract
Transposable elements (TEs) represent a major portion of most eukaryotic genomes, yet little is known about their mutation rates or how their activity is shaped by other evolutionary forces. Here, we compare short- and long-term patterns of genome-wide mutation accumulation (MA) of TEs among 9 genotypes from three populations of Daphnia magna from across a latitudinal gradient. While the overall proportion of the genome comprised of TEs is highly similar among genotypes from Finland, Germany, and Israel, populations are distinguishable based on patterns of insertion site polymorphism. Our direct rate estimates indicate TE movement is highly variable (net rates ranging from -11.98 to 12.79 × 10⁻⁵ per copy per generation among genotypes), differing both among populations and TE families. Although gains outnumber losses when selection is minimized, both types of events appear to be highly deleterious based on their low frequency in control lines where propagation is not limited to random, single-progeny descent. With rate estimates 4 orders of magnitude higher than base substitutions, TEs clearly represent a highly mutagenic force in the genome. Quantifying patterns of intra- and interspecific variation in TE mobility with and without selection provides insight into a powerful mechanism generating genetic variation in the genome.
Affiliation(s)
- Eddie K. H. Ho
- Department of Biology, Reed College, Portland, Oregon, United States of America
- Emily S. Bellis
- Department of Biology, Reed College, Portland, Oregon, United States of America
- Department of Computer Science, Arkansas State University, Jonesboro, Arkansas, United States of America
- Jaclyn Calkins
- Department of Biology, Reed College, Portland, Oregon, United States of America
- College of Human Medicine, Michigan State University, East Lansing, Michigan, United States of America
- Jeffrey R. Adrion
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon, United States of America
- Leigh C. Latta IV
- Department of Biology, Reed College, Portland, Oregon, United States of America
- Lewis-Clark State College, Lewiston, Idaho, United States of America
- Sarah Schaack
- Department of Biology, Reed College, Portland, Oregon, United States of America
17
Novel implications of a strictly monomorphic (GCC) repeat in the human PRKACB gene. Sci Rep 2021; 11:20629. [PMID: 34667254 PMCID: PMC8526596 DOI: 10.1038/s41598-021-99932-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/16/2021] [Accepted: 10/05/2021] [Indexed: 02/07/2023] Open
Abstract
PRKACB (Protein Kinase CAMP-Activated Catalytic Subunit Beta) is predominantly expressed in the brain, and regulation of this gene links to neuroprotective effects against tau and Aβ-induced toxicity. Here we studied a (GCC)-repeat spanning the core promoter and 5′ UTR of this gene in 300 human subjects, consisting of late-onset neurocognitive disorder (NCD) (N = 150) and controls (N = 150). We also implemented several models to study the impact of this repeat on the three-dimensional (3D) structure of DNA. While the PRKACB (GCC)-repeat was strictly monomorphic at 7-repeats, we detected two 7/8 genotypes only in the NCD group. In all examined models, the (GCC)7 and its periodicals had the least range of divergence variation on the 3D structure of DNA in comparison to the 8-repeat periodicals and several hypothetical repeat lengths. A similar inert effect on the 3D structure was not detected in other classes of short tandem repeats (STRs) such as GA and CA repeats. In conclusion, we report monomorphism of a long (GCC)-repeat in the PRKACB gene in human, its inert effect on DNA structure, and enriched divergence in late-onset NCD. This is the first indication of natural selection for a monomorphic (GCC)-repeat, which probably evolved to function as an “epigenetic knob”, without changing the regional DNA structure.
18
McElroy KE, Müller S, Lamatsch DK, Bankers L, Fields PD, Jalinsky JR, Sharbrough J, Boore JL, Logsdon JM, Neiman M. Asexuality Associated with Marked Genomic Expansion of Tandemly Repeated rRNA and Histone Genes. Mol Biol Evol 2021; 38:3581-3592. [PMID: 33885820 PMCID: PMC8382920 DOI: 10.1093/molbev/msab121] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022] Open
Abstract
How does asexual reproduction influence genome evolution? Although it is clear that genomic structural variation is common and important in natural populations, we know very little about how one of the most fundamental of eukaryotic traits (mode of genomic inheritance) influences genome structure. We address this question with the New Zealand freshwater snail Potamopyrgus antipodarum, which features multiple separately derived obligately asexual lineages that coexist and compete with otherwise similar sexual lineages. We used whole-genome sequencing reads from a diverse set of sexual and asexual individuals to analyze genomic abundance of a critically important gene family, rDNA (the genes encoding rRNAs), that is notable for dynamic and variable copy number. Our genomic survey of rDNA in P. antipodarum revealed two striking results. First, the core histone and 5S rRNA genes occur between tandem copies of the 18S-5.8S-28S gene cluster, a unique architecture for these crucial gene families. Second, asexual P. antipodarum harbor dramatically more rDNA-histone copies than sexuals, which we validated through molecular and cytogenetic analysis. The repeated expansion of this genomic region in asexual P. antipodarum lineages following distinct transitions to asexuality represents a dramatic genome structural change associated with asexual reproduction, with potential functional consequences related to the loss of sexual reproduction.
Affiliation(s)
- Kyle E McElroy
- Ecology, Evolutionary, and Organismal Biology, Iowa State University, Ames, IA, USA
- Department of Biology, University of Iowa, Iowa City, IA, USA
- Stefan Müller
- Institute of Human Genetics, Munich University Hospital, Ludwig-Maximilians University, Munich, Germany
- Dunja K Lamatsch
- Research Department for Limnology, University of Innsbruck, Mondsee, Mondsee, Austria
- Laura Bankers
- Division of Infectious Diseases, University of Colorado—Anschutz Medical Campus, Aurora, CO, USA
- Peter D Fields
- Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland
- Joel Sharbrough
- Biology Department, New Mexico Institute of Mining and Technology, Socorro, NM, USA
- Department of Biology, Colorado State University, Fort Collins, CO, USA
- Jeffrey L Boore
- Providence St. Joseph Health and Institute for Systems Biology, Seattle, WA, USA
- John M Logsdon
- Department of Biology, University of Iowa, Iowa City, IA, USA
- Maurine Neiman
- Department of Biology, University of Iowa, Iowa City, IA, USA
- Department of Gender, Women's, and Sexuality Studies, University of Iowa, Iowa City, IA, USA
19
Salim D, Bradford WD, Rubinstein B, Gerton JL. DNA replication, transcription, and H3K56 acetylation regulate copy number and stability at tandem repeats. G3-GENES GENOMES GENETICS 2021; 11:6174693. [PMID: 33729510 DOI: 10.1093/g3journal/jkab082] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/07/2020] [Accepted: 02/26/2021] [Indexed: 11/13/2022]
Abstract
Tandem repeats are inherently unstable and exhibit extensive copy number polymorphisms. Despite mounting evidence for their adaptive potential, the mechanisms associated with regulation of the stability and copy number of tandem repeats remain largely unclear. To study copy number variation at tandem repeats, we used two well-studied repetitive arrays in the budding yeast genome, the ribosomal DNA (rDNA) locus, and the copper-inducible CUP1 gene array. We developed powerful, highly sensitive, and quantitative assays to measure repeat instability and copy number and used them in multiple high-throughput genetic screens to define pathways involved in regulating copy number variation. These screens revealed that rDNA stability and copy number are regulated by DNA replication, transcription, and histone acetylation. Through parallel studies of both arrays, we demonstrate that instability can be induced by DNA replication stress and transcription. Importantly, while changes in stability in response to stress are observed within a few cell divisions, a change in steady state repeat copy number requires selection over time. Further, H3K56 acetylation is required for regulating transcription and transcription-induced instability at the CUP1 array, and restricts transcription-induced amplification. Our work suggests that the modulation of replication and transcription is a direct, reversible strategy to alter stability at tandem repeats in response to environmental stimuli, which provides cells rapid adaptability through copy number variation. Additionally, histone acetylation may function to promote the normal adaptive program in response to transcriptional stress. Given the omnipresence of DNA replication, transcription, and chromatin marks like histone acetylation, the fundamental mechanisms we have uncovered significantly advance our understanding of the plasticity of tandem repeats more generally.
Affiliation(s)
- Devika Salim
- Stowers Institute for Medical Research, Kansas City, MO 64110, United States of America; Open University, Milton Keynes MK7 6BJ, United Kingdom
- William D Bradford
- Stowers Institute for Medical Research, Kansas City, MO 64110, United States of America
- Boris Rubinstein
- Stowers Institute for Medical Research, Kansas City, MO 64110, United States of America
- Jennifer L Gerton
- Stowers Institute for Medical Research, Kansas City, MO 64110, United States of America; Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, KS 66160, United States of America
20
Thousands of high-quality sequencing samples fail to show meaningful correlation between 5S and 45S ribosomal DNA arrays in humans. Sci Rep 2021; 11:449. [PMID: 33432083 PMCID: PMC7801704 DOI: 10.1038/s41598-020-80049-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/31/2020] [Accepted: 12/15/2020] [Indexed: 11/08/2022] Open
Abstract
The ribosomal RNA genes (rDNA) are tandemly arrayed in most eukaryotes and exhibit vast copy number variation. There is growing interest in integrating this variation into genotype-phenotype associations. Here, we explored a possible association of rDNA copy number variation with autism spectrum disorder and found no difference between probands and unaffected siblings. Because short-read sequencing estimates of rDNA copy number are error prone, we sought to validate our 45S estimates. Previous studies reported tightly correlated, concerted copy number variation between the 45S and 5S arrays, which should enable the validation of 45S copy number estimates with pulsed-field gel-verified 5S copy numbers. Here, we show that the previously reported strong concerted copy number variation may be an artifact of variable data quality in the earlier published 1000 Genomes Project sequences. We failed to detect a meaningful correlation between 45S and 5S copy numbers in thousands of samples from the high-coverage Simons Simplex Collection dataset as well as in the recent high-coverage 1000 Genomes Project sequences. Our findings illustrate the challenge of genotyping repetitive DNA regions accurately and call into question the accuracy of recently published studies of rDNA copy number variation in cancer that relied on diverse publicly available resources for sequence data.
21
Afshar H, Adelirad F, Kowsari A, Kalhor N, Delbari A, Najafipour R, Foroughan M, Bozorgmehr A, Khamse S, Nazaripanah N, Ohadi M. Natural Selection at the NHLH2 Core Promoter Exceptionally Long CA-Repeat in Human and Disease-Only Genotypes in Late-Onset Neurocognitive Disorder. Gerontology 2020; 66:514-522. [PMID: 32877896 DOI: 10.1159/000509471] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/16/2020] [Accepted: 06/17/2020] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Approximately 2% of the human core promoter short tandem repeats (STRs) reach lengths of ≥6 repeats, which may in part be a result of adaptive evolutionary processes and natural selection. A single-exon transcript of the human nescient helix loop helix 2 (NHLH2) gene is flanked by the longest CA-repeat detected in a human protein-coding gene core promoter (Ensembl transcript ID: ENST00000369506.1). NHLH2 is involved in several biological and pathological pathways, such as motivated exercise, obesity, and diabetes. METHODS The allele and genotype distribution of the NHLH2 CA-repeat were investigated by sequencing in 655 Iranian subjects, consisting of late-onset neurocognitive disorder (NCD) as a clinical entity (n = 290) and matched controls (n = 365). The evolutionary trend of the CA-repeat was also studied across vertebrates. RESULTS The allele range was between 9 and 25 repeats in the NCD cases, and 12 and 24 repeats in the controls. At the frequency of 0.56, the 21-repeat allele was the predominant allele in the controls. While the 21-repeat was also the predominant allele in the NCD patients, we detected significant decline of the frequency (p < 0.0001) and homozygosity (p < 0.006) of this allele in this group. Furthermore, 12 genotypes were detected across 16 patients (5.5% of the entire NCD sample) and not in the controls (disease-only genotypes; p < 0.0003), consisting of at least one extreme allele. The extreme alleles were at 9, 12, 13, 18, and 19 repeats (extreme short end), and 23, 24, and 25 repeats (extreme long end), and their frequencies ranged between 0.001 and 0.04. The frequency of the 21-repeat allele significantly dropped to 0.09 in the disease-only genotype compartment (p < 0.0001). Evolutionarily, while the maximum length of the NHLH2 CA-repeat was 11 repeats in non-primates, this CA-repeat was ≥14 repeats in primates and reached maximum length in human. 
CONCLUSION We propose a novel locus for late-onset NCD at the NHLH2 core promoter exceptionally long CA-STR and natural selection at this locus. Furthermore, there was indication of genotypes at this locus that unambiguously linked to late-onset NCD. This is the first instance of natural selection in favor of a predominantly abundant STR allele in human and its differential distribution in late-onset NCD.
Affiliation(s)
- Hossein Afshar
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Fatemeh Adelirad
- Department of Health Education and Promotion, Faculty of Health Sciences Tabriz University of Medical Sciences, Tabriz, Iran
- Ali Kowsari
- Department of Mesenchymal Stem Cell, The Academic Center for Education, Culture and Research, Qom, Iran
- Naser Kalhor
- Department of Mesenchymal Stem Cell, The Academic Center for Education, Culture and Research, Qom, Iran
- Ahmad Delbari
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Reza Najafipour
- Cellular and Molecular Research Centre, Research Institute for Prevention of Non Communicable Disease, Qazvin University of Medical Sciences, Qazvin, Iran
- Mahshid Foroughan
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Ali Bozorgmehr
- Department of Neuroscience, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran, Iran
- Safoura Khamse
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Neda Nazaripanah
- Department of Health Education and Promotion, Faculty of Health Sciences Tabriz University of Medical Sciences, Tabriz, Iran
- Mina Ohadi
- Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
22
Blount ZD, Maddamsetti R, Grant NA, Ahmed ST, Jagdish T, Baxter JA, Sommerfeld BA, Tillman A, Moore J, Slonczewski JL, Barrick JE, Lenski RE. Genomic and phenotypic evolution of Escherichia coli in a novel citrate-only resource environment. eLife 2020; 9:55414. [PMID: 32469311 PMCID: PMC7299349 DOI: 10.7554/elife.55414] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 01/23/2020] [Accepted: 05/28/2020] [Indexed: 12/27/2022] Open
Abstract
Evolutionary innovations allow populations to colonize new ecological niches. We previously reported that aerobic growth on citrate (Cit+) evolved in an Escherichia coli population during adaptation to a minimal glucose medium containing citrate (DM25). Cit+ variants can also grow in citrate-only medium (DM0), a novel environment for E. coli. To study adaptation to this niche, we founded two sets of Cit+ populations and evolved them for 2500 generations in DM0 or DM25. The evolved lineages acquired numerous parallel mutations, many mediated by transposable elements. Several also evolved amplifications of regions containing the maeA gene. Unexpectedly, some evolved populations and clones show apparent declines in fitness. We also found evidence of substantial cell death in Cit+ clones. Our results thus demonstrate rapid trait refinement and adaptation to the new citrate niche, while also suggesting a recalcitrant mismatch between E. coli physiology and growth on citrate.
Affiliation(s)
- Zachary D Blount
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, United States; The BEACON Center for the Study of Evolution in Action, East Lansing, United States
- Rohan Maddamsetti
- Department of Biomedical Engineering, Duke University, Durham, United States
- Nkrumah A Grant
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, United States; The BEACON Center for the Study of Evolution in Action, East Lansing, United States
- Sumaya T Ahmed
- Department of Biology, Kenyon College, Gambier, United States
- Tanush Jagdish
- The BEACON Center for the Study of Evolution in Action, East Lansing, United States; Program for Systems, Synthetic, and Quantitative Biology, Harvard University, Cambridge, United States
- Jessica A Baxter
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, United States
- Brooke A Sommerfeld
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, United States
- Alice Tillman
- Department of Biology, Kenyon College, Gambier, United States
- Jeremy Moore
- Department of Biology, Kenyon College, Gambier, United States
- Jeffrey E Barrick
- The BEACON Center for the Study of Evolution in Action, East Lansing, United States; Department of Molecular Biosciences, The University of Texas, Austin, United States
- Richard E Lenski
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, United States; The BEACON Center for the Study of Evolution in Action, East Lansing, United States
23
Abstract
Individuals within a species can exhibit vast variation in copy number of repetitive DNA elements. This variation may contribute to complex traits such as lifespan and disease, yet it is only infrequently considered in genotype-phenotype associations. Although the possible importance of copy number variation is widely recognized, accurate copy number quantification remains challenging. Here, we assess the technical reproducibility of several major methods for copy number estimation as they apply to the large repetitive ribosomal DNA array (rDNA). rDNA encodes the ribosomal RNAs and exists as a tandem gene array in all eukaryotes. Repeat units of rDNA are kilobases in size, often with several hundred units comprising the array, making rDNA particularly intractable to common quantification techniques. We evaluate pulsed-field gel electrophoresis, droplet digital PCR, and Nextera-based whole genome sequencing as approaches to copy number estimation, comparing techniques across model organisms and spanning wide ranges of copy numbers. Nextera-based whole genome sequencing, though commonly used in recent literature, produced high error. We explore possible causes for this error and provide recommendations for best practices in rDNA copy number estimation. We present a resource of high-confidence rDNA copy number estimates for a set of S. cerevisiae and C. elegans strains for future use. We furthermore explore the possibility for FISH-based copy number estimation, an alternative that could potentially characterize copy number on a cellular level.
24
Macko-Podgórni A, Stelmach K, Kwolek K, Grzebelus D. Stowaway miniature inverted repeat transposable elements are important agents driving recent genomic diversity in wild and cultivated carrot. Mob DNA 2019; 10:47. [PMID: 31798695 PMCID: PMC6881990 DOI: 10.1186/s13100-019-0190-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/23/2019] [Accepted: 11/21/2019] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Miniature inverted repeat transposable elements (MITEs) are small non-autonomous DNA transposons that are ubiquitous in plant genomes, and are mobilised by their autonomous relatives. Stowaway MITEs are derived from and mobilised by elements from the mariner superfamily. Those elements constitute a significant portion of the carrot genome; however the variation caused by Daucus carota Stowaway MITEs (DcStos), their association with genes and their putative impact on genome evolution has not been comprehensively analysed.
RESULTS Fourteen families of Stowaway elements DcStos occupy about 0.5% of the carrot genome. We systematically analysed 31 genomes of wild and cultivated Daucus carota, yielding 18.5 thousand copies of these elements, showing remarkable insertion site polymorphism. DcSto element demography differed based on the origin of the host populations, and corresponded with the four major groups of D. carota, wild European, wild Asian, eastern cultivated and western cultivated. The DcStos elements were associated with genes, and most frequently occurred in 5' and 3' untranslated regions (UTRs). Individual families differed in their propensity to reside in particular segments of genes. Most importantly, DcSto copies in the 2 kb regions up- and downstream of genes were more frequently associated with open reading frames encoding transcription factors, suggesting their possible functional impact. More than 1.5% of all DcSto insertion sites in different host genomes contained different copies in exactly the same position, indicating the existence of insertional hotspots. The DcSto7b family was much more polymorphic than the other families in cultivated carrot. A line of evidence pointed at its activity in the course of carrot domestication, and identified Dcmar1 as an active carrot mariner element and a possible source of the transposition machinery for DcSto7b.
CONCLUSION Stowaway MITEs have made a substantial contribution to the structural and functional variability of the carrot genome.
Affiliation(s)
- Alicja Macko-Podgórni
- Institute of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, 31425 Krakow, Poland
- Katarzyna Stelmach
- Institute of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, 31425 Krakow, Poland
- Kornelia Kwolek
- Institute of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, 31425 Krakow, Poland
- Dariusz Grzebelus
- Institute of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, 31425 Krakow, Poland