1
King DG. Evolving a favorable distribution for mutation effects. Trends Genet 2024; 40:819-821. [PMID: 39278786 DOI: 10.1016/j.tig.2024.07.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 07/23/2024] [Accepted: 07/24/2024] [Indexed: 09/18/2024]
Abstract
Tandem-repeat DNA sequences appear to be singularly capable of yielding abundant repeat-number mutations with a potentially advantageous distribution of fitness effects. Although knowing the rates and relative proportions of deleterious, neutral and beneficial mutations is fundamental for understanding evolvability, analysis of adaptation routinely overlooks small-effect mutations arising in tandem repeats.
Affiliation(s)
- David G King
- Department of Anatomy, School of Medicine, Southern Illinois University Carbondale, Carbondale, IL, USA; Department of Zoology, College of Agricultural, Life, and Physical Sciences, Southern Illinois University Carbondale, Carbondale, IL, USA.
2
Ranathunge C, Welch ME. Clinal Variation in Short Tandem Repeats Linked to Gene Expression in Sunflower (Helianthus annuus L.). Biomolecules 2024; 14:944. [PMID: 39199332 PMCID: PMC11352406 DOI: 10.3390/biom14080944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 07/25/2024] [Accepted: 08/01/2024] [Indexed: 09/01/2024] Open
Abstract
Short tandem repeat (STR) variation is rarely explored as a contributor to adaptive evolution. An intriguing mechanism involving STRs suggests that STRs function as "tuning knobs" of adaptation whereby stepwise changes in STR allele length have stepwise effects on phenotypes. Previously, we tested the predictions of the "tuning knob" model at the gene expression level by conducting an RNA-Seq experiment on natural populations of common sunflower (Helianthus annuus L.) transecting a well-defined cline from Kansas to Oklahoma. We identified 479 STRs with significant allele length effects on gene expression (eSTRs). In this study, we expanded the range to populations further north and south of the focal populations and used a targeted approach to study the relationship between STR allele length and gene expression in five selected eSTRs. Seeds from 96 individuals from six natural populations of sunflower from Nebraska and Texas were grown in a common garden. The individuals were genotyped at the five eSTRs, and gene expression was quantified with qRT-PCR. Linear regression models identified that eSTR length in comp26672 was significantly correlated with gene expression. Further, the length of comp26672 eSTR was significantly correlated with latitude across the range from Nebraska to Texas. The eSTR locus comp26672 was located in the CHUP1 gene, a gene associated with chloroplast movement in response to light intensity, which suggests a potential adaptive role for the eSTR locus. Collectively, our results from this targeted study show a consistent relationship between allele length and gene expression in some eSTRs across a broad geographical range in sunflower and suggest that some eSTRs may contribute to adaptive traits in common sunflower.
3
King DG. Mutation protocols share with sexual reproduction the physiological role of producing genetic variation within 'constraints that deconstrain'. J Physiol 2024; 602:2615-2626. [PMID: 38178567 DOI: 10.1113/jp285478] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/18/2023] [Accepted: 12/14/2023] [Indexed: 01/06/2024] Open
Abstract
Because the universe of possible DNA sequences is inconceivably vast, organisms have evolved mechanisms for exploring DNA sequence space while substantially reducing the hazard that would otherwise accrue to any process of random, accidental mutation. One such mechanism is meiotic recombination. Although sexual reproduction imposes a seemingly paradoxical 50% cost to fitness, sex evidently prevails because this cost is outweighed by the advantage of equipping offspring with genetic variation to accommodate environmental vicissitudes. The potential adaptive utility of additional mechanisms for producing genetic variation has long been obscured by a presumption that the vast majority of mutations are deleterious. Perhaps surprisingly, the probability for adaptive variation can be increased by several mechanisms that generate mutations abundantly. Such mechanisms, here called 'mutation protocols', implement implicit 'constraints that deconstrain'. Like meiotic recombination, they produce genetic variation in forms that minimize potential for harm while providing a reasonably high probability for benefit. One example is replication slippage of simple sequence repeats (SSRs); this process yields abundant, reversible mutations, typically with small quantitative effect on phenotype. This enables SSRs to function as adjustable 'tuning knobs'. There exists a clear pathway for SSRs to be shaped through indirect selection favouring their implicit tuning-knob protocol. Several other molecular mechanisms comprise probable components of additional mutation protocols. Biologists might plausibly regard such mechanisms of mutation not primarily as sources of deleterious genetic mistakes but also as potentially adaptive processes for 'exploring' DNA sequence space.
Affiliation(s)
- David G King
- Department of Anatomy, School of Medicine, Southern Illinois University Carbondale, Carbondale, Illinois, USA
- Department of Zoology, College of Agricultural, Life, and Physical Sciences, Southern Illinois University Carbondale, Carbondale, Illinois, USA
4
Caporale LH. Evolutionary feedback from the environment shapes mechanisms that generate genome variation. J Physiol 2024; 602:2601-2614. [PMID: 38194279 DOI: 10.1113/jp284411] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/17/2023] [Accepted: 12/14/2023] [Indexed: 01/10/2024] Open
Abstract
Darwin recognized that 'a grand and almost untrodden field of inquiry will be opened, on the causes and laws of variation.' However, because the Modern Synthesis assumes that the intrinsic probability of any individual mutation is unrelated to that mutation's potential adaptive value, attention has been focused on selection rather than on the intrinsic generation of variation. Yet many examples illustrate that the term 'random' mutation, as widely understood, is inaccurate. The probabilities of distinct classes of variation are neither evenly distributed across a genome nor invariant over time, nor unrelated to their potential adaptive value. Because selection acts upon variation, multiple biochemical mechanisms can and have evolved that increase the relative probability of adaptive mutations. In effect, the generation of heritable variation is in a feedback loop with selection, such that those mechanisms that tend to generate variants that survive recurring challenges in the environment would be captured by this survival and thus inherited and accumulated within lineages of genomes. Moreover, because genome variation is affected by a wide range of biochemical processes, genome variation can be regulated. Biochemical mechanisms that sense stress, from lack of nutrients to DNA damage, can increase the probability of specific classes of variation. A deeper understanding of evolution involves attention to the evolution of, and environmental influences upon, the intrinsic variation generated in gametes, in other words upon the biochemical mechanisms that generate variation across generations. These concepts have profound implications for the types of questions that can and should be asked, as omics databases become more comprehensive, detection methods more sensitive, and computation and experimental analyses even more high throughput and thus capable of revealing the intrinsic generation of variation in individual gametes. 
These concepts also have profound implications for evolutionary theory, which, upon reflection it will be argued, predicts that selection would increase the probability of generating adaptive mutations, in other words, predicts that the ability to evolve itself evolves.
5
Liang Y, Hao J, Wang J, Zhang G, Su Y, Liu Z, Wang T. Statistical Genomics Analysis of Simple Sequence Repeats from the Paphiopedilum Malipoense Transcriptome Reveals Control Knob Motifs Modulating Gene Expression. Adv Sci (Weinh) 2024; 11:e2304848. [PMID: 38647414 PMCID: PMC11200097 DOI: 10.1002/advs.202304848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2023] [Revised: 02/26/2024] [Indexed: 04/25/2024]
Abstract
Simple sequence repeats (SSRs) are found in nonrandom distributions in genomes and are thought to impact gene expression. The distribution patterns of 48 295 SSRs of Paphiopedilum malipoense are mined and characterized based on the first full-length transcriptome and comprehensive transcriptome dataset from 12 organs. Statistical genomics analyses are used to investigate how SSRs in transcripts affect gene expression. The results demonstrate the correlations between SSR distributions, characteristics, and expression level. Nine expression-modulating motifs (expMotifs) are identified and a model is proposed to explain the effect of their key features, potency, and gene function on an intra-transcribed region scale. The expMotif-transcribed region combination is the most predominant contributor to the expression-modulating effect of SSRs, and some intra-transcribed regions are critical for this effect. Genes containing the same type of expMotif-SSR elements in the same transcribed region are likely linked in function, regulation, or evolution aspects. This study offers novel evidence to understand how SSRs regulate gene expression and provides potential regulatory elements for plant genetic engineering.
Affiliation(s)
- Yingyi Liang
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Jing Hao
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
- Jieyu Wang
- College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510642, China
- Guoqiang Zhang
- Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yingjuan Su
- School of Life Sciences, Sun Yat‐sen University, Guangzhou 510275, China
- Research Institute of Sun Yat‐sen University in Shenzhen, Shenzhen 518107, China
- Zhong‐Jian Liu
- Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization at College of Landscape Architecture and Art, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ting Wang
- College of Life Sciences, South China Agricultural University, Guangzhou 510642, China
6
Sureshkumar S, Bandaranayake C, Lv J, Dent CI, Bhagat PK, Mukherjee S, Sarwade R, Atri C, York HM, Tamizhselvan P, Shamaya N, Folini G, Bergey BG, Yadav AS, Kumar S, Grummisch OS, Saini P, Yadav RK, Arumugam S, Rosonina E, Sadanandom A, Liu H, Balasubramanian S. SUMO protease FUG1, histone reader AL3 and chromodomain protein LHP1 are integral to repeat expansion-induced gene silencing in Arabidopsis thaliana. Nat Plants 2024; 10:749-759. [PMID: 38641663 DOI: 10.1038/s41477-024-01672-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 03/15/2024] [Indexed: 04/21/2024]
Abstract
Epigenetic gene silencing induced by expanded repeats can cause diverse phenotypes ranging from severe growth defects in plants to genetic diseases such as Friedreich's ataxia in humans. The molecular mechanisms underlying repeat expansion-induced epigenetic silencing remain largely unknown. Using a plant model with a temperature-sensitive phenotype, we have previously shown that expanded repeats can induce small RNAs, which in turn can lead to epigenetic silencing through the RNA-dependent DNA methylation pathway. Here, using a genetic suppressor screen and yeast two-hybrid assays, we identified novel components required for epigenetic silencing caused by expanded repeats. We show that FOURTH ULP GENE CLASS 1 (FUG1), an uncharacterized SUMO protease with no known role in gene silencing, is required for epigenetic silencing caused by expanded repeats. In addition, we demonstrate that FUG1 physically interacts with ALFIN-LIKE 3 (AL3), a histone reader that is known to bind to active histone mark H3K4me2/3. Loss of function of AL3 abolishes epigenetic silencing caused by expanded repeats. AL3 physically interacts with the chromodomain protein LIKE HETEROCHROMATIN 1 (LHP1), which is known to be associated with the spread of the repressive histone mark H3K27me3, to cause repeat expansion-induced epigenetic silencing. Loss of any of these components suppresses repeat expansion-associated phenotypes coupled with an increase in IIL1 expression with the reversal of gene silencing and associated change in epigenetic marks. Our findings suggest that the FUG1-AL3-LHP1 module is essential to confer repeat expansion-associated epigenetic silencing and highlight the importance of post-translational modifiers and histone readers in epigenetic silencing.
Affiliation(s)
- Sridevi Sureshkumar
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia.
- Champa Bandaranayake
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Junqing Lv
- National Key Laboratory of Plant Molecular Genetics, CAS Centre for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
- Craig I Dent
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Sourav Mukherjee
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Rucha Sarwade
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Chhaya Atri
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Harrison M York
- Monash Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- European Molecular Biology Laboratory, Australia (EMBL Australia), Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Prashanth Tamizhselvan
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Nawar Shamaya
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Giulia Folini
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Avilash Singh Yadav
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Subhasree Kumar
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Oliver S Grummisch
- School of Biological Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Prince Saini
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
- Ram K Yadav
- Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
- Senthil Arumugam
- Monash Biomedicine Discovery Institute, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton Campus, Melbourne, Victoria, Australia
- European Molecular Biology Laboratory, Australia (EMBL Australia), Monash University, Clayton Campus, Melbourne, Victoria, Australia
- Emanuel Rosonina
- Department of Biology, York University, Toronto, Ontario, Canada
- Ari Sadanandom
- Department of Biosciences, Durham University, Durham, UK
- Hongtao Liu
- National Key Laboratory of Plant Molecular Genetics, CAS Centre for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai, China
7
Reinar WB, Tørresen OK, Nederbragt AJ, Matschiner M, Jentoft S, Jakobsen KS. Teleost genomic repeat landscapes in light of diversification rates and ecology. Mob DNA 2023; 14:14. [PMID: 37789366 PMCID: PMC10546739 DOI: 10.1186/s13100-023-00302-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 09/20/2023] [Indexed: 10/05/2023] Open
Abstract
Repetitive DNA makes up a considerable fraction of most eukaryotic genomes. In fish, transposable element (TE) activity has coincided with rapid species diversification. Here, we annotated the repetitive content in 100 genome assemblies, covering the major branches of the diverse lineage of teleost fish. We investigated if TE content correlates with family level net diversification rates and found support for a weak negative correlation. Further, we demonstrated that TE proportion correlates with genome size, but not with the proportion of short tandem repeats (STRs), which implies independent evolutionary paths. Marine and freshwater fish had large differences in STR content, with the most extreme propagation detected in the genomes of codfish species and Atlantic herring. Such a high density of STRs is likely to increase the mutational load, which we propose could be counterbalanced by high fecundity as seen in codfishes and herring.
Affiliation(s)
- Ole K Tørresen
- Department of Biosciences, University of Oslo, Oslo, Norway
- Alexander J Nederbragt
- Department of Biosciences, University of Oslo, Oslo, Norway
- Department of Informatics, University of Oslo, Oslo, Norway
- Michael Matschiner
- Department of Biosciences, University of Oslo, Oslo, Norway
- University of Oslo, Natural History Museum, Oslo, Norway
- Sissel Jentoft
- Department of Biosciences, University of Oslo, Oslo, Norway
8
Shi Y, Niu Y, Zhang P, Luo H, Liu S, Zhang S, Wang J, Li Y, Liu X, Song T, Xu T, He S. Characterization of genome-wide STR variation in 6487 human genomes. Nat Commun 2023; 14:2092. [PMID: 37045857 PMCID: PMC10097659 DOI: 10.1038/s41467-023-37690-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/04/2022] [Accepted: 03/27/2023] [Indexed: 04/14/2023] Open
Abstract
Short tandem repeats (STRs) are abundant and highly mutagenic in the human genome. Many STR loci have been associated with a range of human genetic disorders. However, most population-scale studies on STR variation in humans have focused on European ancestry cohorts or are limited by sequencing depth. Here, we depicted a comprehensive map of 366,013 polymorphic STRs (pSTRs) constructed from 6487 deeply sequenced genomes, comprising 3983 Chinese samples (~31.5x, NyuWa) and 2504 samples from the 1000 Genomes Project (~33.3x, 1KGP). We found that STR mutations were affected by motif length, chromosome context and epigenetic features. We identified 3273 and 1117 pSTRs whose repeat numbers were associated with gene expression and 3'UTR alternative polyadenylation, respectively. We also implemented population analysis, investigated population differentiated signatures, and genotyped 60 known disease-causing STRs. Overall, this study further extends the scale of STR variation in humans and propels our understanding of the semantics of STRs.
Affiliation(s)
- Yirong Shi
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yiwei Niu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Peng Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Huaxia Luo
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Shuai Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Sijia Zhang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Jiajia Wang
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Yanyan Li
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Xinyue Liu
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Tingrui Song
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Tao Xu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, 250117, Shandong, China.
- Shunmin He
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China.
9
Reinar WB, Greulich A, Stø IM, Knutsen JB, Reitan T, Tørresen OK, Jentoft S, Butenko MA, Jakobsen KS. Adaptive protein evolution through length variation of short tandem repeats in Arabidopsis. Sci Adv 2023; 9:eadd6960. [PMID: 36947624 PMCID: PMC10032594 DOI: 10.1126/sciadv.add6960] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/06/2022] [Accepted: 02/22/2023] [Indexed: 06/18/2023]
Abstract
Intrinsically disordered protein regions are of high importance for biotic and abiotic stress responses in plants. Tracts of identical amino acids accumulate in these regions and can vary in length over generations because of expansions and retractions of short tandem repeats at the genomic level. However, little attention has been paid to what extent length variation is shaped by natural selection. By environmental association analysis on 2514 length variable tracts in 770 whole-genome sequenced Arabidopsis thaliana, we show that length variation in glutamine and asparagine amino acid homopolymers, as well as in interaction hotspots, correlates with local bioclimatic habitat. We determined experimentally that the promoter activity of a light-stress gene depended on polyglutamine length variants in a disordered transcription factor. Our results show that length variations affect protein function and are likely adaptive. Length variants modulating protein function at a global genomic scale have implications for understanding protein evolution and eco-evolutionary biology.
Affiliation(s)
- William B. Reinar
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Anne Greulich
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Ida M. Stø
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Jonfinn B. Knutsen
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Trond Reitan
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Ole K. Tørresen
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Sissel Jentoft
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Melinka A. Butenko
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Kjetill S. Jakobsen
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
10
Hu Y, Wang X, Xu Y, Yang H, Tong Z, Tian R, Xu S, Yu L, Guo Y, Shi P, Huang S, Yang G, Shi S, Wei F. Molecular mechanisms of adaptive evolution in wild animals and plants. Sci China Life Sci 2023; 66:453-495. [PMID: 36648611 PMCID: PMC9843154 DOI: 10.1007/s11427-022-2233-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 08/07/2022] [Accepted: 08/30/2022] [Indexed: 01/18/2023]
Abstract
Wild animals and plants have developed a variety of adaptive traits driven by adaptive evolution, an important strategy for species survival and persistence. Uncovering the molecular mechanisms of adaptive evolution is the key to understanding species diversification, phenotypic convergence, and inter-species interaction. As the genome sequences of more and more non-model organisms are becoming available, the focus of studies on molecular mechanisms of adaptive evolution has shifted from the candidate gene method to genetic mapping based on genome-wide scanning. In this study, we reviewed the latest research advances in wild animals and plants, focusing on adaptive traits, convergent evolution, and coevolution. Firstly, we focused on the adaptive evolution of morphological, behavioral, and physiological traits. Secondly, we reviewed the phenotypic convergences of life history traits and responding to environmental pressures, and the underlying molecular convergence mechanisms. Thirdly, we summarized the advances of coevolution, including the four main types: mutualism, parasitism, predation and competition. Overall, these latest advances greatly increase our understanding of the underlying molecular mechanisms for diverse adaptive traits and species interaction, demonstrating that the development of evolutionary biology has been greatly accelerated by multi-omics technologies. Finally, we highlighted the emerging trends and future prospects around the above three aspects of adaptive evolution.
Affiliation(s)
- Yibo Hu
- CAS Key Lab of Animal Ecology and Conservation Biology, Chinese Academy of Sciences, Beijing, 100101, China.
- Xiaoping Wang
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, Kunming, 650091, China
- Yongchao Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Hui Yang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China
- Zeyu Tong
- Institute of Evolution and Ecology, School of Life Sciences, Central China Normal University, Wuhan, 430079, China
- Ran Tian
- College of Life Sciences, Nanjing Normal University, Nanjing, 210023, China
- Shaohua Xu
- State Key Laboratory of Biocontrol, Guangdong Key Lab of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Li Yu
- State Key Laboratory for Conservation and Utilization of Bio-Resources in Yunnan, School of Life Sciences, Yunnan University, Kunming, 650091, China.
- Yalong Guo
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China.
- Peng Shi
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650201, China.
- Shuangquan Huang
- Institute of Evolution and Ecology, School of Life Sciences, Central China Normal University, Wuhan, 430079, China.
- Guang Yang
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, 511458, China.
- College of Life Sciences, Nanjing Normal University, Nanjing, 210023, China.
- Suhua Shi
- State Key Laboratory of Biocontrol, Guangdong Key Lab of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China.
- Fuwen Wei
- CAS Key Lab of Animal Ecology and Conservation Biology, Chinese Academy of Sciences, Beijing, 100101, China.
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, 511458, China.
11
Verbiest M, Maksimov M, Jin Y, Anisimova M, Gymrek M, Bilgin Sonay T. Mutation and selection processes regulating short tandem repeats give rise to genetic and phenotypic diversity across species. J Evol Biol 2023; 36:321-336. [PMID: 36289560 PMCID: PMC9990875 DOI: 10.1111/jeb.14106] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 06/29/2022] [Accepted: 08/01/2022] [Indexed: 02/03/2023]
Abstract
Short tandem repeats (STRs) are units of 1-6 bp that repeat in a tandem fashion in DNA. Along with single nucleotide polymorphisms and large structural variations, they are among the major genomic variants underlying genetic, and likely phenotypic, divergence. STRs experience mutation rates that are orders of magnitude higher than other well-studied genotypic variants. Frequent copy number changes result in a wide range of alleles, and provide unique opportunities for modulating complex phenotypes through variation in repeat length. While classical studies have identified key roles of individual STR loci, the advent of improved sequencing technology, high-quality genome assemblies for diverse species, and bioinformatics methods for genome-wide STR analysis now enable more systematic study of STR variation across wide evolutionary ranges. In this review, we explore mutation and selection processes that affect STR copy number evolution, and how these processes give rise to varying STR patterns both within and across species. Finally, we review recent examples of functional and adaptive changes linked to STRs.
Affiliation(s)
- Max Verbiest
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Mikhail Maksimov
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Ye Jin
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Maria Anisimova
- Institute of Computational Life Sciences, School of Life Sciences and Facility Management, Zürich University of Applied Sciences, Wädenswil, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Melissa Gymrek
- Department of Computer Science & Engineering, University of California San Diego, La Jolla, California, USA
- Department of Medicine, University of California San Diego, La Jolla, California, USA
- Tugce Bilgin Sonay
- Institute of Ecology, Evolution and Environmental Biology, Columbia University, New York, New York, USA
12
Maddi AMA, Kavousi K, Arabfard M, Ohadi H, Ohadi M. Tandem repeats ubiquitously flank and contribute to translation initiation sites. BMC Genom Data 2022; 23:59. [PMID: 35896982 PMCID: PMC9331589 DOI: 10.1186/s12863-022-01075-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/04/2022] [Accepted: 07/18/2022] [Indexed: 12/31/2022] Open
Abstract
Background: While the evolutionary divergence of cis-regulatory sequences impacts translation initiation sites (TISs), the implication of tandem repeats (TRs) in TIS selection remains largely elusive. Here, we employed the TIS homology concept to study a possible link between TRs of all core lengths and repeats with TISs. Methods: Human, as reference sequence, and 83 other species were selected, and data were extracted on the entire protein-coding genes (n = 1,611,368) and transcripts (n = 2,730,515) annotated for those species from Ensembl 102. Following TIS identification, two different weighing vectors were employed to assign TIS homology, and the co-occurrence pattern of TISs with the upstream flanking TRs was studied in the selected species. The results were assessed in 10-fold cross-validation. Results: On average, every TIS was flanked by 1.19 TRs of various categories within its 120 bp upstream sequence, per species. We detected statistically significant enrichment of non-homologous human TISs co-occurring with human-specific TRs. On the contrary, homologous human TISs co-occurred significantly with non-human-specific TRs. 2991 human genes had at least one transcript, the TIS of which was flanked by a human-specific TR. Text mining of a number of the identified genes, such as CACNA1A, EIF5AL1, FOXK1, GABRB2, MYH2, SLC6A8, and TTN, yielded predominant expression and functions in the human brain and/or skeletal muscle. Conclusion: We conclude that TRs ubiquitously flank and contribute to TIS selection at the trans-species level. Future functional analyses, such as a combination of genome editing strategies and in vitro protein synthesis, may be employed to further investigate the impact of TRs on TIS selection.
13
Lepais O, Aissi A, Véla E, Beghami Y. Joint analysis of microsatellites and flanking sequences enlightens complex demographic history of interspecific gene flow and vicariance in rear-edge oak populations. Heredity (Edinb) 2022; 129:169-182. [PMID: 35725763 PMCID: PMC9411615 DOI: 10.1038/s41437-022-00550-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Revised: 06/10/2022] [Accepted: 06/10/2022] [Indexed: 12/25/2022] Open
Abstract
Inference of recent population divergence requires fast-evolving markers and the ability to distinguish shared genetic variation caused by ancestral polymorphism from that caused by gene flow. Theoretical research shows that the use of compound marker systems integrating linked polymorphisms with different mutational dynamics, such as a microsatellite and its flanking sequences, can improve estimation of population structure and inference of demographic history, especially in the case of complex population dynamics. However, empirical application in natural populations has so far been limited by a lack of suitable methods for data collection. A solution comes from the development of sequence-based microsatellite genotyping, which we used to study molecular variation at 36 sequenced nuclear microsatellites in seven Quercus canariensis and four Q. faginea rear-edge populations across Algeria. We aim to decipher their taxonomic relationship, past evolutionary history and recent demographic trajectory. First, we compare the estimation of population genetics parameters and simulation-based inference of demographic history from microsatellite sequence alone, flanking sequence alone or the combination of linked microsatellite and flanking sequence variation. Second, we apply random forest approximate Bayesian computation to identify which of these sequence types is most informative. Whereas analysing microsatellite variation alone indicates recent interspecific gene flow, the additional information gained by integrating nucleotide variation in flanking sequences reduces homoplasy and suggests ancient interspecific gene flow followed by drift in isolation instead. The weight of each polymorphism in the inference also demonstrates the value of linked variations with contrasting mutation dynamics to improve estimation of both demographic and mutational parameters.
Affiliation(s)
- Olivier Lepais
- Univ. Bordeaux, INRAE, BIOGECO, F-33610, Cestas, France
- Errol Véla
- AMAP, Université de Montpellier/CIRAD/CNRS/INRA/IRD, Montpellier, France
- Yassine Beghami
- LAPAPEZA, Université Batna 1 Hadj Lakhdar, ISVSA, Batna, Algeria
14
Ranathunge C, Chimahusky ME, Welch ME. A comparative study of population genetic structure reveals patterns consistent with selection at functional microsatellites in common sunflower. Mol Genet Genomics 2022; 297:1329-1342. [PMID: 35786764 DOI: 10.1007/s00438-022-01920-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2021] [Accepted: 06/16/2022] [Indexed: 10/17/2022]
Abstract
Microsatellites, also known as short tandem repeats (STRs), have long been considered non-functional, neutrally evolving regions of the genome. Recent findings suggest that they can function as drivers of rapid adaptive evolution. Previous work on the common sunflower identified 479 transcribed microsatellites where allele length significantly correlates with gene expression (eSTRs) in a stepwise manner. Here, a population genetic approach is used to test whether eSTR allele length variation is under selection. Genotypic variation among and within populations at 13 eSTRs was compared with that at 19 anonymous microsatellites in 672 individuals from 17 natural populations of sunflower from across a cline running from Saskatchewan to Oklahoma (distance of approximately 1600 km). Expected heterozygosity, allelic richness, and allelic diversity were significantly lower at eSTRs, a pattern consistent with higher relative rates of purifying selection. Further, analyses of variation in microsatellite allele lengths (lnRV) and heterozygosities (lnRH) indicate recent selective sweeps at the eSTRs. Mean microsatellite allele lengths at four eSTRs within populations are significantly correlated with latitude, consistent with the tuning-knob model, which predicts stepwise relationships between microsatellite allele length and phenotypes. This finding suggests that shorter or longer alleles at eSTRs may be favored in climatic extremes. Collectively, our results imply that eSTRs are likely under selection and that they may be playing a role in facilitating local adaptation across a well-defined cline in the common sunflower.
Affiliation(s)
- Chathurani Ranathunge
- Department of Biological Sciences, Mississippi State University, Starkville, MS, 39762, USA
- School of Health Professions, Eastern Virginia Medical School, Norfolk, VA, 23507, USA
- Melody E Chimahusky
- Department of Biological Sciences, Mississippi State University, Starkville, MS, 39762, USA
- Mark E Welch
- Department of Biological Sciences, Mississippi State University, Starkville, MS, 39762, USA
15
Mei H, Zhao T, Dong Z, Han J, Xu B, Chen R, Zhang J, Zhang J, Hu Y, Zhang T, Fang L. Population-Scale Polymorphic Short Tandem Repeat Provides an Alternative Strategy for Allele Mining in Cotton. Front Plant Sci 2022; 13:916830. [PMID: 35599867 PMCID: PMC9120961 DOI: 10.3389/fpls.2022.916830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2022] [Accepted: 04/20/2022] [Indexed: 06/15/2023]
Abstract
Short tandem repeats (STRs), which vary in size because they contain variable numbers of repeat units, are present throughout most eukaryotic genomes. To date, few population-scale studies identifying STRs have been reported for crops. Here, we constructed a high-density polymorphic STR map by investigating polymorphic STRs from 911 Gossypium hirsutum accessions. In total, we identified 556,426 polymorphic STRs with an average length of 21.1 bp, of which 69.08% were biallelic. Moreover, 7,718 (1.39%) were identified in the exons of 6,021 genes, which were significantly enriched in transcription, ribosome biogenesis, and signal transduction. Only 5.88% of those exonic STRs altered open reading frames, of which 97.16% were trinucleotide. An alternative strategy, STR-GWAS analysis, revealed that 824 STRs were significantly associated with agronomic traits, including 491 novel alleles that were undetectable by previous SNP-GWAS methods. For instance, a novel polymorphic STR consisting of GAACCA repeats was identified in GH_D06G1697, with its (GAACCA)5 allele increasing fiber length by 1.96-4.83% relative to the (GAACCA)4 allele. The database CottonSTRDB was further developed to facilitate use of STR datasets in breeding programs. Our study provides functional roles for STRs in influencing complex traits, an alternative strategy, STR-GWAS, for allele mining, and a database serving the cotton community as a valuable resource.
Affiliation(s)
- Huan Mei
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Ting Zhao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Zeyu Dong
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Jin Han
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Biyu Xu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Rui Chen
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Jun Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Juncheng Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yan Hu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
- Tianzhen Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
- Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
16
Wu Z, Gong H, Zhou Z, Jiang T, Lin Z, Li J, Xiao S, Yang B, Huang L. Mapping short tandem repeats for liver gene expression traits helps prioritize potential causal variants for complex traits in pigs. J Anim Sci Biotechnol 2022; 13:8. [PMID: 35034641 PMCID: PMC8762894 DOI: 10.1186/s40104-021-00658-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Accepted: 11/25/2021] [Indexed: 12/31/2022] Open
Abstract
Background: Short tandem repeats (STRs) were recently found to have significant impacts on gene expression and diseases in humans, but their roles in gene expression and complex traits in pigs remain unexplored. This study investigates the effects of STRs on gene expression in liver tissues based on the whole-genome sequences and RNA-Seq data of a discovery cohort of 260 F6 individuals and a validation population of 296 F7 individuals from a heterogeneous population generated from crosses among eight pig breeds. Results: We identified 5203 and 5868 significant expression STRs (eSTRs, FDR < 1%) in the F6 and F7 populations, respectively, most of which could be reciprocally validated (π1 = 0.92). The eSTRs explained 27.5% of the cis-heritability of gene expression traits on average. We further identified 235 and 298 fine-mapped STRs through the Bayesian fine-mapping approach in the F6 and F7 pigs, respectively, which were significantly enriched in intron, ATAC peak, compartment A and H3K4me3 regions. We identified 20 fine-mapped STRs located in 100 kb windows upstream and downstream of published complex trait-associated SNPs, which colocalized with epigenetic markers such as H3K27ac and ATAC peaks. These included eSTRs of the CLPB, PGLS, PSMD6 and DHDH genes, which are linked with genome-wide association study (GWAS) SNPs for blood-related traits, leg conformation, growth-related traits, and meat quality traits, respectively. Conclusions: This study provides insights into the effects of STRs on gene expression traits. The identified eSTRs are valuable resources for prioritizing causal STRs for complex traits in pigs.
Affiliation(s)
- Zhongzi Wu
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Huanfa Gong
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Zhimin Zhou
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Tao Jiang
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Ziqi Lin
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Jing Li
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Shijun Xiao
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Bin Yang
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
- Lusheng Huang
- State Key Laboratory for Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang, China
17
Gall-Duncan T, Sato N, Yuen RKC, Pearson CE. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res 2022; 32:1-27. [PMID: 34965938 PMCID: PMC8744678 DOI: 10.1101/gr.269530.120] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 07/29/2020] [Accepted: 11/29/2021] [Indexed: 11/25/2022]
Abstract
Expansions of gene-specific DNA tandem repeats (TRs), first described in 1991 as a disease-causing mutation in humans, are now known to cause >60 phenotypes, not just disease, and not only in humans. TRs are a common form of genetic variation with biological consequences, observed, so far, in humans, dogs, plants, oysters, and yeast. Repeat diseases show atypical clinical features, genetic anticipation, and multiple and partially penetrant phenotypes among family members. Discovery of disease-causing repeat expansion loci accelerated through technological advances in DNA sequencing and computational analyses. Between 2019 and 2021, 17 new disease-causing TR expansions were reported, totaling 63 TR loci (>69 diseases), with a likelihood of more discoveries, and in more organisms. Recent and historical lessons reveal that properly assessed clinical presentations, coupled with genetic and biological awareness, can guide discovery of disease-causing unstable TRs. We highlight critical but underrecognized aspects of TR mutations. Repeat motifs may not be present in current reference genomes but will be in forthcoming gapless long-read references. Repeat motif size can be a single nucleotide to kilobases/unit. At a given locus, repeat motif sequence purity can vary with consequence. Pathogenic repeats can be "insertions" within nonpathogenic TRs. Expansions, contractions, and somatic length variations of TRs can have clinical/biological consequences. TR instabilities occur in humans and other organisms. TRs can be epigenetically modified and/or chromosomal fragile sites. We discuss the expanding field of disease-associated TR instabilities, highlighting prospects, clinical and genetic clues, tools, and challenges for further discoveries of disease-causing TR instabilities and understanding their biological and pathological impacts, a vista that is about to expand.
Affiliation(s)
- Terence Gall-Duncan
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Nozomu Sato
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Ryan K C Yuen
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Christopher E Pearson
- Program of Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario M5G 1L7, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
18
Reinar WB, Lalun VO, Reitan T, Jakobsen KS, Butenko MA. Length variation in short tandem repeats affects gene expression in natural populations of Arabidopsis thaliana. Plant Cell 2021; 33:2221-2234. [PMID: 33848350 PMCID: PMC8364236 DOI: 10.1093/plcell/koab107] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/19/2021] [Accepted: 04/07/2021] [Indexed: 06/12/2023]
Abstract
The genetic basis for the fine-tuned regulation of gene expression is complex and ultimately influences the phenotype and thus the local adaptation of natural populations. Short tandem repeats (STRs) consisting of repetitive DNA motifs have been shown to regulate gene expression. STRs are variable in length within a population and serve as a heritable, but semi-reversible, reservoir of standing genetic variation. For sessile organisms, such as plants, STRs could be of major importance in fine-tuning gene expression as a response to a shifting local environment. Here, we used a transcriptome dataset from natural accessions of Arabidopsis thaliana to investigate population-wide gene expression patterns in light of genome-wide STR variation. We empirically modeled gene expression as a response to the STR length within and around the gene and demonstrated that an association between gene expression and STR length variation is unequivocally present in the sampled population. To support our model, we explored the promoter activity in a transcriptional regulator involved in root hair formation and provided experimentally determined causality between coding sequence length variation and promoter activity. Our results support a general link between gene expression variation and STR length variation in A. thaliana.
Affiliation(s)
- William B. Reinar
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Vilde O. Lalun
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Trond Reitan
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Kjetill S. Jakobsen
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Melinka A. Butenko
- Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
19
Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network. Nat Commun 2021; 12:3297. [PMID: 34078885 PMCID: PMC8172540 DOI: 10.1038/s41467-021-23143-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/15/2020] [Accepted: 04/13/2021] [Indexed: 02/04/2023] Open
Abstract
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.
20
Song X, Yang T, Zhang X, Yuan Y, Yan X, Wei Y, Zhang J, Zhou C. Comparison of the Microsatellite Distribution Patterns in the Genomes of Euarchontoglires at the Taxonomic Level. Front Genet 2021; 12:622724. [PMID: 33719337 PMCID: PMC7953163 DOI: 10.3389/fgene.2021.622724] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/29/2020] [Accepted: 02/05/2021] [Indexed: 02/05/2023] Open
Abstract
Microsatellite or simple sequence repeat (SSR) instability within genes can induce genetic variation. The SSR signatures remain largely unknown in different clades within Euarchontoglires, one of the most successful mammalian radiations. Here, we conducted a genome-wide characterization of microsatellite distribution patterns at different taxonomic levels in 153 Euarchontoglires genomes. Our results showed that the abundance and density of the SSRs were significantly positively correlated with primate genome size, but no significant relationship with the genome size of rodents was found. Furthermore, a higher level of complexity for perfect SSR (P-SSR) attributes was observed in rodents than in primates. The most frequent type of P-SSR was the mononucleotide P-SSR in the genomes of primates, tree shrews, and colugos, while mononucleotide or dinucleotide motif types were dominant in the genomes of rodents and lagomorphs. Furthermore, (A)n was the most abundant motif in primate genomes, but (A)n, (AC)n, or (AG)n was the most abundant motif in rodent genomes which even varied within the same genus. The GC content and the repeat copy numbers of P-SSRs varied in different species when compared at different taxonomic levels, reflecting underlying differences in SSR mutation processes. Notably, the CDSs containing P-SSRs were categorized by functions and pathways using Gene Ontology and Kyoto Encyclopedia of Genes and Genomes annotations, highlighting their roles in transcription regulation. Generally, this work will aid future studies of the functional roles of the taxonomic features of microsatellites during the evolution of mammals in Euarchontoglires.
Affiliation(s)
- Xuhao Song
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
- Institute of Ecology, China West Normal University, Nanchong, China
- Tingbang Yang
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
- Institute of Ecology, China West Normal University, Nanchong, China
- Xinyi Zhang
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
- Ying Yuan
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
- Xianghui Yan
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
- Yi Wei
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
- Institute of Ecology, China West Normal University, Nanchong, China
- Jun Zhang
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
- Institute of Ecology, China West Normal University, Nanchong, China
- Caiquan Zhou
- Key Laboratory of Southwest China Wildlife Resources Conservation (Ministry of Education), China West Normal University, Nanchong, China
- Institute of Ecology, China West Normal University, Nanchong, China
21
Abstract
Individuals within a species can exhibit vast variation in copy number of repetitive DNA elements. This variation may contribute to complex traits such as lifespan and disease, yet it is only infrequently considered in genotype-phenotype associations. Although the possible importance of copy number variation is widely recognized, accurate copy number quantification remains challenging. Here, we assess the technical reproducibility of several major methods for copy number estimation as they apply to the large repetitive ribosomal DNA array (rDNA). rDNA encodes the ribosomal RNAs and exists as a tandem gene array in all eukaryotes. Repeat units of rDNA are kilobases in size, often with several hundred units comprising the array, making rDNA particularly intractable to common quantification techniques. We evaluate pulsed-field gel electrophoresis, droplet digital PCR, and Nextera-based whole genome sequencing as approaches to copy number estimation, comparing techniques across model organisms and spanning wide ranges of copy numbers. Nextera-based whole genome sequencing, though commonly used in recent literature, produced high error. We explore possible causes for this error and provide recommendations for best practices in rDNA copy number estimation. We present a resource of high-confidence rDNA copy number estimates for a set of S. cerevisiae and C. elegans strains for future use. We furthermore explore the possibility for FISH-based copy number estimation, an alternative that could potentially characterize copy number on a cellular level.
22
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022] Open
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Patryk Jarnot
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
  - Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
  - Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
  - Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
|
23
|
Press MO, Hall AN, Morton EA, Queitsch C. Substitutions Are Boring: Some Arguments about Parallel Mutations and High Mutation Rates. Trends Genet 2019; 35:253-264. [PMID: 30797597 PMCID: PMC6435258 DOI: 10.1016/j.tig.2019.01.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 09/24/2018] [Revised: 12/20/2018] [Accepted: 01/14/2019] [Indexed: 12/31/2022]
Abstract
Extant genomes are largely shaped by global transposition, copy-number fluctuation, and rearrangement of DNA sequences rather than by substitutions of single nucleotides. Although many of these large-scale mutations have low probabilities and are unlikely to repeat, others are recurrent or predictable in their effects, leading to stereotyped genome architectures and genetic variation in both eukaryotes and prokaryotes. Such recurrent, parallel mutation modes can profoundly shape the paths taken by evolution and undermine common models of evolutionary genetics. Similar patterns are also evident at the smaller scales of individual genes or short sequences. The scale and extent of this 'non-substitution' variation has recently come into focus through the advent of new genomic technologies; however, it is still not widely considered in genotype-phenotype association studies. In this review we identify common features of these disparate mutational phenomena and comment on the importance and interpretation of these mutational patterns.
Affiliation(s)
- Ashley N Hall
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Department of Molecular and Cellular Biology, University of Washington, Seattle, WA 98195, USA
- Elizabeth A Morton
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Christine Queitsch
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
24
|
Arabfard M, Kavousi K, Delbari A, Ohadi M. Link between short tandem repeats and translation initiation site selection. Hum Genomics 2018; 12:47. [PMID: 30373661 PMCID: PMC6206671 DOI: 10.1186/s40246-018-0181-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/16/2018] [Accepted: 10/10/2018] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: Despite their vast biological implication, the relevance of short tandem repeats (STRs)/microsatellites to the protein-coding gene translation initiation sites (TISs) remains largely unknown.
METHODS: We performed an Ensembl-based comparative genomics study of all annotated orthologous TIS-flanking sequences in human and 46 other species across vertebrates, on the genomic DNA and cDNA platforms (755,956 TISs), aimed at identifying human-specific STRs in this interval. The collected data were used to examine the hypothesis of a link between STRs and TISs. BLAST was used to compare the initial five amino acids (excluding the initial methionine), codons of which were flanked by STRs in human, with the initial five amino acids of all annotated proteins for the orthologous genes in other vertebrates (total of 5,314,979 pair-wise TIS comparisons on the genomic DNA and cDNA platforms) in order to compare the number of events in which human-specific and non-specific STRs occurred with homologous and non-homologous TISs (i.e., ≥ 50% and < 50% similarity of the five amino acids).
RESULTS: We detected differential distribution of the human-specific STRs in comparison to the overall distribution of STRs on the genomic DNA and cDNA platforms (Mann-Whitney U test p = 1.4 × 10⁻¹¹ and p < 7.9 × 10⁻¹¹, respectively). We also found excess occurrence of non-homologous TISs with human-specific STRs and excess occurrence of homologous TISs with non-specific STRs on both platforms (p < 0.00001).
CONCLUSION: We propose a link between STRs and TIS selection, based on the differential co-occurrence rate of human-specific STRs with non-homologous TISs and non-specific STRs with homologous TISs.
Affiliation(s)
- Masoud Arabfard
  - Department of Bioinformatics, Kish International Campus University of Tehran, Kish, Iran
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Kaveh Kavousi
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Ahmad Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Mina Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
|
25
|
Cheng W, Zhou Y, Miao X, An C, Gao H. The Putative Smallest Introns in the Arabidopsis Genome. Genome Biol Evol 2018; 10:2551-2557. [PMID: 30184083 PMCID: PMC6161759 DOI: 10.1093/gbe/evy197] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Accepted: 08/30/2018] [Indexed: 12/15/2022] Open
Abstract
Most eukaryotic genes contain introns, which are noncoding sequences that are removed during pre-mRNA processing. Introns are usually preserved across evolutionary time. However, the sizes of introns vary greatly. In Arabidopsis, some introns are longer than 10 kilobase pairs (kbp) and others are predicted to be shorter than 10 bp. To identify the shortest intron in the genome, we analyzed the predicted introns in annotated version 10 of the Arabidopsis thaliana genome and found 103 predicted introns that are 30 bp or shorter, which make up only 0.08% of all introns in the genome. However, our own bioinformatics and experimental analyses found no evidence for the existence of these predicted introns. The predicted introns of 30–39 bp, 40–49 bp, and 50–59 bp in length are also rare and constitute only 0.07%, 0.2%, and 0.28% of all introns in the genome, respectively. An analysis of 30 predicted introns 31–59 bp long verified two in this range, both of which were 59 bp long. Thus, this study suggests that there is a limit to how small introns in A. thaliana can be, which is useful for the understanding of the evolution and processing of small introns in plants in general.
Affiliation(s)
- Wenzhen Cheng
  - College of Biological Sciences and Technology, Beijing Forestry University, China
- Yunlin Zhou
  - College of Biological Sciences and Technology, Beijing Forestry University, China
- Xin Miao
  - College of Biological Sciences and Technology, Beijing Forestry University, China
- Chuanjing An
  - College of Biological Sciences and Technology, Beijing Forestry University, China
- Hongbo Gao
  - College of Biological Sciences and Technology, Beijing Forestry University, China
|