1
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
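The statistical-mechanics picture in this abstract (many weakly preferred microstates adding up to higher TF density) can be illustrated with a toy partition-function calculation. This is a minimal sketch and not the authors' model: the function name `relative_occupancy` and all energy values are hypothetical, chosen only to show how summing Boltzmann weights over binding registers works.

```python
import math

def relative_occupancy(energies_kT):
    """Sum Boltzmann weights exp(-E/kT) over all binding microstates.

    Energies are in units of kT; lower (more negative) means tighter binding.
    """
    return sum(math.exp(-e) for e in energies_kT)

# Hypothetical energies, for illustration only.
consensus = [-8.0]            # one strong consensus site
random_flank = [-1.0] * 20    # 20 binding registers in random flanking DNA
str_flank = [-3.0] * 20       # 20 weakly preferred registers in an STR flank

z_random = relative_occupancy(consensus + random_flank)
z_str = relative_occupancy(consensus + str_flank)

# Ratio of summed weights: how much the STR flank raises total binding
# relative to a random flank, at low TF concentration.
fold_change = z_str / z_random
print(f"fold change from STR flank: {fold_change:.3f}")
```

In this toy setting each individual STR register is far weaker than the consensus site, yet the flank still raises the summed weight because every register contributes.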
Affiliation(s)
- Connor A Horton
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Amr M Alexandari
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael G B Hayes
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Arjun K Aditham
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Nilay Shah
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Peter H Suzuki
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ariel Afek
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Raluca Gordân
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
  - Department of Computer Science, Duke University, Durham, NC 27708, USA
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Julia Zeitlinger
  - Stowers Institute for Medical Research, Kansas City, MO 64110, USA
  - The University of Kansas Medical Center, Kansas City, KS 66103, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Polly M Fordyce
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
  - Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
  - ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94110, USA
2
Genovese LM, Mosca MM, Pellegrini M, Geraci F. Dot2dot: accurate whole-genome tandem repeats discovery. Bioinformatics 2019; 35:914-922. [PMID: 30165507 PMCID: PMC6419916 DOI: 10.1093/bioinformatics/bty747] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 03/30/2018] [Revised: 08/03/2018] [Accepted: 08/24/2018] [Indexed: 01/18/2023]
Abstract
MOTIVATION Large-scale sequencing projects have confirmed the hypothesis that eukaryotic DNA is rich in repetitions whose functional role needs to be elucidated. In particular, tandem repeats (TRs) (i.e. short, almost identical sequences that lie adjacent to each other) have been associated to many cellular processes and, indeed, are also involved in several genetic disorders. The need of comprehensive lists of TRs for association studies and the absence of a computational model able to capture their variability have revived research on discovery algorithms. RESULTS Building upon the idea that sequence similarities can be easily displayed using graphical methods, we formalized the structure that TRs induce in dot-plot matrices where a sequence is compared with itself. Leveraging on the observation that a compact representation of these matrices can be built and searched in linear time, we developed Dot2dot: an accurate algorithm fast enough to be suitable for whole-genome discovery of TRs. Experiments on five manually curated collections of TRs have shown that Dot2dot is more accurate than other established methods, and completes the analysis of the biggest known reference genome in about one day on a standard PC. AVAILABILITY AND IMPLEMENTATION Source code and datasets are freely available upon paper acceptance at the URL: https://github.com/Gege7177/Dot2dot. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
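The dot-plot idea in this abstract can be illustrated with a deliberately naive scan. This is a sketch under stated assumptions, not the Dot2dot algorithm: it handles only exact repeats, runs in time proportional to sequence length times `max_period`, and the function name `tandem_repeats` and its parameters are hypothetical. It relies on one observation: a tandem repeat with period p appears in the self-comparison matrix as a run of matches on the diagonal offset by p.

```python
def tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, period, copies) for exact tandem repeats.

    A tandem repeat with period p shows up in the self dot-plot as a run of
    matches on the diagonal offset by p, i.e. seq[i] == seq[i + p].
    """
    hits = []
    n = len(seq)
    for p in range(1, max_period + 1):
        i = 0
        while i + p < n:
            # Length of the run of matches along diagonal p starting at i.
            run = 0
            while i + run + p < n and seq[i + run] == seq[i + run + p]:
                run += 1
            # A run of `run` matches spans run + p bases of period-p repeat.
            copies = (run + p) // p
            if run > 0 and copies >= min_copies:
                hits.append((i, p, copies))
            i += max(run, 1)
    return hits

example = "TTGAGAGAGACC"   # contains (GA)4 starting at index 2
print(tandem_repeats(example))
```

Longer repeats are also reported at multiples of their period (a (GA)6 tract matches at period 2 and at period 4), so a real tool would need to merge such redundant hits and tolerate mismatches.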
Affiliation(s)
- Marco M Mosca
  - Department of Computer Science, University of Liverpool, Liverpool, UK
- Marco Pellegrini
  - Institute for Informatics and Telematics, CNR, Pisa, Italy
  - Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, Pisa, Italy
- Filippo Geraci
  - Institute for Informatics and Telematics, CNR, Pisa, Italy
3
Alizadeh F, Moharrami T, Mousavi N, Yazarlou F, Bozorgmehr A, Shahsavand E, Delbari A, Ohadi M. Disease-only alleles at the extreme ends of the human ZMYM3 exceptionally long 5' UTR short tandem repeat in bipolar disorder: A pilot study. J Affect Disord 2019; 251:86-90. [PMID: 30909162 DOI: 10.1016/j.jad.2019.03.056] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/28/2018] [Revised: 01/20/2019] [Accepted: 03/19/2019] [Indexed: 12/22/2022]
Abstract
OBJECTIVE The X-linked ZMYM3 gene (also known as ZNF261) contains the longest STR, (GA)32, identified in a human protein-coding gene 5'UTR (ENST00000373998.5: ZMYM3-207). This STR reaches maximum length in human, and is located in a complex string of four consecutive GA-STRs with a human-specific formula across the complex. A previous study in Iranian male schizophrenia (SCZ) patients revealed co-occurrence of the extreme short and long alleles of the STR with SCZ. Here we studied the allelic distribution of this STR in bipolar disorder (BD) type I. The interval encompassing the human ZMYM3 STR complex was PCR-amplified and sequenced in 546 male subjects, consisting of 157 BD patients and 389 controls. RESULTS We found three alleles at the extreme short (17-repeat) and long (38- and 43-repeat) ends of the allele distribution curve in the BD cases (4.4% of the BD alleles) that were not detected in the controls (Mid p < 0.0001). These alleles overlapped with the extreme disease-only alleles detected previously in the SCZ patients. Domain reconstruction of the GA-STR complex revealed significant structural alteration as a result of various sequence repeats and nucleotide compositions at the inter and intraspecies levels. CONCLUSION The ZMYM3 "exceptionally long" 5' UTR STR findings may alter our perspective of disease pathogenesis in psychiatric disorders, and set an example in which the low frequency alleles at the extreme short and long ends of the human STRs are, at least in part, a result of natural selection against these alleles and their unambiguous link to major human disorders.
Affiliation(s)
- Fatemeh Alizadeh
  - Department of Genomic Psychiatry and Behavioral Genomics (DGPBG), Roozbeh Hospital, School of Medicine, Tehran University of Medical Sciences (TUMS), Tehran, Iran
- Tamouchin Moharrami
  - Department of Medical Genetics, School of Medicine, Tehran University of Medical Sciences (TUMS), Tehran, Iran
- Negar Mousavi
  - Department of Biology, Parand Branch, Islamic Azad University, Parand, Iran
- Fatemeh Yazarlou
  - Department of Medical Genetics, School of Medicine, Tehran University of Medical Sciences (TUMS), Tehran, Iran
- Ali Bozorgmehr
  - Department of Neuroscience, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran, Iran
- Esmaeil Shahsavand
  - Department of Genomic Psychiatry and Behavioral Genomics (DGPBG), Roozbeh Hospital, School of Medicine, Tehran University of Medical Sciences (TUMS), Tehran, Iran
- Ahmad Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Mina Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
4
Arabfard M, Kavousi K, Delbari A, Ohadi M. Link between short tandem repeats and translation initiation site selection. Hum Genomics 2018; 12:47. [PMID: 30373661 PMCID: PMC6206671 DOI: 10.1186/s40246-018-0181-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/16/2018] [Accepted: 10/10/2018] [Indexed: 11/29/2022]
Abstract
BACKGROUND Despite their vast biological implication, the relevance of short tandem repeats (STRs)/microsatellites to the protein-coding gene translation initiation sites (TISs) remains largely unknown. METHODS We performed an Ensembl-based comparative genomics study of all annotated orthologous TIS-flanking sequences in human and 46 other species across vertebrates, on the genomic DNA and cDNA platforms (755,956 TISs), aimed at identifying human-specific STRs in this interval. The collected data were used to examine the hypothesis of a link between STRs and TISs. BLAST was used to compare the initial five amino acids (excluding the initial methionine), codons of which were flanked by STRs in human, with the initial five amino acids of all annotated proteins for the orthologous genes in other vertebrates (total of 5,314,979 pair-wise TIS comparisons on the genomic DNA and cDNA platforms) in order to compare the number of events in which human-specific and non-specific STRs occurred with homologous and non-homologous TISs (i.e., ≥ 50% and < 50% similarity of the five amino acids). RESULTS We detected differential distribution of the human-specific STRs in comparison to the overall distribution of STRs on the genomic DNA and cDNA platforms (Mann Whitney U test p = 1.4 × 10⁻¹¹ and p < 7.9 × 10⁻¹¹, respectively). We also found excess occurrence of non-homologous TISs with human-specific STRs and excess occurrence of homologous TISs with non-specific STRs on both platforms (p < 0.00001). CONCLUSION We propose a link between STRs and TIS selection, based on the differential co-occurrence rate of human-specific STRs with non-homologous TISs and non-specific STRs with homologous TISs.
Affiliation(s)
- Masoud Arabfard
  - Department of Bioinformatics, Kish International Campus University of Tehran, Kish, Iran
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Kaveh Kavousi
  - Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Ahmad Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Mina Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
5
Nazaripanah N, Adelirad F, Delbari A, Sahaf R, Abbasi-Asl T, Ohadi M. Genome-scale portrait and evolutionary significance of human-specific core promoter tri- and tetranucleotide short tandem repeats. Hum Genomics 2018; 12:17. [PMID: 29622039 PMCID: PMC5887250 DOI: 10.1186/s40246-018-0149-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/19/2017] [Accepted: 03/20/2018] [Indexed: 03/05/2023]
Abstract
BACKGROUND While there is an ongoing trend to identify single nucleotide substitutions (SNSs) that are linked to inter/intra-species differences and disease phenotypes, short tandem repeats (STRs)/microsatellites may be of equal (if not more) importance in the above processes. Genes that contain STRs in their promoters have higher expression divergence compared to genes with fixed or no STRs in the gene promoters. In line with the above, recent reports indicate a role of repetitive sequences in the rise of young transcription start sites (TSSs) in human evolution. RESULTS Following a comparative genomics study of all human protein-coding genes annotated in the GeneCards database, here we provide a genome-scale portrait of human-specific short- and medium-size (≥ 3-repeats) tri- and tetranucleotide STRs and STR motifs in the critical core promoter region between - 120 and + 1 to the TSS and evidence of skewing of this compartment in reference to the STRs that are not human-specific (Levene's test p < 0.001). Twenty-five percent and 26% enrichment of human-specific transcripts was detected in the tri and tetra human-specific compartments (mid-p < 0.00002 and mid-p < 0.002, respectively). CONCLUSION Our findings provide the first evidence of genome-scale skewing of STRs at a specific region of the human genome and a link between a number of these STRs and TSS selection/transcript specificity. The STRs and genes listed here may have a role in the evolution and development of characteristics and phenotypes that are unique to the human species.
Affiliation(s)
- N Nazaripanah
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- F Adelirad
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- A Delbari
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- R Sahaf
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- T Abbasi-Asl
  - Department of Biostatistics, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
6
Srivastava A, Kumar AS, Mishra RK. Vertebrate GAF/ThPOK: emerging functions in chromatin architecture and transcriptional regulation. Cell Mol Life Sci 2018; 75:623-633. [PMID: 28856379 PMCID: PMC11105447 DOI: 10.1007/s00018-017-2633-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/20/2017] [Revised: 08/09/2017] [Accepted: 08/25/2017] [Indexed: 12/31/2022]
Abstract
GAGA factor of Drosophila melanogaster (DmGAF) is a multifaceted transcription factor with diverse roles in chromatin regulation. Recently, ThPOK/c-Krox was identified as its vertebrate homologue (vGAF), which has a basic domain structure similar to DmGAF and is decorated with a number of post-translationally modified residues. In vertebrate genomes, vGAF associates with purine-rich GAGA sequences and performs diverse chromatin-mediated functions, viz., gene activation, repression and enhancer blocking. Expansion of regulatory chromatin proteins with the acquisition of PTMs appears to be the general trend that facilitated the evolution of complexity in vertebrates. Here, we compare the structural and functional features of vGAF with those of DmGAF and also assess the possible functional redundancy among paralogues of vGAF. We also discuss the underlying mechanisms which aid in the diverse and context-dependent functions of this protein.
Affiliation(s)
- Avinash Srivastava
  - CSIR-Centre for Cellular and Molecular Biology (CCMB), Uppal Road, Hyderabad, 500007, India
- Amitha Sampath Kumar
  - CSIR-Centre for Cellular and Molecular Biology (CCMB), Uppal Road, Hyderabad, 500007, India
- Rakesh K Mishra
  - CSIR-Centre for Cellular and Molecular Biology (CCMB), Uppal Road, Hyderabad, 500007, India
7
Skewing of the genetic architecture at the ZMYM3 human-specific 5' UTR short tandem repeat in schizophrenia. Mol Genet Genomics 2018; 293:747-752. [PMID: 29332164 DOI: 10.1007/s00438-018-1415-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/17/2017] [Accepted: 01/02/2018] [Indexed: 02/06/2023]
Abstract
Differential expansion of a number of human short tandem repeats (STRs) at the critical core promoter and 5' untranslated region (UTR) support the hypothesis that at least some of these STRs may provide a selective advantage in human evolution. Following a genome-wide screen of all human protein-coding gene 5' UTRs based on the Ensembl database ( http://www.ensembl.org ), we previously reported that the longest STR in this interval is a (GA)32, which belongs to the X-linked zinc finger MYM-type containing 3 (ZMYM3) gene. In the present study, we analyzed the evolutionary implication of this region across evolution and examined the allele and genotype distribution of the "exceptionally long" STR by direct sequencing of 486 Iranian unrelated male subjects consisting of 196 cases of schizophrenia (SCZ) and 290 controls. We found that the ZMYM3 transcript containing the STR is human-specific (ENST00000373998.5). A significant allele variance difference was observed between the cases and controls (Levene's test for equality of variances F = 4.00, p < 0.03). In addition, six alleles were observed in the SCZ patients that were not detected in the control group ("disease-only" alleles) (mid p exact < 0.0003). Those alleles were at the extreme short and long ends of the allele distribution curve and composed 4% of the genotypes in the SCZ group. In conclusion, we found skewing of the genetic architecture at the ZMYM3 STR in SCZ. Further, we found a bell-shaped distribution of alleles and selection against alleles at the extreme ends of this STR. The ZMYM3 STR sets a prototype, the evolutionary course of which determines the range of alleles in a particular species. Extreme "disease-only" alleles and genotypes may change our perspective of adaptive evolution and complex disorders. The ZMYM3 gene "exceptionally long" STR should be sequenced in SCZ and other human-specific phenotypes/characteristics.
8
Emamalizadeh B, Movafagh A, Darvish H, Kazeminasab S, Andarva M, Namdar-Aligoodarzi P, Ohadi M. The human RIT2 core promoter short tandem repeat predominant allele is species-specific in length: a selective advantage for human evolution? Mol Genet Genomics 2017; 292:611-617. [DOI: 10.1007/s00438-017-1294-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/17/2016] [Accepted: 01/27/2017] [Indexed: 12/17/2022]
9
Bushehri A, Barez MRM, Mansouri SK, Biglarian A, Ohadi M. Genome-wide identification of human- and primate-specific core promoter short tandem repeats. Gene 2016; 587:83-90. [PMID: 27108803 DOI: 10.1016/j.gene.2016.04.041] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 01/30/2016] [Revised: 03/23/2016] [Accepted: 04/19/2016] [Indexed: 12/12/2022]
Abstract
Recent reports of a link between human- and primate-specific genetic factors and human/primate-specific characteristics and diseases necessitate genome-wide identification of those factors. We have previously reported core promoter short tandem repeats (STRs) of extreme length (≥6-repeats) that have expanded exceptionally in primates vs. non-primates, and may have a function in adaptive evolution. In the study reported here, we extended our study to the human STRs of ≥3-repeats in the category of penta- and hexanucleotide STRs, across the entire human protein coding gene core promoters, and analyzed their status in several superorders and orders of vertebrates, using the Ensembl database. The ConSite software was used to identify the transcription factor (TF) sets binding to those STRs. STR specificity was observed at different levels of human and non-human primate (NHP) evolution. 73% of the pentanucleotide STRs and 68% of the hexanucleotide STRs were found to be specific to human and NHPs. AP-2alpha, Sp1, and MZF were the predominantly selected TFs (90%) binding to the human-specific STRs. Furthermore, the number of TF sets binding to a given STR was found to be a selection factor for that STR. Our findings indicate that selected STRs, the cognate binding TFs, and the number of TF sets binding to those STRs function as switch codes at different levels of human and NHP evolution and speciation.
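The "≥3-repeats" criterion for penta- and hexanucleotide STRs used in this abstract can be expressed as a regular-expression scan. This is a minimal sketch under assumptions: it handles only perfect (uninterrupted) repeats, the helper name `find_strs` is hypothetical, and it is not the pipeline used in the study.

```python
import re

def find_strs(seq, unit_len, min_repeats=3):
    """Find perfect tandem repeats of a fixed unit length.

    Returns a list of (start, unit, copies) tuples for every non-overlapping
    run in which a unit of `unit_len` bases is repeated >= `min_repeats` times.
    """
    # Capture one unit, then require at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // unit_len)
        for m in pattern.finditer(seq.upper())
    ]

# Illustrative sequence: three copies of the pentanucleotide AAGGC.
promoter = "GG" + "AAGGC" * 3 + "TT"
print(find_strs(promoter, unit_len=5))
```

A homopolymer run also satisfies this pattern (for example, 15 consecutive A's match as a pentanucleotide unit repeated three times), so a real analysis would filter units that are themselves repeats of a shorter motif.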
Affiliation(s)
- A Bushehri
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M R Mashhoudi Barez
  - Cell and Molecular Biology Research Center, Department of Anatomy and Biology, Faculty of Medicine, Shahid Beheshti University, Velenjak, Tehran, Iran
- S K Mansouri
  - Clinical Psychology Department, Faculty of Science and Research, Qazvin Azad University, Qazvin, Iran
- A Biglarian
  - Department of Biostatistics, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
  - Iranian Research Center on Aging, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
10
Nikkhah M, Rezazadeh M, Khorram Khorshid HR, Biglarian A, Ohadi M. An exceptionally long CA-repeat in the core promoter of SCGB2B2 links with the evolution of apes and Old World monkeys. Gene 2015; 576:109-14. [PMID: 26437309 DOI: 10.1016/j.gene.2015.09.070] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 07/21/2015] [Revised: 09/25/2015] [Accepted: 09/28/2015] [Indexed: 12/31/2022]
Abstract
We have recently reported a genome-scale catalog of human protein-coding genes that contain "exceptionally long" STRs (≥6-repeats) in their core promoter, which may be of selective advantage in this species. At the top of that list, SCGB2B2 (also known as SCGBL), contains one of the longest CA-repeat STRs identified in a human gene core promoter, at 25-repeats. In the study reported here, we analyzed the conservation status of this CA-STR across evolution. The functional implication of this STR to alter gene expression activity was also analyzed in the HEK-293 cell line. We report that the SCGB2B2 core promoter CA-repeat reaches exceptional lengths, ranging from 9- to 25-repeats, across Apes (Hominoids) and the Old World monkeys (CA>2-repeats were not detected in any other species). The longest CA-repeats and highest identity in the SCGB2B2 protein sequence were observed between human and bonobo. A trend for increased gene expression activity was observed from the shorter to the longer CA-repeats (p<0.009), and the CA-repeat increased gene expression activity, per se (p<0.02). We propose that the SCGB2B2 gene core promoter CA-repeat functions as an expression code for the evolution of Apes and the Old World monkeys.
Affiliation(s)
- M Nikkhah
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Rezazadeh
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- H R Khorram Khorshid
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- A Biglarian
  - Department of Biostatistics, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
11
Namdar-Aligoodarzi P, Mohammadparast S, Zaker-Kandjani B, Talebi Kakroodi S, Jafari Vesiehsari M, Ohadi M. Exceptionally long 5' UTR short tandem repeats specifically linked to primates. Gene 2015; 569:88-94. [PMID: 26022613 DOI: 10.1016/j.gene.2015.05.053] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 02/01/2015] [Revised: 05/12/2015] [Accepted: 05/13/2015] [Indexed: 12/23/2022]
Abstract
We have previously reported genome-scale short tandem repeats (STRs) in the core promoter interval (i.e. -120 to +1 to the transcription start site) of protein-coding genes that have evolved identically in primates vs. non-primates. Those STRs may function as evolutionary switch codes for primate speciation. In the current study, we used the Ensembl database to analyze the 5' untranslated region (5' UTR) between +1 and +60 of the transcription start site of the entire human protein-coding genes annotated in the GeneCards database, in order to identify "exceptionally long" STRs (≥5-repeats), which may be of selective/adaptive advantage. The importance of this critical interval is its function as core promoter, and its effect on transcription and translation. In order to minimize ascertainment bias, we analyzed the evolutionary status of the human 5' UTR STRs of ≥5-repeats in several species encompassing six major orders and superorders across mammals, including primates, rodents, Scandentia, Laurasiatheria, Afrotheria, and Xenarthra. We introduce primate-specific STRs, and STRs which have expanded from mouse to primates. Identical co-occurrence of the identified STRs of rare average frequency between 0.006 and 0.0001 in primates supports a role for those motifs in processes that diverged primates from other mammals, such as neuronal differentiation (e.g. APOD and FGF4), and craniofacial development (e.g. FILIP1L). A number of the identified STRs of ≥5-repeats may be human-specific (e.g. ZMYM3 and DAZAP1). Future work is warranted to examine the importance of the listed genes in primate/human evolution, development, and disease.
Affiliation(s)
- P Namdar-Aligoodarzi
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- S Mohammadparast
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- B Zaker-Kandjani
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- S Talebi Kakroodi
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Jafari Vesiehsari
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- M Ohadi
  - Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
12
A primate-specific functional GTTT-repeat in the core promoter of CYTH4 is linked to bipolar disorder in human. Prog Neuropsychopharmacol Biol Psychiatry 2015; 56:161-7. [PMID: 25240857 DOI: 10.1016/j.pnpbp.2014.09.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 07/30/2014] [Accepted: 09/10/2014] [Indexed: 12/20/2022]
Abstract
Evidence of primate-specific genes and gene regulatory mechanisms linked to bipolar disorder (BD) lend support to evolutionary/adaptive processes in the pathogenesis of this disorder. Following a genome-scale analysis of the entire protein coding genes annotated in the GeneCards database, we have recently reported that cytohesin-4 (CYTH4) contains the longest tetra-nucleotide short tandem repeat (STR) identified in a human protein-coding gene core promoter, which may be of adaptive advantage to this species. In the current study, we analyzed the evolutionary trend of this STR across evolution. We also analyzed the functional implication and distribution of this STR in a group of patients with type 1 BD (n=233) and controls (n=262). We found that this STR is exceptionally expanded in primates (Fisher exact p<0.00003). Association was observed between type I BD and the 6-repeat allele of this STR, (GTTT)₆ (Yates corrected χ²=12.68, p<0.0001, OR: 1.68). This allele is the shortest length of the GTTT-repeat identified in the human subjects studied. Consistent with that finding, excess homozygosity was observed for the shorter alleles, (GTTT)₆ and (GTTT)₇, vs. the longer alleles, (GTTT)₈ and (GTTT)₉ in the BD group (Yates corrected χ²=5.18, p<0.01, 1 df, OR: 1.96). Using Dual Glo luciferase system in HEK-293 cells, a trend for gene expression repression was observed from the 6- to the 9-repeat allele (p<0.003), and the GTTT-repeat significantly down-regulated gene expression, per se (p<0.0006). This is the first evidence of a link between a primate-specific STR and a major psychiatric disorder in human. It may be speculated that the CYTH4 GTTT-repeat in primates may have conferred selective advantage to this order, reflected in neural function and neurophenotypes. The role of the CYTH4 gene in the pathogenesis of type I BD remains to be clarified in the future studies.
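The association statistics quoted in this abstract (odds ratio and Yates-corrected chi-square, 1 df) follow from a 2x2 allele-count table. The abstract does not give the underlying counts, so the sketch below uses clearly hypothetical numbers; the function name `odds_ratio_and_yates_chi2` is also an illustration and not taken from the study.

```python
def odds_ratio_and_yates_chi2(a, b, c, d):
    """Odds ratio and Yates-corrected chi-square for a 2x2 table.

    a, b = allele present / absent in cases
    c, d = allele present / absent in controls
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Yates continuity correction: subtract n/2 from |ad - bc| (floored at 0).
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return odds_ratio, chi2

# Hypothetical allele counts, NOT the study's data.
or_value, chi2_value = odds_ratio_and_yates_chi2(20, 10, 10, 20)
print(f"OR = {or_value:.2f}, Yates chi-square = {chi2_value:.2f}")
```

The p-value is then read from the chi-square distribution with one degree of freedom, for example with a statistics package.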
13
Association of vWA and TPOX polymorphisms with venous thrombosis in Mexican mestizos. Biomed Res Int 2014; 2014:697689. [PMID: 25250329 PMCID: PMC4164132 DOI: 10.1155/2014/697689] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/18/2014] [Revised: 08/15/2014] [Accepted: 08/18/2014] [Indexed: 01/17/2023]
Abstract
Objective. Venous thromboembolism (VTE) is a multifactorial disorder and, worldwide, the most important cause of morbidity and mortality. Genetic factors play a critical role in its aetiology. Microsatellites are the most important source of human genetic variation having more phenotypic effect than many single nucleotide polymorphisms. Hence, we evaluate a possible relationship between VTE and the genetic variants in von Willebrand factor, human alpha fibrinogen, and human thyroid peroxidase microsatellites to identify possible diagnostic markers. Methods. Genotypes were obtained from 177 patients with VTE and 531 nonrelated individuals using validated genotyping methods. The allelic frequencies were compared; Bayesian methods were used to correct population stratification to avoid spurious associations. Results. The vWA-18, TPOX-9, and TPOX-12 alleles were significantly associated with VTE. Moreover, subjects bearing the combination vWA-18/TPOX-12 loci exhibited doubled risk for VTE (95% CI = 1.02–3.64), whereas the combination vWA-18/TPOX-9 showed an OR = 10 (95% CI = 4.93–21.49). Conclusions. The vWA and TPOX microsatellites are good candidate biomarkers in venous thromboembolism diseases and could help to elucidate their origins. Additionally, these polymorphisms could become useful markers for genetic studies of VTE in the Mexican population; however, further studies should be done owing that this data only show preliminary evidence.
|
14
|
Ohadi M, Valipour E, Ghadimi-Haddadan S, Namdar-Aligoodarzi P, Bagheri A, Kowsari A, Rezazadeh M, Darvish H, Kazeminasab S. Core promoter short tandem repeats as evolutionary switch codes for primate speciation. Am J Primatol 2014; 77:34-43. [PMID: 25099915 DOI: 10.1002/ajp.22308] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 02/03/2014] [Revised: 04/07/2014] [Accepted: 05/16/2014] [Indexed: 01/27/2023]
Abstract
Alteration in gene expression levels underlies many of the phenotypic differences across species. Because of their highly mutable nature, proximity to the +1 transcription start site (TSS), and the emerging evidence of functional impact on gene expression, core promoter short tandem repeats (STRs) may be considered an ideal source of variation across species. In a genome-scale analysis of all Homo sapiens protein-coding genes, we have previously identified core promoters with at least one STR of ≥6-repeats, with possible selective advantage in this species. In the current study, we performed a reverse analysis of all Homo sapiens orthologous genes in mouse in the Ensembl database, in order to identify conserved STRs that have shrunk as an evolutionary advantage to humans. Two protocols were used to minimize ascertainment bias. Firstly, two species sharing a more recent ancestor with Homo sapiens (i.e. Pan troglodytes and Gorilla gorilla gorilla) were also included in the study. Secondly, four non-primate species encompassing the major orders across mammals, including Scandentia, Laurasiatheria, Afrotheria, and Xenarthra, were analyzed as out-groups. We introduce STR evolutionary events specifically identical in primates (i.e. Homo sapiens, Pan troglodytes, and Gorilla gorilla gorilla) vs. non-primate out-groups. The average frequency of the identically shared STR motifs across those primates ranged between 0.00005 and 0.06. The identified genes are involved in important evolutionary and developmental processes, such as normal craniofacial development (TFAP2B), regulation of cell shape (PALMD), learning and long-term memory (RGS14), nervous system development (GFRA2), embryonic limb morphogenesis (PBX2), and forebrain development (APAF1). We provide evidence of core promoter STRs as evolutionary switch codes for primate speciation, and the first instance of identity-by-descent for those motifs at the interspecies level.
Affiliation(s)
- Mina Ohadi
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
|
15
|
Mohammadparast S, Bayat H, Biglarian A, Ohadi M. Exceptional expansion and conservation of a CT-repeat complex in the core promoter of PAXBP1 in primates. Am J Primatol 2014; 76:747-56. [PMID: 24573656 DOI: 10.1002/ajp.22266] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 07/25/2013] [Revised: 12/28/2013] [Accepted: 01/28/2014] [Indexed: 11/11/2022]
Abstract
Adaptive evolution may be linked with the genomic distribution and function of short tandem repeats (STRs). Proximity of the core promoter STRs to the +1 transcription start site (TSS), and their mutable nature, are characteristics that highlight those STRs as a novel source of interspecies variation. The PAXBP1 gene (alternatively known as GCFC1) core promoter contains the longest STR identified in a Homo sapiens gene core promoter. Indeed, this core promoter is a stretch of four consecutive CT-STRs. In the current study, we used the Ensembl, NCBI, and UCSC databases to analyze the evolutionary trend and functional implication of this CT-STR complex in six major lineages across vertebrates, including primates, non-primate mammals, birds, reptiles, amphibians, and fish. We observed exceptional expansion (≥4-repeats) and conservation of this CT-STR complex across primates, except the prosimians Microcebus murinus and Otolemur garnettii (Fisher exact P<4.1×10⁻⁷). H. sapiens has the most complex STR formula and the longest repeats. Macaca mulatta and Callithrix jacchus monkeys have the simplest STR formulas and the lowest repeat numbers. CT≥4-repeats were not detected in non-primate lineages. Different length alleles across the PAXBP1 core promoter CT-STRs significantly altered gene expression in vitro (P<0.001, t-test). PAXBP1 has a crucial role in craniofacial development, myogenesis, and spine morphogenesis, properties that have diverged between primates and non-primates. To our knowledge, this is the first instance of expansion and conservation of an STR complex co-occurring specifically with the primate lineage.
Affiliation(s)
- Saeid Mohammadparast
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
|
16
|
Bolton KA, Ross JP, Grice DM, Bowden NA, Holliday EG, Avery-Kiejda KA, Scott RJ. STaRRRT: a table of short tandem repeats in regulatory regions of the human genome. BMC Genomics 2013; 14:795. [PMID: 24228761 PMCID: PMC3840602 DOI: 10.1186/1471-2164-14-795] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 05/27/2013] [Accepted: 11/05/2013] [Indexed: 11/22/2022]
Abstract
Background: Tandem repeats (TRs) are unstable regions commonly found within genomes that have consequences for evolution and disease. In humans, polymorphic TRs are known to cause neurodegenerative and neuromuscular disorders as well as being associated with complex diseases such as diabetes and cancer. If present in upstream regulatory regions, TRs can modify chromatin structure and affect transcription, resulting in altered gene expression and protein abundance. The most common TRs are short tandem repeats (STRs), or microsatellites. Promoter-located STRs are considerably more polymorphic than coding region STRs. As such, they may be a common driver of phenotypic variation. To study STRs located in regulatory regions, we have performed genome-wide analysis to identify all STRs present in a region that is 2 kilobases upstream and 1 kilobase downstream of the transcription start sites of genes. Results: The Short Tandem Repeats in Regulatory Regions Table, STaRRRT, contains the results of the genome-wide analysis, outlining the characteristics of 5,264 STRs present in the upstream regulatory region of 4,441 human genes. Gene set enrichment analysis has revealed significant enrichment for STRs in cellular, transcriptional and neurological system gene promoters and genes important in ion and calcium homeostasis. The set of enriched terms has broad similarity to that seen in coding regions, suggesting that regulatory region STRs are subject to similar evolutionary pressures as STRs in coding regions and may, like coding region STRs, have an important role in controlling gene expression. Conclusions: STaRRRT is a readily searchable resource for investigating potentially polymorphic STRs that could influence the expression of any gene of interest. The processes and genes enriched for regulatory region STRs provide potential novel targets for diagnosing and treating disease, and support a role for these STRs in the evolution of the human genome.
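The search region described in this abstract (2 kilobases upstream and 1 kilobase downstream of each transcription start site) depends on gene strand, since "upstream" points toward lower coordinates on the plus strand and toward higher coordinates on the minus strand. The sketch below illustrates that bookkeeping only. The function name `regulatory_window`, the 1-based inclusive coordinate convention, and the clamping at position 1 are assumptions made for the example, not details reported for STaRRRT.

```python
def regulatory_window(tss, strand, upstream=2000, downstream=1000):
    """Return (start, end) of the window around a TSS.

    Assumes 1-based, inclusive genomic coordinates (a hypothetical
    convention chosen for this example).
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clamp so the window never runs off the start of the chromosome.
    return max(1, start), end


# Hypothetical TSS positions, for illustration only.
print(regulatory_window(10000, "+"))
print(regulatory_window(10000, "-"))
```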
Affiliation(s)
- Rodney J Scott
- Centre for Information-Based Medicine, Hunter Medical Research Institute, Newcastle, NSW, Australia.
|
17
|
Valipour E, Kowsari A, Bayat H, Banan M, Kazeminasab S, Mohammadparast S, Ohadi M. Polymorphic core promoter GA-repeats alter gene expression of the early embryonic developmental genes. Gene 2013; 531:175-9. [PMID: 24055488 DOI: 10.1016/j.gene.2013.09.032] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 07/02/2013] [Revised: 09/05/2013] [Accepted: 09/06/2013] [Indexed: 12/20/2022]
Abstract
Protein complexes that bind to 'GAGA' DNA elements are necessary to replace nucleosomes to create a local chromatin environment that facilitates a variety of site-specific regulatory responses. Three to four elements are required for the disruption of a preassembled nucleosome. We have previously identified human protein-coding gene core promoters that are composed of exceptionally long GA-repeats. The functional implication of those GA-repeats is beginning to emerge in the core promoter of the human SOX5 gene, which is involved in multiple developmental processes. In the current study, we analyze the functional implication of GA-repeats in the core promoter of two additional genes, MECOM and GABRA3, whose expression is largely limited to embryogenesis. We report a significant difference in gene expression as a result of different alleles across those core promoters in the HEK-293 cell line. A cross-species homology check for the GABRA3 GA-repeats revealed that those repeats are evolutionarily conserved in mouse and primates (p<1 × 10⁻⁸). The MECOM core promoter GA-repeats are also conserved in numerous species, among which human has the longest repeat and the greatest complexity. We propose a novel role for GA-repeat core promoters in regulating the expression of genes involved in development and evolution.
Affiliation(s)
- E Valipour
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
|
18
|
Biased homozygous haplotypes across the human caveolin 1 upstream purine complex in Parkinson's disease. J Mol Neurosci 2013; 51:389-93. [PMID: 23640536 DOI: 10.1007/s12031-013-0021-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 02/27/2013] [Accepted: 04/22/2013] [Indexed: 01/13/2023]
Abstract
The alpha-synuclein-caveolin 1 axis has been suggested to play a role in the pathogenesis of Parkinson's disease in cell line models. The objective of this study was to analyze the homozygous haplotype compartment of the human caveolin 1 gene upstream purine complex in patients afflicted with Parkinson's disease. This complex was screened in patients with Parkinson's disease (n = 141) and compared with a group of controls (n = 760) using polymerase chain reaction and sequencing. The expression activity of the homozygous haplotypes was then examined using the Dual-Glo luciferase system in the human neuronal cell line LAN-5. Six haplotypes were found to be homozygous in the patients, and not in the control pool (Fisher exact p < 1 × 10⁻⁶). Three of those haplotypes were specific to Parkinson's disease (Fisher exact p < 0.002), and the remaining three overlapped with homozygous haplotypes in Alzheimer's disease and multiple sclerosis (Fisher exact p < 0.002). The disease haplotypes contained motif lengths that were nonexistent in the control homozygous haplotype pool and significantly increased gene expression (p < 9 × 10⁻⁶). We conclude that a skew in the caveolin 1 purine complex homozygous haplotype compartment and an additive effect of those haplotypes may be linked with Parkinson's disease.
|
19
|
Evolutionary trend of exceptionally long human core promoter short tandem repeats. Gene 2012; 507:61-7. [PMID: 22796130 DOI: 10.1016/j.gene.2012.07.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/24/2012] [Revised: 06/26/2012] [Accepted: 07/02/2012] [Indexed: 11/24/2022]
Abstract
Short tandem repeats (STRs) are variable elements that play a significant role in genome evolution by creating and maintaining quantitative genetic variation. Because of their proximity to the +1 transcription start site (TSS) and polymorphic nature, core promoter STRs may be considered a novel source of variation across species. In a genome-scale analysis of all human protein-coding genes annotated in the GeneCards database (19,927), we analyze the prevalence and repeat numbers of different classes of core promoter STRs in the interval between -120 and +1 to the TSS. We also analyze the evolutionary trend of exceptionally long core promoter STRs of ≥6-repeats. 133 genes (~2%) had core promoter STRs of ≥6-repeats. In the majority of those genes, the STR motifs were found to be conserved across evolution. Di-nucleotide repeats had the highest representation in the human core promoter long STRs (72 genes). Tri- (52 genes), tetra-, penta-, and hexa-nucleotide STRs (9 genes) were also present, in descending order of prevalence. The majority of those genes (84 genes) revealed directional expansion of core promoter STRs from mouse to human. However, in a number of genes, the difference in average allele size across species was sufficiently small that there might be a constraint on the evolution of average allele size. Random drift of STRs from mouse to human was also observed in a minority of genes. Future work on the genes listed in the current study may further our knowledge of the potential importance of core promoter STRs in human evolution.
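The screen described in this abstract, finding di- to hexa-nucleotide STRs of ≥6 repeats in a core promoter interval, can be illustrated with a regular-expression scan. This is a minimal sketch and not the authors' pipeline: the function name `find_strs`, the restriction to perfect (uninterrupted) repeats, and the example sequence are all assumptions made for illustration. The input is assumed to be the already-extracted promoter sequence.

```python
import re


def find_strs(seq, min_repeats=6, max_unit=6):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    start is a 0-based offset into seq; units of 2..max_unit nucleotides
    are considered, and only runs of at least min_repeats copies are kept.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(2, max_unit + 1):
        # One captured unit followed by at least (min_repeats - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "GAGA" or "AA"), so each STR is reported once.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits


# Hypothetical sequence containing (GA)7 and (GTTT)6.
example = "TTACG" + "GA" * 7 + "CCTA" + "GTTT" * 6 + "AC"
print(find_strs(example))
```

Interrupted or imperfect repeats are not detected by this pattern; handling them would require a scoring-based approach rather than an exact back-reference.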
|
20
|
Borel C, Migliavacca E, Letourneau A, Gagnebin M, Béna F, Sailani MR, Dermitzakis ET, Sharp AJ, Antonarakis SE. Tandem repeat sequence variation as causative cis-eQTLs for protein-coding gene expression variation: the case of CSTB. Hum Mutat 2012; 33:1302-9. [PMID: 22573514 DOI: 10.1002/humu.22115] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 01/27/2012] [Accepted: 04/26/2012] [Indexed: 11/05/2022]
Abstract
Association studies have revealed expression quantitative trait loci (eQTLs) for a large number of genes. However, the causative variants that regulate gene expression levels are generally unknown. We hypothesized that copy-number variation of sequence repeats contributes to the expression variation of some genes. Our laboratory has previously identified that the rare expansion of a repeat c.-174CGGGGCGGGGCG in the promoter region of the CSTB gene causes a silencing of the gene, resulting in progressive myoclonus epilepsy. Here, we genotyped the repeat length and quantified CSTB expression by quantitative real-time polymerase chain reaction in 173 lymphoblastoid cell lines (LCLs) and fibroblast samples from the GenCord collection. The majority of alleles contain either two or three copies of this repeat. Independent analysis revealed that the c.-174CGGGGCGGGGCG repeat length is strongly associated with CSTB expression (P = 3.14 × 10⁻¹¹) in LCLs only. Examination of both genotyped and imputed single-nucleotide polymorphisms (SNPs) within 2 Mb of CSTB revealed that the dodecamer repeat represents the strongest cis-eQTL for CSTB in LCLs. We conclude that the common two or three copy variation is likely the causative cis-eQTL for CSTB expression variation. More broadly, we propose that polymorphic tandem repeats may represent the causative variation of a fraction of cis-eQTLs in the genome.
Affiliation(s)
- Christelle Borel
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
|
21
|
Heidari A, Hosseinkhani S, Talebi S, Meshkani R, Esmaeilzadeh-Gharedaghi E, Banan M, Darvish H, Ohadi M. Haplotypes across the human caveolin 1 gene upstream purine complex significantly alter gene expression: implication in neurodegenerative disorders. Gene 2012; 505:186-9. [PMID: 22659071 DOI: 10.1016/j.gene.2012.05.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/05/2012] [Revised: 05/04/2012] [Accepted: 05/11/2012] [Indexed: 12/19/2022]
Abstract
We have previously reported a polymorphic purine complex at the 1.5 kb upstream region of the human caveolin 1 (CAV1) gene that is conserved across several species with respect to sequence motifs and the location of the complex. The IRF and Ets transcription factors have common binding sites for this region across those species. We have also shown a skew in the homozygote haplotype compartment of this complex in two neurodegenerative disorders, sporadic late-onset Alzheimer's disease (AD) and multiple sclerosis (MS), versus disease-free controls (p<0.0000001). In the current study, we analyze the functional implication of the disease homozygote haplotypes (i.e. 102-bp and 142-bp) vs. the control homozygote haplotype (110-bp) in three neuronal cell lines, LAN-5, U-87 MG, and N2A, using a dual luciferase reporter system. A significant increase in gene expression was observed with the disease haplotype constructs vs. the control haplotype in all three cell lines (t-test p<4 × 10⁻⁴, 1 × 10⁻⁶, and 3 × 10⁻⁴, respectively). We conclude that the human CAV1 upstream purine complex modifies gene expression. An additive effect of the haplotypes in the homozygous status is speculated based on the skew in the homozygote haplotypes in neurodegenerative disorders.
Affiliation(s)
- A Heidari
- Genetics Research Center, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
|