1
Morbia I, Kumar P, Satish AL, Mudgal A, Datta S, Singh U. The amniote-conserved DNA-binding domain of CGGBP1 restricts cytosine methylation of transcription factor binding sites in proximal promoters to regulate gene expression. BMC Genom Data 2024; 25:98. [PMID: 39558239 PMCID: PMC11575156 DOI: 10.1186/s12863-024-01282-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Accepted: 11/14/2024] [Indexed: 11/20/2024]
Abstract
CGGBP1 is a GC-rich DNA-binding protein which is important for genomic integrity, gene expression and epigenome maintenance through regulation of CTCF occupancy and cytosine methylation. It has remained unclear how CGGBP1 integrates multiple diverse functions with its simple architecture of only a DNA-binding domain tethered to a C-terminal tail with low structural rigidity. We have used truncated forms of CGGBP1 with or without the DNA-binding domain (DBD) to assay cytosine methylation and global gene expression. Proximal promoters of CGGBP1-repressed genes, although significantly GC-poor, contain GC-rich transcription factor binding motifs and exhibit base compositions indicative of low C-T transition rates due to prevention of cytosine methylation. Genome-wide analyses of cytosine methylation and binding of CGGBP1 DBD show that CGGBP1 restricts cytosine methylation in a manner that depends on its DBD and its DNA-binding. The CGGBP1-repressed genes show an increase in promoter cytosine methylation alongside a decrease in transcript abundance when the DBD-deficient CGGBP1 is expressed. Our findings suggest that CGGBP1 protects transcription factor binding sites (TFBS) from cytosine methylation-associated loss and thereby regulates gene expression. By analysing orthologous promoter sequences we show that restriction of cytosine methylation is a function of CGGBP1 progressively acquired during vertebrate evolution. Superimposing our results on the evolution of CGGBP1 suggests that cytosine methylation is mitigated mainly by its N-terminal DBD. Our results position CGGBP1 DNA-binding as a major evolutionarily acquired mechanism through which it keeps cytosine methylation under check and regulates TFBS retention and gene activity.
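The abstract ties CGGBP1 activity to promoter base composition and low C-to-T transition rates. As a hedged illustration only, not the authors' pipeline, the sketch below computes two common composition statistics for a sequence: GC fraction and the CpG observed/expected ratio, using the widely used formula #CG × length / (#C × #G). Methylated CpGs tend to deaminate to TpG, so a stretch that loses protection from methylation is expected to drift toward lower values. Both example sequences are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """CpG observed/expected ratio: (#CG * length) / (#C * #G); 0.0 if C or G is absent."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cg = seq.count("CG")  # 'CG' cannot overlap itself, so str.count is exact here
    return cg * len(seq) / (c * g)


protected = "ATCGCGGCGTA"  # hypothetical stretch that keeps its CpGs
eroded = "ATTGTGGTGTA"     # same stretch with every CpG cytosine changed to T
for name, s in (("protected", protected), ("eroded", eroded)):
    print(name, round(gc_fraction(s), 3), round(cpg_obs_exp(s), 3))
```

In this toy pair, converting each CpG cytosine to T removes every CpG, so the ratio falls from 2.75 to 0 and the GC fraction drops from 7/11 to 4/11. On the opposite strand the same events would appear as G-to-A changes.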
Affiliation(s)
- Ishani Morbia
- Department of Biological Sciences and Engineering, Indian Institute of Technology Gandhinagar, Gandhinagar, India
- Praveen Kumar
- Department of Biological Sciences and Engineering, Indian Institute of Technology Gandhinagar, Gandhinagar, India
- Aditi Lakshmi Satish
- Department of Biological Sciences and Engineering, Indian Institute of Technology Gandhinagar, Gandhinagar, India
- Akanksha Mudgal
- Department of Biopharmacy, Medical University of Lublin, Lublin, Poland
- Subhamoy Datta
- Applied Tumor Genomics, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Umashankar Singh
- Department of Biological Sciences and Engineering, Indian Institute of Technology Gandhinagar, Gandhinagar, India.

2
Olson DR, Wheeler TJ. ULTRA-effective labeling of tandem repeats in genomic sequence. Bioinform Adv 2024; 4:vbae149. [PMID: 39575229 PMCID: PMC11580682 DOI: 10.1093/bioadv/vbae149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 09/19/2024] [Accepted: 10/07/2024] [Indexed: 11/24/2024]
Abstract
In the age of long-read sequencing, genomics researchers now have access to accurate repetitive DNA sequence (including satellites) that, due to the limitations of short-read sequencing, could previously be observed only as unmappable fragments. Tools that annotate repetitive sequence are now more important than ever, so that we can better understand newly uncovered repetitive sequences, and also so that we can mitigate errors in bioinformatic software caused by those repetitive sequences. To that end, we introduce the 1.0 release of our tool for identifying and annotating locally repetitive sequence, ULTRA Locates Tandemly Repetitive Areas (ULTRA). ULTRA is fast enough to use as part of an efficient annotation pipeline, produces state-of-the-art reliable coverage of repetitive regions containing many mutations, and provides interpretable statistics and labels for repetitive regions. Availability and implementation: ULTRA is released under an open source license, and is available for download at https://github.com/TravisWheelerLab/ULTRA.
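ULTRA's own method is not reproduced here. The sketch below only illustrates the basic idea behind flagging locally repetitive sequence that still contains mutations: for each candidate period p, measure how often a base equals the base p positions downstream, and keep the best-scoring period. The function names and the `max_period` cutoff are assumptions for illustration.

```python
def period_match_score(seq: str, period: int) -> float:
    """Fraction of positions i with seq[i] == seq[i + period]."""
    n = len(seq) - period
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if seq[i] == seq[i + period])
    return matches / n


def best_period(seq: str, max_period: int = 10) -> tuple[int, float]:
    """Smallest period with the highest self-match score (0 if nothing matches)."""
    best = (0, 0.0)
    for p in range(1, max_period + 1):
        score = period_match_score(seq, p)
        if score > best[1]:  # strict '>' keeps the smallest period on ties
            best = (p, score)
    return best


clean = "ACG" * 6
mutated = "ACGACGATGACGACGACG"  # one C->T substitution at index 7
print(best_period(clean), best_period(mutated, max_period=4))
```

Multiples of the true period (6, 9, ...) can score as well as, or after a mutation even better than, the true period, which is one reason dedicated annotators model repeats explicitly instead of relying on a raw match fraction. Here the mutated example is searched only up to `max_period=4` so the result stays unambiguous.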
Affiliation(s)
- Daniel R Olson
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
- Travis J Wheeler
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
- Department of Pharmacy Practice & Science, R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States

3
Horton CA, Alexandari AM, Hayes MGB, Marklund E, Schaepe JM, Aditham AK, Shah N, Suzuki PH, Shrikumar A, Afek A, Greenleaf WJ, Gordân R, Zeitlinger J, Kundaje A, Fordyce PM. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 2023; 381:eadd1250. [PMID: 37733848 DOI: 10.1126/science.add1250] [Citation(s) in RCA: 72] [Impact Index Per Article: 36.0] [Received: 05/24/2022] [Accepted: 07/26/2023] [Indexed: 09/23/2023]
Abstract
Short tandem repeats (STRs) are enriched in eukaryotic cis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)-DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects similar to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.
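The idea that many weakly preferred microstates can raise TF density follows from summing statistical weights. Below is a minimal equilibrium sketch under simplifying assumptions of our own, not the paper's full model: one TF molecule occupies at most one site in the segment, each site i contributes a weight [TF]/Kd_i relative to the unbound state, and all concentrations and Kd values are hypothetical.

```python
def occupancy(tf_nM: float, kds_nM: list[float]) -> float:
    """Equilibrium probability that a DNA segment is bound by one TF molecule.

    Assumes mutually exclusive sites (at most one TF bound at a time) and gives
    each bound microstate the weight [TF] / Kd relative to the unbound state.
    """
    z_bound = sum(tf_nM / kd for kd in kds_nM)  # summed weight of all bound microstates
    return z_bound / (1.0 + z_bound)             # the unbound state has weight 1


strong_site = occupancy(10.0, [50.0])
weak_stretch = occupancy(10.0, [1000.0] * 20)
longer_weak_stretch = occupancy(10.0, [1000.0] * 40)
print(round(strong_site, 4), round(weak_stretch, 4), round(longer_weak_stretch, 4))
```

With [TF] = 10 nM, one strong site (Kd = 50 nM) and twenty weak sites (Kd = 1,000 nM each) give the same occupancy, 1/6, because both summed weights equal 0.2; forty weak sites raise occupancy above that of the single strong site.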
Affiliation(s)
- Connor A Horton
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Amr M Alexandari
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Michael G B Hayes
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Arjun K Aditham
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Nilay Shah
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Peter H Suzuki
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- Avanti Shrikumar
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Ariel Afek
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Raluca Gordân
- Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27710, USA
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA
- Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- The University of Kansas Medical Center, Kansas City, KS 66103, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA 94305, USA
- Polly M Fordyce
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
- ChEM-H Institute, Stanford University, Stanford, CA 94305, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94110, USA

4
Martí E. RNA toxicity induced by expanded CAG repeats in Huntington's disease. Brain Pathol 2016; 26:779-786. [PMID: 27529325 DOI: 10.1111/bpa.12427] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Received: 06/07/2016] [Accepted: 06/09/2016] [Indexed: 02/03/2023]
Abstract
Huntington's disease (HD) belongs to the group of inherited polyglutamine (PolyQ) diseases caused by an expanded CAG repeat in the coding region of the Huntingtin (HTT) gene that results in an elongated polyQ stretch. Abnormal function and aggregation of the mutant protein have typically been regarded as the main molecular cause underlying disease development. However, the most recent advances have revealed novel pathogenic pathways directly dependent on an RNA toxic gain-of-function. Expanded CAG repeats within exon 1 of the HTT mRNA induce toxicity through mechanisms involving, at least in part, gene expression perturbations. This has important implications not only for basic and translational research in HD, but also for other types of diseases carrying the expanded CAG in other genes, which likely share pathogenic aspects. Here I will review the evidence and mechanisms underlying RNA toxicity in CAG repeat expansions, with particular focus on HD. These comprise abnormal subcellular localization of the transcripts containing the expanded CAG repeats; sequestration of several types of proteins by the expanded CAG repeat, which results in defects of alternative splicing events and gene expression; and aberrant biogenesis and detrimental activity of small CAG repeated RNAs (sCAG) that produce altered gene silencing. Although these altered pathways have been detected in HD models, their contribution to disease development and progression requires further study.
Affiliation(s)
- Eulàlia Martí
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Institut Hospital del Mar d'Investigacions Mediques (IMIM), Barcelona, 08003, Spain
- Centro de Investigacion Biomedica en Red (CIBERESP), Madrid, Spain

5
Gonthier P, Sillo F, Lagostina E, Roccotelli A, Cacciola OS, Stenlid J, Garbelotto M. Selection processes in simple sequence repeats suggest a correlation with their genomic location: insights from a fungal model system. BMC Genomics 2015; 16:1107. [PMID: 26714466 PMCID: PMC4696308 DOI: 10.1186/s12864-015-2274-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 09/10/2015] [Accepted: 12/03/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Adaptive processes shape the evolution of genomes, and the diverse functions of different genomic regions are likely to have an impact on the trajectory and outcome of this evolution. The main hypothesis underlying this study is that the evolution of Simple Sequence Repeats (SSRs) is correlated with the evolution of the genomic region in which they are located, resulting in differences in motif size, number of repeats, and levels of polymorphism. These differences should be clearly detectable when analyzing the frequency and type of SSRs within the genome of a species, when studying populations within a species, and when comparing closely related sister taxa. By coupling a genome-wide SSR survey in the genome of the plant pathogenic fungus Heterobasidion irregulare with an analysis of intra- and interspecific variability of 39 SSR markers in five populations of the two sibling species H. irregulare and H. annosum, we investigated mechanisms of SSR evolution. RESULTS: The results showed a clear dominance of trinucleotide repeats and selection against other repeat types, i.e. di- and tetranucleotide repeats, both inside Open Reading Frames (ORFs) and in regions upstream of the 5' untranslated region (5'UTR). Locus-by-locus AMOVA showed that SSRs both inside ORFs and upstream of the 5'UTR were more conserved within species than SSRs in other genomic regions, suggesting that their evolution is constrained by the functions of the regions they occupy. Principal coordinates analysis (PCoA) indicated that, although SSRs inside ORFs were less polymorphic than those in intergenic regions, they were more powerful in differentiating species. These findings indicate that SSR evolution undergoes directional selection pressure comparable to that of the ORFs they interrupt and of regions involved in regulatory functions.
CONCLUSIONS: Our work links the variation and type of SSRs with regions upstream of the 5'UTR, which putatively harbour regulatory elements, and shows that the evolution of SSRs might be affected by their location in the genome. Additionally, this study provides a first glimpse of a possible molecular basis for fast adaptation to the environment mediated by SSRs.
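A genome-wide SSR survey of this kind starts by locating repeats and tallying them by motif size. The sketch below is a simplified, hypothetical scanner, not the pipeline used in the study: it finds perfect repeats only, uses an illustrative minimum copy number, and ignores compound or imperfect SSRs.

```python
import re
from collections import Counter


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit ('AT' yes, 'ATAT' or 'AA' no)."""
    return (motif + motif).find(motif, 1) == len(motif)


def count_ssrs_by_motif_size(seq: str, sizes=(2, 3, 4), min_copies: int = 4) -> dict[int, int]:
    """Count perfect SSRs per motif size; min_copies is an illustrative threshold."""
    counts = Counter()
    for k in sizes:
        # ([ACGT]{k}) captures one unit; \1{min_copies-1,} demands further exact copies.
        pattern = re.compile("([ACGT]{%d})\\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq.upper()):
            if is_primitive(match.group(1)):  # skip units such as 'AA' or 'ATAT'
                counts[k] += 1
    return dict(counts)


example = "GG" + "CAG" * 6 + "TT" + "AC" * 5 + "GGTACA" + "GATA" * 4
print(count_ssrs_by_motif_size(example))
```

Splitting hits by genomic context (inside ORFs, upstream of the 5'UTR, intergenic) would require an annotation file, which this sketch does not attempt.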
Affiliation(s)
- Paolo Gonthier
- Department of Agricultural, Forest and Food Sciences, University of Torino, 10095, Grugliasco, Italy.
- Fabiano Sillo
- Department of Agricultural, Forest and Food Sciences, University of Torino, 10095, Grugliasco, Italy.
- Elisa Lagostina
- Department of Environmental Sciences, Policy and Management, University of California at Berkeley, CA, 94720, Berkeley, USA.
- Department of Earth and Environmental Sciences, University of Pavia, 27100, Pavia, Italy.
- Angela Roccotelli
- Department of Environmental Sciences, Policy and Management, University of California at Berkeley, CA, 94720, Berkeley, USA.
- Department of Agriculture, Mediterranean University of Reggio Calabria, 89122, Reggio Calabria, Italy.
- Olga Santa Cacciola
- Department of Agriculture, Food and Environment, University of Catania, 95123, Catania, Italy.
- Jan Stenlid
- Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, 75007, Uppsala, Sweden.
- Matteo Garbelotto
- Department of Environmental Sciences, Policy and Management, University of California at Berkeley, CA, 94720, Berkeley, USA.

6
Abstract
The human genome contains multiple stretches of CGG trinucleotide repeats, which act as transcription- and translation-regulatory elements but at the same time form secondary structures that impede replication and give rise to sites of chromosome fragility. Proteins binding to such DNA elements may be involved in divergent cellular processes such as transcription, DNA damage, and the epigenetic state of chromatin. We review here the work done on CGG repeats and associated proteins, with special focus on a factor called CGGBP1. CGGBP1 is an interesting example of a factor that has no single dedicated function but participates indispensably in multiple processes. Both experimental results and data from cancer genome sequencing have revealed that any alteration in CGGBP1 that compromises its function is tolerated by neither normal nor cancer cells. Based upon a large amount of published data, information from databases, and unpublished results, we explain in this review how CGGBP1 is a classic example of the 'one factor, divergent functions' paradigm of cytoprotection. By taking cues from the studies on CGGBP1, more such factors can be discovered for a better understanding of the evolution of mechanisms of cellular survival.
Affiliation(s)
- Umashankar Singh
- Biological Sciences and Engineering, Indian Institute of Technology, Gandhinagar, Gujarat, India
- Correspondence: Umashankar Singh, Biological Sciences and Engineering, Indian Institute of Technology, Gandhinagar, Gujarat, India.
- Bengt Westermark
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Rudbeck Laboratory, Uppsala University, Sweden

7
Chen CM, Sio CP, Lu YL, Chang HT, Hu CH, Pai TW. Identification of conserved and polymorphic STRs for personal genomes. BMC Genomics 2014; 15 Suppl 10:S3. [PMID: 25560225 PMCID: PMC4304208 DOI: 10.1186/1471-2164-15-s10-s3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]
Abstract
Background: Short tandem repeats (STRs) are abundant in human genomes. Numerous STRs have been shown to be associated with genetic diseases and gene regulatory functions, and have been selected as genetic markers for evolutionary and forensic analyses. High-throughput next generation sequencers have fostered new cutting-edge computing techniques for genome-scale analyses, and cross-genome comparisons have facilitated the efficient identification of polymorphic STR markers for various applications. Results: An automated and efficient system for detecting human polymorphic STRs at the genome scale is proposed in this study. Assembled contigs from next generation sequencing data were aligned and calibrated according to selected reference sequences. To verify identified polymorphic STRs, human genomes from the 1000 Genomes Project were employed for comprehensive analyses, and STR markers from the Combined DNA Index System (CODIS) and disease-related STR motifs were also applied as cases for evaluation. In addition, we analyzed STR variations for highly conserved homologous genes and human-unique genes. In total, 477 polymorphic STRs were identified from 492 human-unique genes, among which 26 STRs were retrieved and clustered into three different groups for efficient comparison. Conclusions: We have developed an online system that efficiently identifies polymorphic STRs and provides novel distinguishable STR biomarkers for different levels of specificity. Candidate polymorphic STRs within a personal genome could be easily retrieved and compared to the constructed STR profile through query keywords, gene names, or assembled contigs.
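The core comparison behind calling an STR polymorphic can be sketched as counting repeat copies at the same locus in several genomes and asking whether the counts differ. This is a hypothetical, alignment-free simplification: it assumes each contig already spans the locus of interest and takes the longest uninterrupted run of the motif, whereas the system described above first aligns and calibrates contigs against reference sequences.

```python
import re


def copy_number(contig: str, motif: str) -> int:
    """Longest run of consecutive, uninterrupted copies of `motif` in `contig`."""
    runs = re.findall("(?:" + re.escape(motif) + ")+", contig.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def is_polymorphic(contigs: dict[str, str], motif: str) -> tuple[bool, dict[str, int]]:
    """Report whether copy numbers differ across genomes, plus the per-genome counts."""
    counts = {name: copy_number(seq, motif) for name, seq in contigs.items()}
    return len(set(counts.values())) > 1, counts


contigs = {  # hypothetical contigs spanning one AGAT locus
    "reference": "TTG" + "AGAT" * 11 + "CCA",
    "genome_1": "TTG" + "AGAT" * 13 + "CCA",
    "genome_2": "TTG" + "AGAT" * 11 + "CCA",
}
print(is_polymorphic(contigs, "AGAT"))
```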

8
Girgis HZ, Sheetlin SL. MsDetector: toward a standard computational tool for DNA microsatellites detection. Nucleic Acids Res 2013; 41:e22. [PMID: 23034809 PMCID: PMC3592430 DOI: 10.1093/nar/gks881] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 03/01/2012] [Revised: 08/29/2012] [Accepted: 08/30/2012] [Indexed: 11/12/2022]
Abstract
Microsatellites (MSs) are DNA regions consisting of repeated short motif(s). MSs are linked to several diseases and have important biomedical applications. Thus, researchers have developed several computational tools to detect MSs. However, the currently available tools require adjusting many parameters, or depend on a list of motifs or on a library of known MSs. Therefore, two laboratories analyzing the same sequence with the same computational tool may obtain different results due to the user-adjustable parameters. Recent studies have indicated the need for a standard computational tool for detecting MSs. To this end, we applied machine-learning algorithms to develop a tool called MsDetector. The system is based on a hidden Markov model and a general linear model. The user is not obligated to optimize the parameters of MsDetector. Neither a list of motifs nor a library of known MSs is required. MsDetector is memory- and time-efficient. We applied MsDetector to several species. MsDetector located the majority of MSs found by other widely used tools. In addition, MsDetector identified novel MSs. Furthermore, the system has a very low false-positive rate resulting in a precision of up to 99%. MsDetector is expected to produce consistent results across studies analyzing the same sequence.
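MsDetector combines a hidden Markov model with a general linear model and needs no user-tuned parameters. The sketch below is deliberately much simpler and is not MsDetector: it fixes one candidate period, turns the sequence into a match/mismatch signal, and runs Viterbi decoding over two hidden states (background and repeat). The emission and transition probabilities are hand-set, hypothetical values, which is exactly the kind of user-adjustable parameter the paper aims to avoid.

```python
import math


def match_signal(seq: str, period: int) -> list[int]:
    """1 where a base equals the base one period upstream, else 0 (positions >= period)."""
    return [1 if seq[i] == seq[i - period] else 0 for i in range(period, len(seq))]


def viterbi_repeat_mask(obs: list[int], p_match_bg: float = 0.25,
                        p_match_rep: float = 0.9, p_stay: float = 0.99) -> list[bool]:
    """Two-state Viterbi decoding (0 = background, 1 = repeat); True marks repeat."""
    if not obs:
        return []
    emit = [  # emit[state][symbol] as log-probabilities; symbol 1 = match
        [math.log(1 - p_match_bg), math.log(p_match_bg)],
        [math.log(1 - p_match_rep), math.log(p_match_rep)],
    ]
    stay, switch = math.log(p_stay), math.log(1 - p_stay)
    score = [math.log(0.5) + emit[s][obs[0]] for s in (0, 1)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        ptr, new = [], []
        for s in (0, 1):
            from_same = score[s] + stay
            from_other = score[1 - s] + switch
            ptr.append(s if from_same >= from_other else 1 - s)
            new.append(max(from_same, from_other) + emit[s][o])
        back.append(ptr)
        score = new
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):  # trace predecessors from the last step to the first
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [s == 1 for s in path]


flank = "TTAGCATGAC"  # hypothetical non-repetitive flank
seq = flank + "CAG" * 10 + flank
mask = viterbi_repeat_mask(match_signal(seq, 3))
print(sum(mask), "of", len(mask), "positions decoded as repeat")
```

Isolated chance matches in the flanks are not enough to pay the cost of switching states, so only the CAG stretch is decoded as repeat.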
Affiliation(s)
- Sergey L. Sheetlin
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 9600 Rockville Pike, Bethesda, MD 20896, USA

9
Krzyzosiak WJ, Sobczak K, Wojciechowska M, Fiszer A, Mykowska A, Kozlowski P. Triplet repeat RNA structure and its role as pathogenic agent and therapeutic target. Nucleic Acids Res 2012; 40:11-26. [PMID: 21908410 PMCID: PMC3245940 DOI: 10.1093/nar/gkr729] [Citation(s) in RCA: 134] [Impact Index Per Article: 9.6] [Indexed: 11/14/2022]
Abstract
This review presents detailed information about the structure of triplet repeat RNA and addresses the simple sequence repeats of normal and expanded lengths in the context of the physiological and pathogenic roles played in human cells. First, we discuss the occurrence and frequency of various trinucleotide repeats in transcripts and classify them according to the propensity to form RNA structures of different architectures and stabilities. We show that repeats capable of forming hairpin structures are overrepresented in exons, which implies that they may have important functions. We further describe long triplet repeat RNA as a pathogenic agent by presenting human neurological diseases caused by triplet repeat expansions in which mutant RNA gains a toxic function. Prominent examples of these diseases include myotonic dystrophy type 1 and fragile X-associated tremor ataxia syndrome, which are triggered by mutant CUG and CGG repeats, respectively. In addition, we discuss RNA-mediated pathogenesis in polyglutamine disorders such as Huntington's disease and spinocerebellar ataxia type 3, in which expanded CAG repeats may act as an auxiliary toxic agent. Finally, triplet repeat RNA is presented as a therapeutic target. We describe various concepts and approaches aimed at the selective inhibition of mutant transcript activity in experimental therapies developed for repeat-associated diseases.
Affiliation(s)
- Wlodzimierz J Krzyzosiak
- Laboratory of Cancer Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.

10
Abstract
Advances in sequencing technologies have fundamentally changed the pace of genome sequencing projects and have contributed to the ever-increasing volume of genomic data. This has been paralleled by an increase in computational power and resources to process and translate raw sequence data into meaningful information. In addition to protein coding regions, an integral part of all the genomes studied so far has been the presence of repetitive sequences. Previously considered as "junk," numerous studies have implicated repetitive sequences in important biological and structural roles in the genome. Therefore, the identification and characterization of these repetitive sequences has become an indispensable part of genome sequencing projects. Numerous similarity-based and de novo methods have been developed to search for and annotate repeats in the genome, many of which have been discussed in this chapter.

11
Kozlowski P, de Mezer M, Krzyzosiak WJ. Trinucleotide repeats in human genome and exome. Nucleic Acids Res 2010; 38:4027-39. [PMID: 20215431 PMCID: PMC2896521 DOI: 10.1093/nar/gkq127] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.3] [Indexed: 11/24/2022]
Abstract
Trinucleotide repeats (TNRs) are of interest in genetics because they are used as markers for tracing genotype–phenotype relations and because they are directly involved in numerous human genetic diseases. In this study, we searched the human genome reference sequence and annotated exons (exome) for the presence of uninterrupted triplet repeat tracts composed of six or more repeated units. A list of 32 448 TNRs and 878 TNR-containing genes was generated and is provided herein. We found that some triplet repeats, specifically CNG, are overrepresented, while CTT, ATC, AAC and AAT are underrepresented in exons. This observation suggests that the occurrence of TNRs in exons is not random, but undergoes positive or negative selective pressure. Additionally, TNR types strongly determine their localization in mRNA sections (ORF, UTRs). Most genes containing exon-overrepresented TNRs are associated with gene ontology-defined functions. Surprisingly, many groups of genes that contain TNR types coding for different homo-amino acid tracts associate with the same transcription-related GO categories. We propose that TNRs have potential to be functional genetic elements and that their variation may be involved in the regulation of many common phenotypes; as such, TNR polymorphisms should be considered a priority in association studies.
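A search for uninterrupted triplet tracts of six or more units maps naturally onto a backreference regular expression. The sketch below is a hypothetical reimplementation for illustration, not the authors' code. It also groups motifs into strand- and phase-independent classes by taking the smallest rotation of the motif or of its reverse complement, so that CAG and CTG fall into one class; that naming convention is our assumption.

```python
import re

_COMP = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif: str) -> str:
    """Lexicographically smallest rotation of the motif or of its reverse complement."""
    rc = motif.translate(_COMP)[::-1]
    rotations = [m[i:] + m[:i] for m in (motif, rc) for i in range(len(m))]
    return min(rotations)


def find_tnr_tracts(seq: str, min_units: int = 6):
    """Yield (start, end, canonical_motif, units) for uninterrupted triplet tracts."""
    pattern = re.compile(r"([ACGT]{3})\1{%d,}" % (min_units - 1))
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:  # skip mononucleotide runs such as AAAAAA...
            continue
        yield m.start(), m.end(), canonical_motif(motif), (m.end() - m.start()) // 3


print(list(find_tnr_tracts("AT" + "CTG" * 7 + "GA" + "CAG" * 5)))
```

The five-unit CAG stretch in the example falls below the six-unit threshold and is not reported. Because `finditer` reports whichever rotation the tract happens to start with, counting by canonical class is what makes tallies such as "CNG repeats" comparable across loci.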
Affiliation(s)
- Piotr Kozlowski
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland.

12
Shelenkov A, Korotkov A, Korotkov E. MMsat—a database of potential micro- and minisatellites. Gene 2008; 409:53-60. [DOI: 10.1016/j.gene.2007.11.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/20/2007] [Revised: 10/08/2007] [Accepted: 11/16/2007] [Indexed: 11/28/2022]

13
Domaniç NO, Preparata FP. A novel approach to the detection of genomic approximate tandem repeats in the Levenshtein metric. J Comput Biol 2007; 14:873-91. [PMID: 17803368 DOI: 10.1089/cmb.2007.0018] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 10/22/2022]
Abstract
An efficient algorithm for detecting approximate tandem repeats in genomic sequences is presented. The algorithm is based on innovative statistical criteria to detect candidate regions which may include tandem repeats; these regions are subsequently verified by alignments based on dynamic programming. No prior information about the period size or pattern is needed. Also, the algorithm is virtually capable of detecting repeats with any period. An implementation of the algorithm is compared with two state-of-the-art tandem repeat detection tools to demonstrate its effectiveness on both natural and synthetic data. The algorithm is available at www.cs.brown.edu/people/domanic/tandem/.
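Verification here relies on dynamic programming under the Levenshtein (edit-distance) metric. As a rough, hypothetical illustration of that step only, and not the paper's statistical candidate-detection criteria or its alignment procedure, the sketch below computes edit distance with the standard DP recurrence and accepts a candidate region when consecutive period-length copies differ by at most a chosen mean distance (`max_mean_dist`, an assumed threshold).

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost substitutions, insertions and deletions (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def looks_like_tandem_repeat(region: str, period: int, max_mean_dist: float = 1.0) -> bool:
    """Accept the region if consecutive period-length copies are close in edit distance."""
    copies = [region[i:i + period] for i in range(0, len(region) - period + 1, period)]
    if len(copies) < 2:
        return False
    dists = [levenshtein(x, y) for x, y in zip(copies, copies[1:])]
    return sum(dists) / len(dists) <= max_mean_dist


print(levenshtein("kitten", "sitting"))
print(looks_like_tandem_repeat("ACGTACGTACCTACGT", 4),
      looks_like_tandem_repeat("ACGTTGCAGATCCTAG", 4))
```

Because the region is cut into fixed-length chunks, a single insertion or deletion shifts every later copy; a fuller implementation would align the whole region against a repeated consensus instead.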
Affiliation(s)
- Nevzat Onur Domaniç
- Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA

14
Shelenkov AA, Skryabin KG, Korotkov EV. Classification analysis of a latent dinucleotide periodicity of plant genomes. Russ J Genet 2008. [DOI: 10.1134/s1022795408010134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]

15

16
Orlov SV, Kuteykin-Teplyakov KB, Ignatovich IA, Dizhe EB, Mirgorodskaya OA, Grishin AV, Guzhova OB, Prokhortchouk EB, Guliy PV, Perevozchikov AP. Novel repressor of the human FMR1 gene - identification of p56 human (GCC)(n)-binding protein as a Krüppel-like transcription factor ZF5. FEBS J 2007; 274:4848-62. [PMID: 17714511 DOI: 10.1111/j.1742-4658.2007.06006.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 01/04/2023]
Abstract
A series of relatively short (GCC)(n) triplet repeats (n = 3-30) located within regulatory regions of many mammalian genes may be considered as putative cis-acting transcriptional elements (GCC-elements). Fragile X-mental retardation syndrome is caused by an expansion of (GCC)(n) triplet repeats within the 5'-untranslated region of the human fragile X-mental retardation 1 (FMR1) gene. The present study aimed to characterize a novel human (GCC)(n)-binding protein and investigate its possible role in the regulation of the FMR1 gene. A novel human (GCC)(n)-binding protein, p56, was isolated and identified as a Krüppel-like transcription factor, ZF5, by MALDI-TOF analysis. The capacity of ZF5 to specifically interact with (GCC)(n) triplet repeats was confirmed by the electrophoretic mobility shift assay with purified recombinant ZF5 protein. In cotransfection experiments, ZF5 overexpression repressed activity of the GCC-element containing mouse ribosomal protein L32 gene promoter. Moreover, RNA interference assay results showed that endogenous ZF5 acts as a repressor of the human FMR1 gene. Thus, these data identify a new class of ZF5 targets, a subset of genes containing GCC-elements in their regulatory regions, and raise the question of whether transcription factor ZF5 is implicated in the pathogenesis of fragile X syndrome.
Affiliation(s)
- Sergey V Orlov
- Department of Biochemistry, Institute of Experimental Medicine, Russian Academy of Medical Sciences, St Petersburg, Russia.

17
Shelenkov A, Skryabin K, Korotkov E. Search and Classification of Potential Minisatellite Sequences from Bacterial Genomes. DNA Res 2006; 13:89-102. [PMID: 16980713 DOI: 10.1093/dnares/dsl004] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
We used the Information Decomposition method that we developed to identify regions of latent dinucleotide periodicity in bacterial genomes. At a high level of statistical significance, we obtained 454 potential minisatellite sequences. We then classified the periodicity matrices into 45 classes. Using these classes, we applied another method we developed, Modified Profile Analysis, to reveal additional periodic sequences in the presence of indels. The combination of these two methods found 3949 sequences, most of which cannot be revealed by other methods, including dynamic programming and Fourier transformation.
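Information Decomposition is the authors' own method and is not reproduced here. The underlying intuition, that a latent period shows up as statistical dependence between a position's phase (i mod p) and the base found there, can be sketched with mutual information computed from a phase-by-base count matrix. This simplified version works on single nucleotides and reports raw mutual information in bits; the dinucleotide analysis, significance thresholds and class matrices of the published work are not modelled.

```python
import math
from collections import Counter


def periodicity_mi(seq: str, period: int) -> float:
    """Mutual information (bits) between phase (i mod period) and nucleotide."""
    n = len(seq)
    if n == 0:
        return 0.0
    joint = Counter((i % period, base) for i, base in enumerate(seq))  # period x base matrix
    phase_counts = Counter(i % period for i in range(n))
    base_counts = Counter(seq)
    mi = 0.0
    for (phase, base), count in joint.items():
        p_joint = count / n
        p_indep = (phase_counts[phase] / n) * (base_counts[base] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


print(round(periodicity_mi("ACG" * 10, 3), 3), round(periodicity_mi("A" * 30, 3), 3))
```

A perfect period-3 repeat gives log2(3), about 1.585 bits, at period 3, while a homopolymer gives 0 because its base carries no information about phase. Detecting latent periodicity in real sequence would additionally require a significance test against shuffled or random sequence.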
Affiliation(s)
- Andrew Shelenkov
- Bioengineering Centre of Russian Academy of Sciences, Prospect 60-tya Oktyabrya 7/1, 117312 Moscow, Russia.

18
Zhang Z, Xue Q. Tri-nucleotide repeats and their association with genes in rice genome. Biosystems 2005; 82:248-56. [PMID: 16226835 DOI: 10.1016/j.biosystems.2005.08.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 05/12/2005] [Accepted: 08/16/2005] [Indexed: 11/27/2022]
Abstract
Tri-nucleotide repeats (TNRs) are extremely abundant in the rice genome, and CCG/CGG repeats predominate, accounting for approximately half of all TNRs in the genome. Our results show that the rice genome has relatively abundant TNRs with high GC content and, among TNRs of equal GC content, relatively abundant TNRs consisting only of purines or only of pyrimidines. The AAT/ATT repeats, which occur predominantly in intergenic and intronic regions, have a considerably higher average length than other repeats. TNRs are most frequent in 5'-UTRs, followed by coding and 5'-flanking regions. Purine-rich TNRs prefer coding regions, whereas pyrimidine-rich TNRs show a stronger bias toward upstream regions, suggesting that they might act as regulatory elements in gene expression. TNRs located predominantly near the start of coding regions may not significantly influence protein function.
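The composition categories used here (GC content, purine-only and pyrimidine-only motifs) are easy to compute per repeat unit. A small sketch follows; the helper names are hypothetical. It also makes one point explicit: because A pairs with T and G with C, a purine-only motif on one strand is pyrimidine-only on the other, so a purine versus pyrimidine contrast between coding and upstream regions is only meaningful once a reference strand is fixed, for example the sense strand of each gene.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def describe_tnr(motif: str) -> dict:
    """GC fraction and purine/pyrimidine-only flags for a repeat unit read 5' to 3'."""
    motif = motif.upper()
    return {
        "gc": sum(base in "GC" for base in motif) / len(motif),
        "purine_only": set(motif) <= PURINES,
        "pyrimidine_only": set(motif) <= PYRIMIDINES,
    }


for unit in ("CCG", "GGA", reverse_complement("GGA"), "AAT"):
    print(unit, describe_tnr(unit))
```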
Affiliation(s)
- Zhonghua Zhang
- James D. Watson Institute of Genome Science, Zhejiang University, Hangzhou 310008, China
19
O'Dushlaine CT, Edwards RJ, Park SD, Shields DC. Tandem repeat copy-number variation in protein-coding regions of human genes. Genome Biol 2005; 6:R69. [PMID: 16086851 PMCID: PMC1273636 DOI: 10.1186/gb-2005-6-8-r69] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.3] [Received: 02/11/2005] [Revised: 05/31/2005] [Accepted: 07/13/2005] [Indexed: 12/01/2022]
Abstract
Tandem repeat polymorphisms in human proteins were characterized using the UniGene dataset. This analysis suggests that 1 in 20 proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions; these were prevalent among protein-binding proteins.
Background: Tandem repeat variation in protein-coding regions will alter protein length and may introduce frameshifts. Tandem repeat variants are associated with variation in pathogenicity in bacteria and with human disease. We characterized tandem repeat polymorphism in human proteins, using the UniGene database, and tested whether these were associated with host defense roles.
Results: Protein-coding tandem repeat copy-number polymorphisms were detected in 249 tandem repeats found in 218 UniGene clusters; observed length differences ranged from 2 to 144 nucleotides, with unit copy lengths ranging from 2 to 57. This corresponded to 1.59% (218/13,749) of proteins investigated carrying detectable polymorphisms in the copy-number of protein-coding tandem repeats. We found no evidence that tandem repeat copy-number polymorphism was significantly elevated in defense-response proteins (p = 0.882). An association with the Gene Ontology term 'protein-binding' remained significant after covariate adjustment and correction for multiple testing. Combining this analysis with previous experimental evaluations of tandem repeat polymorphism, we estimate the approximate mean frequency of tandem repeat polymorphisms in human proteins to be 6%. Because 13.9% of the polymorphisms were not a multiple of three nucleotides, up to 1% of proteins may contain frameshifting tandem repeat polymorphisms.
Conclusion: Around 1 in 20 human proteins are likely to contain tandem repeat copy-number polymorphisms within coding regions. Such polymorphisms are not more frequent among defense-response proteins; their prevalence among protein-binding proteins may reflect lower selective constraints on their structural modification. The impact of frameshifting and longer copy-number variants on protein function and disease merits further investigation.
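The headline figures in this abstract can be re-derived with simple arithmetic. The sketch below is only a consistency check of the quoted numbers, and the helper names are hypothetical. A length difference that is not a multiple of three changes the reading frame, which is why the 13.9% non-triplet share, applied to the estimated 6% mean polymorphism frequency, bounds the frameshifting fraction at roughly 0.8%, that is, "up to 1%".

```python
def percent(part: float, whole: float) -> float:
    """part as a percentage of whole."""
    return 100.0 * part / whole


def is_frameshifting(length_difference_nt: int) -> bool:
    # A copy-number change preserves the frame only if it adds or removes whole codons.
    return length_difference_nt % 3 != 0


# 218 polymorphic UniGene clusters out of 13,749 proteins investigated.
detected = percent(218, 13_749)
# Estimated mean polymorphism frequency (6%) times the non-triplet fraction (13.9%).
frameshift_share = percent(0.06 * 0.139, 1)

print(f"detected: {detected:.2f}%")
print(f"frameshifting: {frameshift_share:.2f}%")
```

The two printed values are 1.59% and 0.83%; of the extreme length differences reported (2 and 144 nt), only the 2-nt change would shift the frame.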
Affiliation(s)
- Colm T O'Dushlaine
- Bioinformatics Core, Department of Clinical Pharmacology and Institute of Biopharmaceutical Sciences, Royal College of Surgeons in Ireland, 123 St Stephen's Green, Dublin 2, Ireland
- Richard J Edwards
- Bioinformatics Core, Department of Clinical Pharmacology and Institute of Biopharmaceutical Sciences, Royal College of Surgeons in Ireland, 123 St Stephen's Green, Dublin 2, Ireland
- Stephen D Park
- Bioinformatics Core, Department of Clinical Pharmacology and Institute of Biopharmaceutical Sciences, Royal College of Surgeons in Ireland, 123 St Stephen's Green, Dublin 2, Ireland
- Denis C Shields
- Bioinformatics Core, Department of Clinical Pharmacology and Institute of Biopharmaceutical Sciences, Royal College of Surgeons in Ireland, 123 St Stephen's Green, Dublin 2, Ireland
20
Trotta E, Del Grosso N, Erba M, Melino S, Cicero D, Paci M. Interaction of DAPI with individual strands of trinucleotide repeats. Effects of replication in vitro of the AAT x ATT triplet. ACTA ACUST UNITED AC 2004; 270:4755-61. [PMID: 14622264 DOI: 10.1046/j.1432-1033.2003.03877.x] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
The structural changes produced by the minor-groove binding ligand DAPI (4',6-diamidino-2-phenylindole) on individual strands of trinucleotide repeat sequences were detected by electrophoretic band-shift analysis and related to their effects on DNA replication in vitro. Among the 20 possible single-stranded trinucleotide repeats, only the T-rich strand of the AAT.ATT triplet exhibits an observable fluorescence band and a change in electrophoretic mobility due to drug binding. This is attributable to the property of DAPI that favours folding of the random coil ATT strand into a fast-migrating hairpin structure by a minor-groove binding mechanism. The electrophoretic characteristics of AAT, ACT, AGT, ATG and ATC are unchanged by DAPI, suggesting a crucial role of T.T mismatches, relative to A.A, C.C and G.G mismatches, in favouring the binding properties and structural features of the ATT-DAPI complexes. Primer extension experiments, using the Klenow fragment of DNA polymerase I, demonstrate that such a selective structural change at ATT targets has a marked ability to stall DNA replication in vitro compared with the complementary AAT strand and a random GC-rich sequence. The results suggest a novel molecular mechanism of action of the DNA minor-groove binding ligand DAPI.
Affiliation(s)
- Edoardo Trotta
- Istituto di Neurobiologia e Medicina Molecolare, Consiglio Nazionale delle Ricerche, Roma, Italy.
21
Zhang YA, Nie P, Luo HY, Wang YP, Sun YH, Zhu ZY. Characterization of cDNA encoding immunoglobulin light chain of the mandarin fish (Siniperca chuatsi). Vet Immunol Immunopathol 2003; 95:81-90. [PMID: 12969639 DOI: 10.1016/s0165-2427(03)00105-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
Immunoglobulin light chain cDNA sequences of a perciform fish, the mandarin fish Siniperca chuatsi, were amplified from head kidney mRNA by reverse transcription (RT)-PCR and RACE using degenerate and gene-specific primers. In cDNA sequences of the VL region, nucleotide substitutions were concentrated within the CDRs, although a lesser degree of variability was also found in the FRs. Moreover, CDR1 and CDR3 in the mandarin fish are shorter than in most other fish species. A microsatellite sequence, (AGC)(6-8), was found in the middle of the S. chuatsi CL region; it is also present in another perciform species, the spotted wolffish (Anarhichas minor). Comparison of the amino acid sequence of the mandarin fish CL domain with those of other vertebrates showed the highest similarity (94.5%) to the spotted wolffish, with relatively high similarity also to the rainbow trout (Oncorhynchus mykiss) Ig L1 (62.7%) and channel catfish (Ictalurus punctatus) Ig LG (55.9%) isotypes. However, the VL regions of the mandarin fish and the wolffish share only 50% identity. The similarity of the mandarin fish CL domain to those of higher vertebrates did not readily allow it to be classified as a kappa or lambda isotype. Phylogenetic analyses also showed that the CL genes of the mandarin fish and most other teleost fish form a branch separate from the mammalian kappa and lambda branches.
Affiliation(s)
- Y A Zhang
- State Key Laboratory of Freshwater Ecology and Biotechnology, and Laboratory of Fish Diseases, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, Hubei Province, PR China
22
Kolpakov R, Bana G, Kucherov G. mreps: Efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res 2003; 31:3672-8. [PMID: 12824391 PMCID: PMC169196 DOI: 10.1093/nar/gkg617] [Citation(s) in RCA: 228] [Impact Index Per Article: 10.4] [Indexed: 11/12/2022]
Abstract
The presence of repeated sequences is a fundamental feature of genomes. Tandemly repeated DNA appears in both eukaryotic and prokaryotic genomes; it is associated with various regulatory mechanisms and plays an important role in genomic fingerprinting. In this paper, we describe mreps, a powerful software tool for fast identification of tandemly repeated structures in DNA sequences. mreps can identify all types of tandem repeats within a single run on a whole genomic sequence. It has a resolution parameter that allows the program to identify 'fuzzy' repeats. We introduce the main algorithmic solutions behind mreps, describe its usage, give some execution time benchmarks and present several case studies to illustrate its capabilities. The mreps web interface is accessible through http://www.loria.fr/mreps/.
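mreps is built around maximal repetitions (also called runs): stretches whose smallest period p repeats at least twice and which cannot be extended without breaking that period. The scan below illustrates that definition for exact repeats only. It is a simple teaching sketch whose cost grows with sequence length times `max_period`, not the linear-time algorithm behind mreps, and it ignores the resolution parameter used for 'fuzzy' repeats; the function names and the `max_period` default are assumptions.

```python
def is_primitive(unit: str) -> bool:
    """True if unit is not a power of a shorter string ('AC' is, 'ACAC' is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def maximal_repetitions(seq: str, max_period: int = 10) -> list[tuple[int, int, int]]:
    """Return (start, end, period) for exact maximal repetitions with exponent >= 2.

    For each period p, find maximal stretches where seq[k] == seq[k + p];
    a stretch of r consecutive matches spans r + p characters.
    """
    seq = seq.upper()
    n = len(seq)
    found = []
    for p in range(1, max_period + 1):
        k = 0
        while k + p < n:
            if seq[k] != seq[k + p]:
                k += 1
                continue
            start = k
            while k + p < n and seq[k] == seq[k + p]:
                k += 1
            matches = k - start
            # Exponent >= 2 needs at least p matches. A primitive unit guarantees
            # (Fine-Wilf periodicity lemma) that p is the smallest period, so a run
            # is not reported again under a multiple of its true period.
            if matches >= p and is_primitive(seq[start:start + p]):
                found.append((start, start + matches + p, p))
    return found
```

For `"GGACACACTT"` the scan reports the (AC)3 run as `(2, 8, 2)` alongside the two short GG and TT runs, and a poly-A stretch is reported once, with period 1, rather than again at periods 2 and 3.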
Affiliation(s)
- Roman Kolpakov
- French-Russian Institute for Informatics and Applied Mathematics, Moscow University, 119899 Moscow, Russia
23
Bizzaro JW, Marx KA. Poly: a quantitative analysis tool for simple sequence repeat (SSR) tracts in DNA. BMC Bioinformatics 2003; 4:22. [PMID: 12791171 PMCID: PMC165442 DOI: 10.1186/1471-2105-4-22] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Received: 01/09/2003] [Accepted: 06/05/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND Simple sequence repeats (SSRs), also called microsatellites or polymeric sequences, are common in DNA and biologically important. From mononucleotide to trinucleotide repeats and beyond, they can occur in long tracts (> 6 repeating units) and may be characterized by quantifying the frequencies with which they occur and their tract lengths. However, most existing computer programs that find SSR tracts do not include these methods. RESULTS A computer program named Poly has been written not only to find SSR tracts but also to analyze the results quantitatively. CONCLUSIONS Poly is significant in its use of non-standard, quantitative methods of analysis. With its flexible object model and data structure, Poly and its generated data can also be used for more sophisticated analyses.
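A minimal sketch of the two steps described here, finding SSR tracts of more than six units and then tallying unit frequencies and tract lengths, using a regular expression. It is not Poly's implementation: the regex, the 1-3 nt unit range, and the lack of rotation or strand canonicalization (so CA and AC tracts are counted separately) are assumptions.

```python
import re
from collections import Counter, defaultdict

# A 1-3 nt unit followed by at least six more copies, i.e. more than six units in total.
# The lazy {1,3}? tries the shortest unit first, so a poly-A run gets unit "A", not "AA".
SSR_PATTERN = re.compile(r"([ACGT]{1,3}?)\1{6,}")


def find_ssr_tracts(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, unit, copy_number) for each non-overlapping SSR tract."""
    tracts = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        tracts.append((match.start(), unit, copies))
    return tracts


def summarize_tracts(tracts):
    """Tract frequency per unit, and the tract lengths (in units) observed for each unit."""
    frequency = Counter(unit for _, unit, _ in tracts)
    lengths = defaultdict(list)
    for _, unit, copies in tracts:
        lengths[unit].append(copies)
    return frequency, dict(lengths)
```

On `"GG" + "CA" * 8 + "T" + "A" * 10 + "GC"` this reports a (CA)8 tract at position 2 and an (A)10 tract at position 19, while six copies of CA alone fall below the "> 6 units" cut-off and are not reported.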
Affiliation(s)
- Jeff W Bizzaro
- Bioinformatics Organization, Inc., 28 Pope Street, Hudson, MA 01749 USA
- Center for Intelligent Biomaterials, Dept. of Chemistry, University of Massachusetts Lowell, One University Ave., Lowell, MA 01854 USA
- Kenneth A Marx
- Center for Intelligent Biomaterials, Dept. of Chemistry, University of Massachusetts Lowell, One University Ave., Lowell, MA 01854 USA
24
Abstract
The fragile X syndrome represents the most common inherited cause of mental retardation worldwide. It is caused by a stretch of CGG repeats within the fragile X gene, which increases in length as it is transmitted from generation to generation. Once the repeat exceeds a threshold length, no protein is produced, resulting in the fragile X phenotype. Ten years after the discovery of the gene, much has been learned about the function of the fragile X protein. Knowledge has also accumulated about the mutation mechanism, although not all of the factors that destabilize the CGG repeat are yet known.
Affiliation(s)
- B A Oostra
- Department of Clinical Genetics, Erasmus University, Rotterdam, The Netherlands.
25
Arnold R, Mäueler W, Bassili G, Lutz M, Burke L, Epplen TJ, Renkawitz R. The insulator protein CTCF represses transcription on binding to the (gt)(22)(ga)(15) microsatellite in intron 2 of the HLA-DRB1(*)0401 gene. Gene 2000; 253:209-14. [PMID: 10940558 DOI: 10.1016/s0378-1119(00)00271-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022]
Abstract
The insulator and transcription factor CTCF is a highly conserved 11-zinc-finger protein with multiple specificities in DNA sequence recognition. CTCF regulates transcription of several genes, such as the human oncogene c-myc and the chicken lysozyme gene, by binding extremely divergent DNA sequences with different sets of its 11 zinc fingers. Recently, an insulator function was demonstrated for several CTCF binding elements. Here we show that CTCF binds to the (gt)(22)(ga)(15) microsatellite repeat A9 in intron 2 of the HLA-DRB1(*)0401 gene. Reporter gene activity is repressed by the A9 element. This repression depends on coexpressed CTCF and is even stronger than that mediated by the CTCF binding site F1 of the chicken lysozyme gene, for which silencer activity has been shown. This is the first report suggesting a function for microsatellite sequences in regulating specific gene expression.
Affiliation(s)
- R Arnold
- Genetisches Institut der Justus-Liebig Universität, 35392, Giessen, Germany
26
Wren JD, Forgacs E, Fondon JW, Pertsemlidis A, Cheng SY, Gallardo T, Williams RS, Shohet RV, Minna JD, Garner HR. Repeat polymorphisms within gene regions: phenotypic and evolutionary implications. Am J Hum Genet 2000; 67:345-56. [PMID: 10889045 PMCID: PMC1287183 DOI: 10.1086/303013] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.5] [Received: 04/10/2000] [Accepted: 06/02/2000] [Indexed: 11/03/2022]
Abstract
We have developed an algorithm that predicted 11,265 potentially polymorphic tandem repeats within transcribed sequences. We estimate that 22% (2,207/9,717) of the annotated clusters within UniGene contain at least one potentially polymorphic locus. Our predictions were tested by allelotyping a panel of approximately 30 individuals for 5% of these regions, confirming polymorphism for more than half the loci tested. Our study indicates that tandem-repeat polymorphisms in genes are more common than is generally believed. Approximately 8% of these loci are within coding sequences and, if polymorphic, would result in frameshifts. Our catalogue of putative polymorphic repeats within transcribed sequences comprises a large set of potentially phenotypic or disease-causing loci. In addition, from the anomalous character of the repetitive sequences within unannotated clusters, we also conclude that the UniGene cluster count substantially overestimates the number of genes in the human genome. We hypothesize that polymorphisms in repeated sequences occur with some baseline distribution, on the basis of repeat homogeneity, size, and sequence composition, and that deviations from that distribution are indicative of the nature of selection pressure at that locus. We find evidence of selective maintenance of the ability of some genes to respond very rapidly, perhaps even on intragenerational timescales, to fluctuating selective pressures.
Affiliation(s)
- J D Wren
- Program in Genetics, Southwestern Graduate School of Biomedical Sciences, Dallas, TX, USA
27
Mäueler W, Bassili G, Epplen C, Keyl HG, Epplen JT. Protein binding to simple repetitive sequences depends on DNA secondary structure(s). Chromosome Res 1999; 7:163-6. [PMID: 10421375 DOI: 10.1023/a:1009275914130] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
Simple repetitive DNA sequences are ubiquitous constituents of eukaryotic chromosomes. The properties of simple repeats generate increased interest as expansions of certain trinucleotide blocks cause human diseases. We studied protein binding and structural features of (gaa x ttc)n tracts e.g. in the polymorphic frataxin intron 1 and (gt)n(ga)m stretches from different HLA-DRB1 alleles in their original genomic environments. Electrophoretic mobility shift assays revealed that HeLa nuclear proteins bind to DNA fragments containing these simple repeat blocks. The major retarded protein/DNA complexes comprise, in both cases, zinc-dependent proteins present in nuclear extracts from different cell types. Competition experiments using various simple repeats differing in length and flanking regions demonstrate specific interactions. DNase I footprinting shows protein-binding sites located either within the repeats alone or within the repeats as well as their flanking regions, often with preference for one strand. Comparing different (gt)n(ga)m alleles, a regular pattern of footprints was not detectable in the (gt)n part indicating that the zinc-dependent protein recognizes structural rather than sequence-specific features. OsO4 and DEPC modifications followed by electrophoretic and electron microscopical analyses demonstrate that the homopurine blocks often form different types of intramolecular triple helices. A similar situation was evident using (gaa x ttc)n blocks of different lengths within frataxin intron 1 as targets. These data have functional implications for non-coding (gaa x ttc)n and (gt)n(ga)m tracts with regard to gene expression in vivo.
Affiliation(s)
- W Mäueler
- Molecular Human Genetics, Ruhr-University, Bochum, Germany
28
Abstract
A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human beta T-cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.
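The abstract models tandem repeats by the percent identity and indel frequency between adjacent copies. The simplified scan below keeps only the first half of that model: it extends a repeat while each copy matches the next at or above a percent-identity threshold, allowing substitutions but no indels, and it applies no statistical recognition criteria. It is not the published algorithm, and the parameter defaults (`max_period=6`, `min_identity=0.8`, `min_copies=3`) are illustrative assumptions.

```python
def hamming_identity(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length copies agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def approximate_tandem_repeats(seq, max_period=6, min_identity=0.8, min_copies=3):
    """Return (start, end, period, copies) for substitution-only approximate repeats."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for p in range(1, max_period + 1):
        i = 0
        while i + 2 * p <= n:
            copies, j = 1, i
            # Extend while each copy is at least min_identity identical to the next one.
            while j + 2 * p <= n and hamming_identity(seq[j:j + p], seq[j + p:j + 2 * p]) >= min_identity:
                copies += 1
                j += p
            if copies >= min_copies:
                hits.append((i, j + p, p, copies))
                i = j + p  # resume scanning after the reported repeat
            else:
                i += 1
    return hits
```

With the default threshold, periods of 1-4 nt must repeat exactly, whereas a 5-nt unit may carry one substitution per copy: four copies of ACGTT with one ACCTT variant are still reported as a single repeat spanning positions 0-20. Because each copy is compared with its neighbour rather than a consensus, the pattern can drift gradually, one of the situations that motivates TRF's more careful modelling.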
Affiliation(s)
- G Benson
- Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, NY 10029-6574, USA.
29
Mäueler W, Bassili G, Arnold R, Renkawitz R, Epplen JT. The (gt)n(ga)m containing intron 2 of HLA-DRB alleles binds a zinc-dependent protein and forms non B-DNA structures. Gene 1999; 226:9-23. [PMID: 9889299 DOI: 10.1016/s0378-1119(98)00573-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
We studied protein binding and structural features of perfect and imperfect composite (gt)n(ga)m blocks from different HLA-DRB1 alleles in their original genomic and artificial environments. The major retarded protein/DNA complex of the genomic (gt)n(ga)m fragments comprises a zinc-dependent protein present in nuclear extracts from different cell types. The protein binding is characterized by moderate affinities independent of the polymorphic form of the physiological microsatellite allele. The binding affinity depends on the 5' and 3' adjacent single copy parts. DNase I footprinting of genome-derived fragments revealed that the 5' adjacent sequence and the (gt)n repeat are preferentially protected on the (gt)n(ga)m strand. Comparing three alleles, a regular pattern of footprints was not detectable in the (gt)n part, indicating that the zinc-dependent protein recognizes structural rather than sequence-specific features in this region. Chemical probing resulted in a pattern characteristic for Z-DNA in the (gt)n tract of the fragments. However, EMSA experiments using the Z-DNA specific monoclonal antibody mABZ-22 did not prove the presence of Z-DNA. As demonstrated by chemical modifications of the different (ga)m targets, only one of three (gt)n(ga)m fragments formed intramolecular triplexes of the type H-y3 and H-y5. DNase I footprinting revealed only weak protection, if any, in the homopurine tract. Rather, the (tc)m strands are hypersensitive for DNase I. This is probably due to structural conversions into intramolecular *H-triplexes after binding of HIZP.
Affiliation(s)
- W Mäueler
- Molekulare Humangenetik, Ruhr-Universität, 44801, Bochum, Germany
30
Mäueler W, Kyas A, Keyl HG, Epplen JT. A genome-derived (gaa.ttc)24 trinucleotide block binds nuclear protein(s) specifically and forms triple helices. Gene 1998; 215:389-403. [PMID: 9714838 DOI: 10.1016/s0378-1119(98)00266-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Abstract
The properties of simple trinucleotide repeats generate increased interest as expansions of certain trinucleotide blocks cause human diseases. Here, we studied protein binding and structural features of a perfect (gaa.ttc)24 tract in its original genomic environment. Electrophoretic mobility shift assays revealed that HeLa nuclear proteins bind to the DNA fragment containing the (gaa.ttc)24 block. Competition experiments using simple (gt.ac)n repeats differing in length and flanking regions showed no cross-reactivity with the major retarded band. For the specific (gaa.ttc)n/protein complex, a binding constant of 9.3 × 10^-9 mol/l was determined. DNase I footprinting revealed protein binding sites located exclusively within the repeat with a preference for the (gaa)24 strand. OsO4 and DEPC modifications followed by electrophoretic and electron microscopical analyses showed that the (gaa.ttc)24 block forms different types of intramolecular triple helices: Under superhelical stress, different H-DNA isomers are evident, whereas exclusively H-Y forms were detected in the relaxed state. Together, these data have functional implications for genomic (gaa.ttc)n tracts.
Affiliation(s)
- W Mäueler
- Department of Molecular Human Genetics, Ruhr University, 44780, Bochum, Germany
31
Kooy RF, Oostra BA, Willems PJ. The fragile X syndrome and other fragile site disorders. Results Probl Cell Differ 1998; 21:1-46. [PMID: 9670313 DOI: 10.1007/978-3-540-69680-3_1] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.6] [Indexed: 02/08/2023]
Affiliation(s)
- R F Kooy
- Department of Medical Genetics, University of Antwerp, Belgium.
32
33
Abstract
Algorithm development for comparing and aligning biological sequences has, until recently, been based on the SI model of mutational events which assumes that modification of sequences proceeds through any of the operations of substitution, insertion or deletion (the latter two collectively termed indels). While this model has worked fairly well, it has long been apparent that other mutational events occur. In this paper, we introduce a new model, the DSI model which includes another common mutational event, tandem duplication. Tandem duplication produces tandem repeats which are common in DNA, making up perhaps 10% of the human genome. They are responsible for some human diseases and may serve a multitude of functions in DNA regulation and evolution. Using the DSI model, we develop new exact and heuristic algorithms for comparing and aligning DNA sequences when they contain tandem repeats.
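Under the SI model, two sequences are compared by the cheapest series of substitutions, insertions and deletions, which is the classic edit-distance dynamic program sketched below with unit costs (the costs are an assumption). The example shows the gap the DSI model addresses: removing one extra copy of a 4-nt tandem unit costs four separate deletions under SI, whereas a model with a duplication operation could score it as one event. The duplication operation itself is not implemented here.

```python
def si_distance(a: str, b: str, sub: int = 1, indel: int = 1) -> int:
    """Minimum cost of turning a into b using substitutions, insertions and deletions."""
    prev = [j * indel for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * indel] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(
                prev[j - 1] + (0 if a[i - 1] == b[j - 1] else sub),  # match / substitution
                prev[j] + indel,       # deletion from a
                cur[j - 1] + indel,    # insertion into a
            )
        prev = cur
    return prev[-1]
```

`si_distance("ACGT", "AGT")` is 1 (one deletion), but `si_distance("ACGTACGT", "ACGT")` is 4, because the SI model has no way to express "one tandem copy was lost" other than as four independent indels.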
Affiliation(s)
- G Benson
- Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, New York 10029-6574, USA.
34
Grønskov K, Hjalgrim H, Bjerager MO, Brøndum-Nielsen K. Deletion of all CGG repeats plus flanking sequences in FMR1 does not abolish gene expression. Am J Hum Genet 1997; 61:961-7. [PMID: 9382110 PMCID: PMC1716002 DOI: 10.1086/514872] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.9] [Indexed: 02/05/2023]
Abstract
The fragile X syndrome is caused by a mutation belonging to the new class of dynamic mutations. It is associated with an expansion of a trinucleotide repeat (CGG) in exon 1 of the fragile X mental retardation 1 gene (FMR1). Here we present a fragile X family with a unique female patient who was rendered hemizygous for the FRAXA locus due to a large deletion of one X chromosome. In addition, the other X had a microdeletion in FMR1. PCR and sequence analysis revealed that the microdeletion included all CGG repeats plus 97 bp of flanking sequences, leaving the transcription start site and translation start site intact. Despite this total lack of CGG repeats in the FMR1 gene, Western blot analysis showed expression of FMRP, and the patient's phenotype was essentially normal. X-inactivation studies of the androgen-receptor (AR) locus and haplotype determination of microsatellite markers gave evidence that the deletion probably originated from regression of a fully mutated FMR1 gene. Although the minimal number of CGG repeats hitherto reported in FRAXA is six, and at least four other genes associated with CGG repeats are known, suggesting an as yet unknown function of these repeats, our study clearly demonstrates that the absence of CGG repeats does not abolish expression of the FMR1 gene in lymphoblastoid cells.
Affiliation(s)
- K Grønskov
- Department of Medical Genetics, John F. Kennedy Institute, Glostrup, Denmark
35
Debrauwere H, Gendrel CG, Lechat S, Dutreix M. Differences and similarities between various tandem repeat sequences: minisatellites and microsatellites. Biochimie 1997; 79:577-86. [PMID: 9466695 DOI: 10.1016/s0300-9084(97)82006-8] [Citation(s) in RCA: 80] [Impact Index Per Article: 2.9] [Indexed: 02/06/2023]
Abstract
Tandemly repetitive DNA sequences are abundantly interspersed in the genome of practically all eukaryotic species studied. The relative occurrence of one type of repetitive sequence and its location in the genome appear to be species specific. A common property of repetitive sequences within the living world is their ability to give rise to variants with increased or reduced numbers of repeats. This instability depends upon numerous parameters whose exact role is unclear: the number of repeats, their sequence content, their chromosomal location, the mismatch repair capability of the cell, the developmental stage of the cell (mitotic or meiotic) and/or the sex of the transmitting parent. It is now apparent that mutations in repetitive sequences are a common cause of human disease, including cancer and disorders which may exhibit a dominant mode of inheritance. Two mechanisms have been proposed to explain the instability of repetitive sequences: DNA polymerase slippage, which may account for the instability of short repeats, and unequal recombination, which reshuffles repeat variants and maintains repeat heterogeneity in minisatellites. The purpose of this review is to show that no general rule can explain the instability of repetitive sequences. Each sequence of repeats is under the influence of local and general biological activities that determine its level of instability.
Affiliation(s)
- H Debrauwere
- Institut Curie, Section de Recherche UMR144-CNRS, Paris, France
36
Bacolla A, Gellibolian R, Shimizu M, Amirhaeri S, Kang S, Ohshima K, Larson JE, Harvey SC, Stollar BD, Wells RD. Flexible DNA: genetically unstable CTG.CAG and CGG.CCG from human hereditary neuromuscular disease genes. J Biol Chem 1997; 272:16783-92. [PMID: 9201983 DOI: 10.1074/jbc.272.27.16783] [Citation(s) in RCA: 91] [Impact Index Per Article: 3.3] [Indexed: 02/04/2023]
Abstract
The properties of duplex CTG.CAG and CGG.CCG, which are involved in the etiology of several hereditary neurodegenerative diseases, were investigated by a variety of methods, including circularization kinetics, apparent helical repeat determination, and polyacrylamide gel electrophoresis. The bending moduli were 1.13 × 10^-19 erg·cm for CTG and 1.27 × 10^-19 erg·cm for CGG, approximately 40% less than for random B-DNA. Also, the persistence lengths of the triplet repeat sequences were approximately 60% of the value for random B-DNA. However, the torsional moduli and the helical repeats were 2.3 × 10^-19 erg·cm and 10.4 base pairs (bp)/turn for CTG and 2.4 × 10^-19 erg·cm and 10.3 bp/turn for CGG, respectively, all within the range for random B-DNA. Determination of the apparent helical repeat by the band shift assay indicated that the writhe of the repeats was different from that of random B-DNA. In addition, molecules of 224-245 bp in length (64-71 triplet repeats) were able to form topological isomers upon cyclization. The low bending moduli are consistent with predictions from crystallographic variations in slide, roll, and tilt. No unpaired bases or non-B-DNA structures could be detected by chemical and enzymatic probe analyses, two-dimensional agarose gel electrophoresis, and immunological studies. Hence, CTG and CGG are more flexible and highly writhed than random B-DNA and thus would be expected to act as sinks for the accumulation of superhelical density.
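For a worm-like chain, the persistence length P and the bending modulus B are related by P = B / (k_B T), which is why a bending modulus about 40% lower goes with a persistence length of about 60% of the random B-DNA value. The sketch converts the reported moduli into persistence lengths; the temperature of 25 °C is an assumption, since the abstract does not state it, and the function name is hypothetical.

```python
BOLTZMANN_ERG_PER_K = 1.380649e-16  # k_B in CGS units (erg/K)


def persistence_length_nm(bending_modulus_erg_cm: float, temperature_k: float = 298.15) -> float:
    """Worm-like-chain persistence length P = B / (k_B T), converted from cm to nm."""
    p_cm = bending_modulus_erg_cm / (BOLTZMANN_ERG_PER_K * temperature_k)
    return p_cm * 1e7  # 1 cm = 1e7 nm


reported = {"CTG.CAG": 1.13e-19, "CGG.CCG": 1.27e-19}  # bending moduli in erg·cm, from the abstract
for name, modulus in reported.items():
    p = persistence_length_nm(modulus)
    # "approximately 60% of random B-DNA" implies a random B-DNA value of about p / 0.6
    print(f"{name}: P = {p:.1f} nm; implied random B-DNA P = {p / 0.6:.0f} nm")
```

At 25 °C the two repeats come out at roughly 27-31 nm, and dividing by 0.6 implies random B-DNA values of roughly 46-51 nm, in line with the commonly cited persistence length of about 50 nm for B-DNA.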
Affiliation(s)
- A Bacolla
- Center for Genome Research, Institute of Biosciences and Technology, Texas A & M University, Texas Medical Center, 2121 Holcombe Blvd., Houston, Texas 77030, USA
37
Deissler H, Wilm M, Genç B, Schmitz B, Ternes T, Naumann F, Mann M, Doerfler W. Rapid protein sequencing by tandem mass spectrometry and cDNA cloning of p20-CGGBP. A novel protein that binds to the unstable triplet repeat 5'-d(CGG)n-3' in the human FMR1 gene. J Biol Chem 1997; 272:16761-8. [PMID: 9201980 DOI: 10.1074/jbc.272.27.16761] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.4] [Indexed: 02/04/2023]
Abstract
The autonomous expansion of the unstable 5'-d(CGG)n-3' repeat in the 5'-untranslated region of the human FMR1 gene leads to the fragile X syndrome, one of the most frequent causes of mental retardation in human males. We have recently described the isolation of a protein p20-CGGBP that binds sequence-specifically to the double-stranded trinucleotide repeat 5'-d(CGG)-3' (Deissler, H., Behn-Krappa, A., and Doerfler, W. (1996) J. Biol. Chem. 271, 4327-4334). We demonstrate now that the p20-CGGBP can also bind to an interrupted repeat sequence. Peptide sequence tags of p20-CGGBP obtained by nanoelectrospray mass spectrometry were screened against an expressed sequence tag data base, retrieving a clone that contained the full-length coding sequence for p20-CGGBP. A bacterially expressed fusion protein p20-CGGBP-6xHis exhibits a binding pattern to the double-stranded 5'-d(CGG)n-3' repeat similar to that of the authentic p20-CGGBP. This novel protein lacks any overall homology to other known proteins but carries a putative nuclear localization signal. The p20-CGGBP gene is conserved among mammals but shows no homology to non-vertebrate species. The gene encoding the sequence for the new protein has been mapped to human chromosome 3.
Affiliation(s)
- H Deissler
- Institut für Genetik, Universität zu Köln, D-50931 Köln, Federal Republic of Germany
38
Schwemmle S, de Graaff E, Deissler H, Gläser D, Wöhrle D, Kennerknecht I, Just W, Oostra BA, Döerfler W, Vogel W, Steinbach P, Dörfler W. Characterization of FMR1 promoter elements by in vivo-footprinting analysis. Am J Hum Genet 1997; 60:1354-62. [PMID: 9199556 PMCID: PMC1716109 DOI: 10.1086/515456] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.6] [Indexed: 02/04/2023]
Abstract
Fragile X syndrome is associated with silencing of the FMR1 gene. We studied FMR1 transcriptional regulation by analysing the promoter region for in vivo protein/DNA interactions and for cytosine methylation at the single-nucleotide level. Four protein-binding sites were present in the unmethylated promoter of the active FMR1 gene. In the methylated promoter of inactive genes, no footprints were detected, and no evidence of active repression was found in the region investigated. We propose that the silencing of FMR1 gene transcription results from a lack of transcription-factor binding.
Affiliation(s)
- S Schwemmle
- Abteilung Medizinische Genetik, Universität Ulm, Germany
39
Freudenreich CH, Stavenhagen JB, Zakian VA. Stability of a CTG/CAG trinucleotide repeat in yeast is dependent on its orientation in the genome. Mol Cell Biol 1997; 17:2090-8. [PMID: 9121457 PMCID: PMC232056 DOI: 10.1128/mcb.17.4.2090] [Citation(s) in RCA: 170] [Impact Index Per Article: 6.1] [Indexed: 02/04/2023]
Abstract
Trinucleotide repeat expansion is the causative mutation for a growing number of diseases including myotonic dystrophy, Huntington's disease, and fragile X syndrome. A (CTG/CAG)130 tract cloned from a myotonic dystrophy patient was inserted in both orientations into the genome of Saccharomyces cerevisiae. This insertion was made either very close to the 5' end or very close to the 3' end of a URA3 transcription unit. Regardless of its orientation, no evidence was found for triplet-mediated transcriptional repression of the nearby gene. However, the stability of the tract correlated with its orientation on the chromosome. In one orientation, the (CTG/CAG)130 tract was very unstable and prone to deletions. In the other orientation, the tract was stable, with fewer deletions and two possible cases of expansion detected. Analysis of the direction of replication through the region showed that in the unstable orientation the CTG tract was on the lagging-strand template and that in the stable orientation the CAG tract was on the lagging-strand template. The orientation dependence of CTG/CAG tract instability seen in this yeast system supports models involving hairpin-mediated polymerase slippage previously proposed for trinucleotide repeat expansion.
Affiliation(s)
- C H Freudenreich
- Department of Molecular Biology, Princeton University, New Jersey 08544, USA
40
Aoki T, Koch KS, Leffert HL. Attenuation of gene expression by a trinucleotide repeat-rich tract from the terminal exon of the rat hepatic polymeric immunoglobulin receptor gene. J Mol Biol 1997; 267:229-36. [PMID: 9096221 DOI: 10.1006/jmbi.1997.0890] [Citation(s) in RCA: 25] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023]
Abstract
A 359 bp terminal exon fragment of the rat polymeric immunoglobulin receptor gene has been tested for biological effects. The fragment contains an S1 nuclease-sensitive microsatellite with d(GGA) and d(GAA) trinucleotide repeats that are expressed discordantly in the 3'UTRs of liver mRNAs encoded by the single copy gene. When human A293 cells are transfected with expression plasmids carrying this fragment in forward orientations, flanking or replacing poly(A) cassettes in the 3' ends of the transcription units, luciferase reporter gene expression is attenuated 47 to 59% or 98.5%, respectively. In contrast, when the fragment is tested similarly in reverse orientation, there is significantly less or no attenuation of gene expression. These observations, and computer models of partial triplet repeat DNA tertiary and RNA secondary structures, suggest that this fragment might regulate gene expression by orientation and position-dependent mechanisms at transcriptional and post-transcriptional levels.
Affiliation(s)
- T Aoki
- Department of Biochemistry, Faculty of Pharmaceutical Sciences, Health Sciences University of Hokkaido, Ishikari-Tobetsu, Japan
41
Abstract
Most traits in biological populations appear to be under stabilizing selection, which acts to eliminate quantitative genetic variation. Yet, virtually all measured traits in biological populations continue to show significant quantitative genetic variation. The paradox can be resolved by postulating the existence of an abundant, though unspecified, source of mutations that has quantitative effects on phenotype, but does not reduce fitness. Does such a source actually exist? We propose that it does, in the form of repeat-number variation in SSRs (simple sequence repeats, of which the triplet repeats of human neurodegenerative diseases are a special case). Viewing SSRs as a major source of quantitative mutation has broad implications for understanding molecular processes of evolutionary adaptation, including the evolutionary control of the mutation process itself.
Affiliation(s)
- Y Kashi
- Department of Food Engineering and Biotechnology, The Technion, Technion City, Haifa, Israel
42
Murray J, Cuckle H, Taylor G, Hewison J. Screening for fragile X syndrome: information needs for health planners. J Med Screen 1997; 4:60-94. [PMID: 9275266 DOI: 10.1177/096914139700400204] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.2] [Indexed: 02/05/2023]
Affiliation(s)
- J Murray
- Centre for Reproduction, Growth & Development, Research School of Medicine, University of Leeds, United Kingdom
43
Iber H. Sequence specific binding of cytosolic proteins to a 12 nucleotide sequence in the 5' untranslated region of FMR1 mRNA. Biochim Biophys Acta 1996; 1309:167-73. [PMID: 8982249 DOI: 10.1016/s0167-4781(96)00154-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 02/03/2023]
Abstract
The 5' untranslated region of human FMR1 mRNA is highly conserved, including a 26 nucleotide sequence containing a tandem 12 nucleotide repeat of (G/C)CU(C/G)CCGG(G/A)G(G/C)(G/C) which predates the evolutionary divergence between birds and mammals. We show here that this 12 nucleotide sequence in FMR1 mRNA is a specific binding site for small (< 20 kDa) cytosolic proteins of rat brain. Point mutation analysis identified two guanine residues in this 12 nucleotide repeat which are essential for protein binding. The 12 nucleotide motif sequence was found in the 5'UTR of at least 15 other genes and could be a common target site for these cytosolic RNA-binding proteins.
Affiliation(s)
- H Iber
- Howard Hughes Medical Institute, Emory University School of Medicine, Atlanta, GA 30322, USA
44
Vriz S, Joly C, Boulekbache H, Condamine H. Zygotic expression of the zebrafish Sox-19, an HMG box-containing gene, suggests an involvement in central nervous system development. Brain Res Mol Brain Res 1996; 40:221-8. [PMID: 8872306 DOI: 10.1016/0169-328x(96)00052-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023]
Abstract
The zebrafish Sox-19 belongs to the Sry subfamily of HMG (High Mobility Group) box genes and is closely related to the Sox sub-group B, comprising the mouse Sox-1, Sox-2 and Sox-3 genes, with respect to both HMG box homology (95.3%) and neural expression during embryogenesis. Analysis of Sox-19 expression during embryogenesis by whole-mount in-situ hybridization revealed interesting features. In early gastrula embryos, Sox-19 transcripts are detected within a circular area in the region of the presumptive central nervous system (CNS), and Sox-19 appears to be the earliest molecular marker of the CNS in vertebrates. In the developing brain, ZfSox-19 mRNA is distributed in the ventral region of the diencephalon, midbrain and hindbrain, whereas its expression is excluded from the telencephalon. In spite of the ventral localisation of its mRNA, ZfSox-19 expression is completely normal in cyclops embryos, which implies that ZfSox-19 expression is independent of the presence of the floor plate.
Affiliation(s)
- S Vriz
- Unité de Génétique des Mammifères, Institut Pasteur, Paris, France
45
Holden JJ, Walker M, Chalifoux M, White BN. Trinucleotide repeats at the FRAXF locus: frequency and distribution in the general population. Am J Med Genet 1996; 64:424-7. [PMID: 8844097 DOI: 10.1002/(sici)1096-8628(19960809)64:2<424::aid-ajmg38>3.0.co;2-f] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.1] [Indexed: 02/02/2023]
Abstract
FRAXF, the third X-chromosomal fragile site to be cloned, has been shown to harbour a polymorphic compound triplet array: (GCCGTC)n (GCC)n. Expansion and methylation of the GCC-repeat and the neighbouring CpG-rich region result in chromosomal fragility. DNAs from 500 anonymous consecutive newborn males were examined to determine the incidence of various repeat numbers. Repeat numbers ranged from 10 to 38, with the most common alleles having 14 (52.7%), 12 (16.6%), 21 (9.0%), and 22 (5.2%) triplets. Based on the distribution of repeat numbers, we suggest that the 21-repeat allele resulted from hairpin formation involving 7 GCC-repeats in a 14-repeat allele, accompanied by polymerase slippage. Examination of dinucleotide repeats near the FRAXF repeat will be important in testing this hypothesis. Since the clinical phenotype, if any, of FRAXF is unknown, this database will also be valuable for comparisons with repeat numbers in individuals from special populations.
Affiliation(s)
- J J Holden
- Department of Psychiatry, Queen's University, Kingston, Ontario, Canada
46
Longshore JW, Tarleton J. Dynamic mutations in human genes: A review of trinucleotide repeat diseases. J Genet 1996. [DOI: 10.1007/bf02931762] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
47
Ohshima K, Kang S, Larson JE, Wells RD. TTA.TAA triplet repeats in plasmids form a non-H bonded structure. J Biol Chem 1996; 271:16784-91. [PMID: 8663378 DOI: 10.1074/jbc.271.28.16784] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.3] [Indexed: 02/01/2023]
Abstract
CTG.CAG, CGG.CCG, and AAG.CTT triplet repeats proximal to or in disease genes expand by a non-Mendelian genetic process to cause several human hereditary syndromes. As part of our physical, biological, and genetic studies on the 10 possible triplet repeats, we discovered that the TTA.TAA repeat, isolated from the upstream region of the variant surface glycoprotein gene of Trypanosoma brucei, shows a propensity to adopt a non-H bonded structure under appropriate conditions. The other nine triplet repeat sequences do not exhibit this property. (TTA.TAA)n, where n = 90, 60, 30, and 18, cloned into pUC19 was studied by chemical and enzymatic probes as well as two-dimensional gel electrophoretic analyses under a variety of conditions. The helix opening was observed for all four inserts in supercoiled plasmids as a function of temperature, pH, metal ions, and buffer conditions using OsO4, diethyl pyrocarbonate, and chloroacetaldehyde probes. This unusual property of the TTA.TAA repeat suggests that it plays a different role from the other nine triplet repeats in gene expression.
Affiliation(s)
- K Ohshima
- Department of Biochemistry and Biophysics, Texas A&M University, Texas Medical Center, Houston, Texas 77030-3303, USA
48
Epplen JT, Kyas A, Mäueler W. Genomic simple repetitive DNAs are targets for differential binding of nuclear proteins. FEBS Lett 1996; 389:92-5. [PMID: 8682214 DOI: 10.1016/0014-5793(96)00526-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.4] [Indexed: 02/01/2023]
Abstract
The biological meaning of abundant simple repetitive DNA sequences in eukaryote genomes is obscure. Therefore, (GAA)n, (GT)n, and composite (GT)n(GA)m blocks were characterized for protein binding in the repeat and flanking sequences of cloned genomic DNA fragments. In gel mobility shift and competition assays, the binding of nuclear proteins to the repeats was specific (including some flanking single-copy sequences). DNase footprinting revealed the target sequences within and adjacent to the repeats. Chemical modifications (OsO4, DEPC) demonstrated non-B DNA structures in the polypurine blocks. The binding of nuclear proteins in and around simple repeat sequences refutes the notion that all of these ubiquitously interspersed elements are biologically insignificant.
49
Nakamura A, Kojo T, Arahata K, Takeda S. Reduction of serum IgG level and peripheral T-cell counts are correlated with CTG repeat lengths in myotonic dystrophy patients. Neuromuscul Disord 1996; 6:203-10. [PMID: 8784809 DOI: 10.1016/0960-8966(96)00010-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]
Abstract
Myotonic dystrophy (DM) is an autosomal dominant multisystem disorder associated with expansion of the CTG repeat within the 3' non-coding region of the myotonin protein kinase (MT-PK) gene. CTG repeat length has been shown to correlate with the clinical category and age at onset of the disease. The relationship between CTG repeat length and immunological parameters was analyzed in this study. We determined CTG repeat length in 14 DM patients and 15 normal controls using Southern blot and PCR analyses, and then correlated their CTG repeat lengths with their serum immunoglobulin (IgG, IgA, IgM) levels and the number of peripheral white blood cells, including lymphocyte subsets. In DM patients, increasing CTG repeat lengths correlated significantly with decreasing serum IgG levels, decreasing total lymphocyte counts, and decreasing CD2+, CD3+, and CD4+ cell counts. Immunological parameters were also influenced by CTG repeat expansion in DM patients.
Affiliation(s)
- A Nakamura
- Department of Neuromuscular Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan
50
Abstract
It is becoming increasingly clear that repetitive DNA is of biological significance as well as experimental importance. Here we review the information available about one type of repetitive DNA, the trinucleotide repeat (CAC)n, and briefly compare it with other trinucleotide repeats. Although much work has been done in analysing DNA fingerprinting patterns produced using the synthetic oligonucleotide (CAC)5 as a probe, there is relatively little information about individual (CAC)n-containing sequences and their abundance, organisation and distribution in mammalian DNA. From the data that is available, it is clear that there are at least two areas that should repay further study: (1) the organisation and generation of long sequences that contain (CAC)n motifs as part of a larger repeating unit (minisatellites) and (2) the distribution of small (CAC)n sequences (microsatellites), in particular their relationship to genes.
Affiliation(s)
- A Sertedaki
- Department of Human Genetics, University of Newcastle upon Tyne, UK