1
Sharma VK, Kumar N, Brahmachari SK, Ramachandran S. Abundance of dinucleotide repeats and gene expression are inversely correlated: a role for gene function in addition to intron length. Physiol Genomics 2007; 31:96-103. [PMID: 17550993 DOI: 10.1152/physiolgenomics.00183.2006] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2]
Abstract
High and broad transcription of eukaryotic genes is facilitated by cost minimization, clustered localization in the genome, elevated G+C content, and low nucleosome formation potential. In this context, it is necessary to establish how transcriptional levels correlate with the abundance of (TG/CA)n (n ≥ 12) repeats, which are negative cis modulators of transcription, and of other commonly occurring dinucleotide repeats. Three independent microarray datasets were used to examine the correlation of (TG/CA)n (n ≥ 12) and other dinucleotide repeats with gene expression. Compared with the equidistribution pattern expected under a neutral model, highly transcribed genes were poor in repeats and, conversely, weakly transcribed genes were rich in repeats. Furthermore, the inverse correlation between repeat abundance and transcriptional levels appears to be a global phenomenon encompassing all genes regardless of their breadth of transcription. This selective exclusion of (TG/CA)n (n ≥ 12) and (AT)n (n ≥ 12) repeats from highly transcribed genes is an additional factor, along with cost minimization and elevated GC content; multiple factors therefore govern high transcription of genes. Even after controlling for the effects of GC content and average intron length, the effect of repeats, albeit somewhat weaker, persisted and was clear. In the ribosomal protein coding genes, sequence analysis of orthologs suggests that negative selection against repeats may have occurred early in evolution. These observations suggest that negative selection of (TG/CA)n (n ≥ 12) microsatellites during the evolution of highly expressed genes was also controlled by gene function, in addition to intron length.
Affiliation(s)
- Vineet K Sharma
- G. N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Delhi, India
2
Sharma VK, Sharma A, Kumar N, Khandelwal M, Mandapati KK, Horn-Saban S, Strichman-Almashanu L, Lancet D, Brahmachari SK, Ramachandran S. EXPOLDB: expression linked polymorphism database with inbuilt tools for analysis of expression and simple repeats. BMC Genomics 2006; 7:258. [PMID: 17038195 PMCID: PMC1618849 DOI: 10.1186/1471-2164-7-258] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/06/2006] [Accepted: 10/13/2006]
Abstract
BACKGROUND Quantitative variation in gene expression has been proposed to underlie phenotypic variation among human individuals. A facilitating step towards understanding the basis for gene expression variability is associating genome-wide transcription patterns with potential cis modifiers of gene expression. DESCRIPTION EXPOLDB, a novel database, addresses this need by providing information on the variability of gene expression levels across individuals, as well as on the presence and features of potentially polymorphic (TG/CA)n repeats. EXPOLDB thus enables transcription levels to be associated with the presence and length of (TG/CA)n repeats. One of the unique features of this database is the display of expression data for 5 pairs of monozygotic twins, which allows identification of genes whose expression variability is influenced by non-genetic factors, including environment. In addition to queries by gene name, EXPOLDB supports queries by pathway name. Users can also upload their own list of HGNC (HUGO (The Human Genome Organisation) Gene Nomenclature Committee) symbols to interrogate expression patterns. The online application 'SimRep' can be used to find simple repeats in a given nucleotide sequence. To illustrate primary applications, case examples of housekeeping genes and the RUNX gene family, as well as an example of glycolytic pathway genes, are provided. CONCLUSION The uniqueness of EXPOLDB lies in facilitating the association of genome-wide transcription variations with the presence and type of polymorphic repeats, while also allowing identification of genes whose expression variability is influenced by non-genetic factors, including environment. In addition, the database allows comprehensive querying, including functional information on the biochemical pathways of human genes. EXPOLDB can be accessed at http://expoldb.igib.res.in/expol.
Affiliation(s)
- Vineet K Sharma
- G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
- Anu Sharma
- Functional Genomics Unit, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
- Naveen Kumar
- G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
- Mamta Khandelwal
- G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
- Kiran Kumar Mandapati
- Functional Genomics Unit, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
- Shirley Horn-Saban
- Microarray facility, Department of Biological Services, Weizmann Institute of Science, Rehovot 76100, Israel
- Liora Strichman-Almashanu
- Department of Molecular Genetics and Crown Human Genome Center, Weizmann Institute of Science, Rehovot 76100, Israel
- Doron Lancet
- Department of Molecular Genetics and Crown Human Genome Center, Weizmann Institute of Science, Rehovot 76100, Israel
- Samir K Brahmachari
- G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
- Srinivasan Ramachandran
- G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
3
Sharma VK, Brahmachari SK, Ramachandran S. (TG/CA)n repeats in human gene families: abundance and selective patterns of distribution according to function and gene length. BMC Genomics 2005; 6:83. [PMID: 15935094 PMCID: PMC1177943 DOI: 10.1186/1471-2164-6-83] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 10/22/2004] [Accepted: 06/03/2005]
Abstract
BACKGROUND The creation of human gene families was facilitated significantly by gene duplication and diversification. The (TG/CA)n repeats exhibit length variability, display genome-wide distribution, and are abundant in the human genome. Accumulating evidence for their multiple functional roles, including regulation of transcription and stimulation of recombination and splicing, marks them out as functional elements. Here, we report an analysis of the distribution of (TG/CA)n repeats in human gene families. RESULTS The 1,317 human gene families were classified into six functional classes. The distribution of (TG/CA)n repeats was analyzed both from a global perspective and from a stratified perspective based on their biological properties. The number of genes with repeats decreased with increasing repeat length, and many genes (53%) had repeats of multiple types in various combinations. Repeats were positively associated with the class of Signaling and communication, whereas they were negatively associated with the classes of Immune and related functions and of Information. The proportion of genes with (TG/CA)n repeats in each class was proportional to the corresponding average gene length. The repeat distribution pattern in large gene families generally mirrored the global distribution pattern but differed particularly for the Collagen gene family, which was rich in repeats. The position and flanking sequences of the repeats of Collagen genes showed high conservation in the Chimpanzee genome. However, the majority of these repeats displayed length polymorphism. CONCLUSION The positive association of repeats with genes of Signaling and communication points to their role in modulation of transcription. The negative association of repeats with genes of Information relates to their smaller gene length, higher expression and fundamental role in cellular physiology.
In genes of Immune and related functions, the negative association of repeats perhaps relates to their smaller gene length and to the directional nature of the recombinogenic processes that generate immune diversity. Thus, multiple factors, including gene length, function and the directionality of recombinogenic processes, steered the observed distribution of (TG/CA)n repeats. Furthermore, the distribution of repeat patterns is consistent with the current model that long repeats tend to contract more than expand, whereas the reverse dynamics operate in short repeats.
Affiliation(s)
- Vineet K Sharma
- G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
- Samir K Brahmachari
- G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
- Srinivasan Ramachandran
- G.N. Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
4
Sharma VK, B-Rao C, Sharma A, Brahmachari SK, Ramachandran S. (TG:CA)n repeats in human housekeeping genes. J Biomol Struct Dyn 2003; 21:303-10. [PMID: 12956614 DOI: 10.1080/07391102.2003.10506926] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5]
Abstract
The unravelling of the human genome sequence provides a new opportunity to investigate the role of repetitive sequences in gene regulation. Among the various types of repetitive sequences, the dinucleotide (TG:CA)n repeats are among the most abundant in the human genome and are polymorphic. Early on, it was observed that (TG:CA)n repeats could modulate gene expression and have the propensity to undergo conformational transitions under in vivo conditions. Recent reports describe the role of polymorphic (TG:CA)n repeats in the regulation of several genes. In this work, we have analysed the distribution of (TG:CA)n (n ≥ 6) repeats in human 'housekeeping genes' for which recently released GeneChip data are available. Our results indicate that (i) the number of short intragenic (TG:CA)n repeats is significantly higher than the number of long repeats; (ii) genes with (TG:CA)n repeats (n ≥ 12 units) had lower mean expression levels than genes without these repeats; and (iii) genes belonging to the functional class of 'signalling and communication' were positively associated with repeats, in contrast to genes belonging to the 'information' class, which were negatively associated with repeats.
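The repeat counting described in this abstract (runs with n ≥ 6, with n ≥ 12 marking long repeats) can be illustrated with a minimal Python sketch. It is not the authors' pipeline and not the SimRep tool: treating a repeat as a maximal alternating run of T/G (or C/A, for the complementary strand) on one strand, taking n as the number of complete dinucleotide units, and calling runs of 6 to 11 units "short" are all assumptions made here. The helpers `find_tg_ca_repeats` and `classify_by_length` are hypothetical names.

```python
import re
from collections import Counter

# A (TG/CA)n repeat read on one strand appears as an alternating run of
# T/G (TGTG..., GTGT...) or, on the complementary strand, of C/A.
# Each alternative is greedy, so every match is a maximal alternating run.
TG_CA_RUN = re.compile(r"(?:TG)+T?|(?:GT)+G?|(?:CA)+C?|(?:AC)+A?")


def find_tg_ca_repeats(seq, min_units=6):
    """Return (start, end, units) for every (TG/CA)n run with n >= min_units.

    Hypothetical helper: n is taken as the number of complete dinucleotide
    units in the run, i.e. len(run) // 2.
    """
    hits = []
    for match in TG_CA_RUN.finditer(seq.upper()):
        units = len(match.group()) // 2
        if units >= min_units:
            hits.append((match.start(), match.end(), units))
    return hits


def classify_by_length(hits, long_threshold=12):
    """Count runs as 'long' (n >= long_threshold) or 'short' (otherwise).

    Assumption: 'short' means min_units <= n < long_threshold.
    """
    return Counter("long" if units >= long_threshold else "short"
                   for _, _, units in hits)


if __name__ == "__main__":
    # Toy sequence (not real data): a (TG)7 run and a (CA)13 run.
    seq = "AAA" + "TG" * 7 + "CCC" + "CA" * 13 + "GGG"
    hits = find_tg_ca_repeats(seq)
    print(hits)  # [(3, 17, 7), (20, 46, 13)]
    counts = classify_by_length(hits)
    print(counts["short"], counts["long"])  # 1 1
```

Scanning both T/G and C/A runs on a single strand is one simple way to capture a repeat regardless of which strand was sequenced; a real analysis would also need gene annotation to restrict hits to intragenic positions.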
Affiliation(s)
- Vineet K Sharma
- G N Ramachandran Knowledge Centre for Genome Informatics, Institute of Genomics and Integrative Biology, Mall Road, Delhi 110 007, India
5
Abstract
The vast number of proteins that sustain living organisms has been generated from a relatively small number of ancestral genes through a variety of processes. Lysozyme is an ancient protein whose origin goes back an estimated 400 to 600 million years. This protein was originally a bacteriolytic defensive agent and has been adapted to serve a digestive function on at least two occasions, separated by nearly 40 million years. The origins of the related goose-type and T4 phage lysozymes, which are distinct from the more common C type, are obscure. They share no discernible amino acid sequence identity, yet they possess common secondary and tertiary structures. After a gene duplication 300 to 400 million years ago, the lysozyme C gene also gave rise to a gene that currently codes for alpha-lactalbumin, a protein expressed only in the lactating mammary gland of all but a few species of mammals. It is required for the synthesis of lactose, the sugar secreted in milk. Alpha-lactalbumin shares only 40% amino acid sequence identity with lysozyme C, but its spatial structure and gene organization are much more closely similar. Although structurally similar, the two proteins are functionally quite distinct. Specific amino acid substitutions in alpha-lactalbumin account for the loss of the enzymatic activity of lysozyme and the acquisition of the features necessary for its role in lactose synthesis. The evolutionary implications are as yet unclear but are being unraveled in many laboratories.
Affiliation(s)
- P K Qasba
- Structural Glycobiology Section, National Cancer Institute, N.I.H., Frederick, MD 21702-1201, USA.
6
Brahmachari SK, Meera G, Sarkar PS, Balagurumoorthy P, Tripathi J, Raghavan S, Shaligram U, Pataskar S. Simple repetitive sequences in the genome: structure and functional significance. Electrophoresis 1995; 16:1705-14. [PMID: 8582360 DOI: 10.1002/elps.11501601283] [Citation(s) in RCA: 48] [Impact Index Per Article: 1.7]
Abstract
The current explosion of DNA sequence information has generated increasing evidence for the claim that noncoding repetitive DNA sequences present within and around different genes could play an important role in genetic control processes, although the precise role and mechanism by which these sequences function are poorly understood. Several of the simple repetitive sequences which occur in a large number of loci throughout the human and other eukaryotic genomes satisfy the sequence criteria for forming non-B DNA structures in vitro. We have summarized some of the features of three different types of simple repeats that highlight the importance of repetitive DNA in the control of gene expression and chromatin organization. (i) (TG/CA)n repeats are widespread and conserved in many loci. These sequences are associated with nucleosomes of varying linker length and may play a role in chromatin organization. These Z-potential sequences can help absorb superhelical stress during transcription and aid in recombination. (ii) Human telomeric repeat (TTAGGG)n adopts a novel quadruplex structure and exhibits unusual chromatin organization. This unusual structural motif could explain chromosome pairing and stability. (iii) Intragenic amplification of (CTG)n/(CAG)n trinucleotide repeat, which is now known to be associated with several genetic disorders, could down-regulate gene expression in vivo. The overall implications of these findings vis-à-vis repetitive sequences in the genome are summarized.
Affiliation(s)
- S K Brahmachari
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
7
Herbert AG, Rich A. A method to identify and characterize Z-DNA binding proteins using a linear oligodeoxynucleotide. Nucleic Acids Res 1993; 21:2669-72. [PMID: 8332463 PMCID: PMC309597 DOI: 10.1093/nar/21.11.2669] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.3]
Abstract
An oligodeoxynucleotide that readily flips to the Z-DNA conformation in 10 mM MgCl2 was produced by using Klenow enzyme to incorporate 5-bromodeoxycytosine and deoxyguanosine into a (dC-dG)22 template. During synthesis, the oligomer can be labeled with 32P to high specific activity. The labeled oligodeoxynucleotide can be used in bandshift experiments to detect proteins that bind Z-DNA. This allows the binding specificity of such proteins to be determined with high reliability using unlabeled linear and supercoiled DNA competitors. In addition, because the radioactive oligodeoxynucleotide contains bromine atoms, DNA-protein complexes can be readily crosslinked using UV light. This allows the molecular weight of the proteins that bind to the radioactive probe to be estimated. Both techniques are demonstrated using a goat polyclonal anti-Z-DNA antiserum.
Affiliation(s)
- A G Herbert
- Department of Biology, Massachusetts Institute of Technology, Cambridge 02139
8
Vilotte JL, Soulier S. Isolation and characterization of the mouse alpha-lactalbumin-encoding gene: interspecies comparison, tissue- and stage-specific expression. Gene 1992; 119:287-92. [PMID: 1398111 DOI: 10.1016/0378-1119(92)90285-w] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.0]
Abstract
The murine alpha-lactalbumin-encoding gene (mαLa) was isolated and completely sequenced. The 2.3-kb transcription unit shares a similar organization with its counterparts from other species. Sequence comparison of the proximal 5'-flanking region indicated the presence of a consensus motif that occurs in all milk-protein-encoding genes except the kappa-casein-encoding gene. This motif may correspond to the binding site for the recently identified mammary-gland-specific factor. The mαLa gene occurs as a single copy per haploid genome and is specifically expressed in the mammary gland, where it is induced during late pregnancy.
Affiliation(s)
- J L Vilotte
- Laboratoire de Génétique Biochimique, INRA-CRJ, Jouy-en-Josas, France
9
Tripathi J, Brahmachari SK. Distribution of simple repetitive (TG/CA)n and (CT/AG)n sequences in human and rodent genomes. J Biomol Struct Dyn 1991; 9:387-97. [PMID: 1741969 DOI: 10.1080/07391102.1991.10507919] [Citation(s) in RCA: 55] [Impact Index Per Article: 1.7]
Abstract
Sixteen million nucleotides of genomic sequence from various organisms have been analysed to detect simple repetitive sequences and to study the extent of their occurrence. Two sequence motifs, (TG/CA)n and (CT/AG)n, capable of adopting unusual DNA structures (the left-handed Z-conformation and the triple-helical conformation, respectively), are abundant in rodent and human genomes but almost completely absent from bacterial genomes. (TG/CA)n and (CT/AG)n sequences are located mostly in introns or in the 5'/3' flanking regions of genes. The presence of such repeat motifs in the genomic sequences of higher eukaryotes has been correlated with their possible functional significance in nucleosome organization, recombination and gene expression.
Affiliation(s)
- J Tripathi
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore