151
Pavlícek A, Paces J, Zíka R, Hejnar J. Length distribution of long interspersed nucleotide elements (LINEs) and processed pseudogenes of human endogenous retroviruses: implications for retrotransposition and pseudogene detection. Gene 2002; 300:189-94. [PMID: 12468100 DOI: 10.1016/s0378-1119(02)01047-8] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 10/27/2022]
Abstract
Deciphering the human genome includes reliable identification and structural characterization of individual retrotransposon elements. The most active group of autonomous transposable elements, the long interspersed nuclear elements (LINE), transpose themselves as well as other RNAs, including those of human endogenous retroviruses (HERV). During this transposition, however, the LINE-encoded reverse transcriptase (RT) often abortively dissociates from the RNA template, leaving a prematurely terminated, 5' truncated copy. We have analyzed the length distributions of LINEs and of processed pseudogenes derived from HERV-W. As expected, we have found that the majority of 5' truncated LINEs and HERV-W processed pseudogenes show a prevalence of very short elements terminated close to the 3' end. On the other hand, the number of complete elements is far above the expectation. The characteristic distribution in both cases indicates two important conclusions: (i) dissociation of LINE RT from the template cannot be fully explained by low processivity of RT modelled as a stochastic, Poisson-type process. (ii) Currently cited numbers of pseudogenes within the human genome are underestimated, since a large percentage of pseudogenes are terminated in the 3' untranslated region and remain undetectable in translated homology searches of protein databases against the human genome.
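The "stochastic, Poisson-type" null model mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes the RT dissociates at a constant rate per copied nucleotide, so copy lengths measured from the 3' end are exponentially distributed. The template length and mean copied length are hypothetical values chosen only for the example.

```python
import math

def fraction_in_range(lo_bp, hi_bp, rate_per_bp):
    """Expected fraction of copies with length in [lo_bp, hi_bp) when the
    RT dissociates at a constant rate per nucleotide (exponential lengths)."""
    return math.exp(-rate_per_bp * lo_bp) - math.exp(-rate_per_bp * hi_bp)

def fraction_full_length(full_length_bp, rate_per_bp):
    """Expected fraction of copies that reach the 5' end of the template."""
    return math.exp(-rate_per_bp * full_length_bp)

FULL_LENGTH_BP = 6000   # hypothetical template length, not taken from the paper
MEAN_COPY_BP = 1000     # hypothetical mean copied length, not taken from the paper
RATE = 1.0 / MEAN_COPY_BP

# 1-kb length bins, measured from the 3' end of the element
bins = [(lo, lo + 1000) for lo in range(0, FULL_LENGTH_BP, 1000)]
truncated = [fraction_in_range(lo, hi, RATE) for lo, hi in bins]
full = fraction_full_length(FULL_LENGTH_BP, RATE)

for (lo, hi), frac in zip(bins, truncated):
    print(f"{lo:>5}-{hi:<5} bp: {frac:.4f}")
print(f"full length    : {full:.4f}")
```

Under these assumptions the expected fractions fall monotonically with length, and complete copies are rarer than any truncated bin. An observed excess of complete elements, as the abstract reports, is what would argue against the constant-rate model.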
Affiliation(s)
- Adam Pavlícek
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Flemingovo nam. 2, Prague 6, CZ-16637, Czech Republic
152
Thomas CP, Zhou J, Liu KZ, Mick VE, MacLaughlin E, Knowles M. Systemic pseudohypoaldosteronism from deletion of the promoter region of the human beta epithelial Na(+) channel subunit. Am J Respir Cell Mol Biol 2002; 27:314-9. [PMID: 12204893 DOI: 10.1165/rcmb.2002-0029oc] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
Systemic pseudohypoaldosteronism type I (PHAI) is an autosomal recessive disorder that arises from loss of function mutations of the alpha, beta, or gamma subunit of Epithelial Na(+) Channel (ENaC). In addition to a severe renal phenotype in the neonatal period, patients with PHAI develop a childhood pulmonary syndrome characterized by cough and frequent respiratory infections. We tested a patient, born to consanguineous parents, who presented with dehydration, metabolic acidosis, hyperkalemia, elevated renin and aldosterone levels at birth, and recurrent respiratory symptoms in his first year. He demonstrated defective epithelial Na(+) transport in multiple organs (raised sweat Cl(-), 120 mM; raised salivary Na(+) and Cl(-), 118 and 111 mM, respectively; and little nasal amiloride-sensitive potential difference). No deleterious mutation was identified in the coding region of the three ENaC subunits. Reverse transcriptase-polymerase chain reaction of nasal epithelial RNA showed reduced betaENaC expression, and inability to amplify promoter elements indicated the possibility of a deletion in the 5' region. Using a probe that corresponded to exon 1A of betaENaC, we confirmed a large deletion (> 1,300 bp). In summary, a homozygous mutation in the promoter region of betaENaC leads to PHAI, the first description of a mutation in the regulatory regions of an ENaC subunit leading to a clinical phenotype.
Affiliation(s)
- Christie P Thomas
- Department of Internal Medicine, University of Iowa, Iowa City, Iowa 52242-1081, USA.
153
Ackerman H, Udalova I, Hull J, Kwiatkowski D. Evolution of a polymorphic regulatory element in interferon-gamma through transposition and mutation. Mol Biol Evol 2002; 19:884-90. [PMID: 12032244 DOI: 10.1093/oxfordjournals.molbev.a004145] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Mammalian transposable elements have intrinsic regulatory elements that can activate neighboring genes, and it is speculated that they can also carry extrinsic transactivating DNA sequences to new genomic locations. We have identified a polymorphic segment of the human interferon-gamma promoter region where two adjacent binding sites for NF-kappaB and NFAT originated from the insertion of an Alu element approximately 22-34 MYA. Both binding sites lie outside the Alu consensus sequence but within the boundaries of the insertion, suggesting that this segment of DNA was comobilized when the Alu element moved from another part of the genome. Sequence comparisons and examination of DNA-protein interactions across nine different primate species indicate that the inserted sequence contained the intact NFAT binding site, whereas the ability to bind NF-kappaB evolved through a series of mutations after the insertion. These observations are consistent with the notion that retropseudogenes can comobilize intact regulatory sequences to new locations and thereby influence the evolution of gene regulatory networks; however, the extent to which such events have shaped the evolution of gene regulation remains unknown.
Affiliation(s)
- Hans Ackerman
- Wellcome Trust Centre for Human Genetics, Oxford, United Kingdom. University Department of Paediatrics, Oxford, United Kingdom.
154
Kwun HJ, Han HJ, Lee WJ, Kim HS, Jang KL. Transactivation of the human endogenous retrovirus K long terminal repeat by herpes simplex virus type 1 immediate early protein 0. Virus Res 2002; 86:93-100. [PMID: 12076833 DOI: 10.1016/s0168-1702(02)00058-8] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.8] [Indexed: 02/08/2023]
Abstract
We found that LTR-directed transcription of the human endogenous retrovirus K (HERV-K) can be induced by HSV-1 infection. The effect was mediated by an HSV-1 immediate early protein, ICP0, and required the AP-1 binding site present on the HERV-K LTR. In addition, ICP0 could up-regulate AP-1 activity, suggesting that ICP0 increases transcription of HERV-K through the AP-1 site. This effect might be important for understanding both HERV-K- and HSV-1-mediated pathogenesis, because the HERV-K LTR represents an important class of retrotranspositional mutagens and could also provide a new regulatory element for linked DNA sequences.
Affiliation(s)
- Hyun Jin Kwun
- Department of Microbiology, College of Natural Sciences, Pusan National University, Pusan, South Korea
155
Jurka J, Krnjajic M, Kapitonov VV, Stenger JE, Kokhanyy O. Active Alu elements are passed primarily through paternal germlines. Theor Popul Biol 2002; 61:519-30. [PMID: 12167372 DOI: 10.1006/tpbi.2002.1602] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 11/22/2022]
Abstract
Repetitive elements are distributed non-randomly in the human genome but, as reviewed in this paper, biological processes underlying the observed patterns appear to be complex and remain relatively obscure. Recent findings indicate that chromosomal distribution of Alu retroelements deposited in the past is different from the distribution of Alu elements that continue to be inserted in human population. These active elements from AluY sub(sub)families are the major focus of this paper. In particular, we analyzed chromosomal proportions of 19 AluY subfamilies, of which nine are reported for the first time in this paper. These 19 subfamilies contain over 80% of Alu elements that are polymorphic in the human genome. The chromosomal density of these most recent Alu insertions is around three times higher on chromosome Y than on chromosome X and over two times higher than the average density for all human autosomes. Based on this observation and other data we propose that active Alu elements are passed through paternal germlines. There is also some evidence that a small fraction of active Alu elements from less abundant subfamilies can be retroposed in female germlines or in the early embryos. Finally, we propose that the origin of Alu subfamilies in human populations may be related to evolution of chromosome Y.
Affiliation(s)
- Jerzy Jurka
- Genetic Information Research Institute, Mountain View, California 94043, USA.
156
Abstract
The BRCA1 gene is involved in sporadic breast and ovarian cancer mainly through reduced expression. BRCA1 mRNAs containing different leader sequences show different patterns of expression. In normal mammary gland, only the mRNA with the shorter leader sequence, 5'-UTRa, is expressed, whereas in breast cancer tissue the mRNA with the longer leader, 5'-UTRb, is also expressed. We show that the translation efficiency of transcripts containing 5'-UTRb is 10 times lower than that of transcripts containing 5'-UTRa. The structures of 5'-UTRa and 5'-UTRb were determined by chemical and enzymatic probing aided by a new method developed for monitoring the number of co-existing stable conformers. Specific factors responsible for reduced translation of mRNA containing 5'-UTRb were determined using a variety of transcripts with mutations in the leader sequence. These factors include a stable secondary structure formed by a truncated Alu element and upstream AUG codons. A novel mechanism by which BRCA1 may be involved in sporadic breast and ovarian cancer is proposed. It is based on the expression patterns of BRCA1 mRNAs and differences in their translatability. According to this mechanism, the deregulation of BRCA1 transcription in cancer, resulting in a higher proportion of translationally inhibited transcripts containing 5'-UTRb, contributes to the decrease in BRCA1 protein observed in sporadic breast and ovarian cancers.
Affiliation(s)
- Krzysztof Sobczak
- Laboratory of Cancer Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
157
Iglesias JM, Morgan RO, Jenkins NA, Copeland NG, Gilbert DJ, Fernandez MP. Comparative genetics and evolution of annexin A13 as the founder gene of vertebrate annexins. Mol Biol Evol 2002; 19:608-18. [PMID: 11961095 DOI: 10.1093/oxfordjournals.molbev.a004120] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.7] [Indexed: 01/09/2023]
Abstract
Annexin A13 (ANXA13) is believed to be the original founder gene of the 12-member vertebrate annexin A family, and it has acquired an intestine-specific expression associated with a highly differentiated intracellular transport function. Molecular characterization of this subfamily in a range of vertebrate species was undertaken to assess coding region conservation, gene organization, chromosomal linkage, and phylogenetic relationships relevant to its progenitor role in the structure-function evolution of the annexin gene superfamily. Protein diagnostic features peculiar to this subfamily include an alternate isoform containing a KGD motif, an elevated basic amino acid content with polyhistidine expansion in the 5'-translated region, and the conservation of 15% core tetrad residues specific to annexin A13 members. The 12 coding exons comprising the 58-kb human ANXA13 gene were deduced from BAC clone sequencing, whereas internal repetitive elements and neighboring genes in chromosome 8q24.12 were identified by contig analysis of the draft sequence from the human genome project. A unique exon splicing pattern in the annexin A13 gene was corroborated by coanalysis of mouse, rat, zebrafish, and pufferfish genomic DNA and determined to be the most distinct of all vertebrate annexins. The putative promoter region was identified by phylogenetic footprinting of potential binding sites for intestine-specific transcription factors. Mouse annexin A13 cDNA was used to map the gene to an orthologous linkage group in mouse chromosome 15 (between Sdc2 and Myc by backcross analysis), and the zebrafish cDNA permitted its localization to linkage group 24. Comparative analysis of annexin A13 from nine species traced this gene's speciation history and assessed coding region variation, whereas phylogenetic analysis showed it to be the deepest-branching vertebrate annexin, and computational analysis estimated the gene age and divergence rate. 
The unique, conserved aspects of annexin A13 primary structure, gene organization, and genetic maps identify it as the probable common ancestor of all vertebrate annexins, beginning with the sequential duplication to annexins A7 and A11 approximately 700 MYA, before the emergence of chordates.
Affiliation(s)
- Juan-Manuel Iglesias
- Department of Biochemistry and Molecular Biology, Edificio Santiago Gaston, University of Oviedo, E-33006 Oviedo, Spain
158
Abstract
To gauge the processes that might direct the length of introns, I studied the balance of indels (insertions or deletions, determined using Alu and LINE1 retroposon repeats) and the density of these repeats in the introns of the human genome. The indel balance is biased in favour of deletions and correlated with the divergence of repeats. At fixed repeat divergence, the indel bias correlated with the intron size: the shorter the intron, the more deletions were favoured over insertions. This correlation with the intron size was stronger than with the gene-wide or isochore-wide parameters. The density of repeats (the number of repeats in a unit of intron length) correlated positively with the intron size. Thus, quite different mechanisms, the indel bias and the integration and/or persistence of retroposons, act in the same direction with regard to intron size, which suggests selection for the size of individual introns.
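The repeat density defined in the abstract (number of repeats per unit of intron length) can be computed as in the short sketch below. The function name and the two example introns are hypothetical and serve only to illustrate the definition; they are not data from the study.

```python
def repeat_density(repeat_count, intron_length_bp):
    """Number of repeats per base pair of intron."""
    if intron_length_bp <= 0:
        raise ValueError("intron length must be positive")
    return repeat_count / intron_length_bp

# Hypothetical introns: (label, repeat count, length in bp)
introns = [("short intron", 2, 1_000), ("long intron", 60, 20_000)]

for label, count, length in introns:
    per_kb = 1000 * repeat_density(count, length)
    print(f"{label}: {per_kb:.1f} repeats per kb")
```

In this made-up example the longer intron has the higher density, which is the direction of the positive correlation the abstract describes.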
Affiliation(s)
- Alexander E Vinogradov
- Institute of Cytology, Russian Academy of Sciences, Tikhoretsky Ave. 4, St Petersburg 194064, Russia.
159
Nagano H, Kunii M, Azuma T, Kishima Y, Sano Y. Characterization of the repetitive sequences in a 200-kb region around the rice waxy locus: diversity of transposable elements and presence of veiled repetitive sequences. Genes Genet Syst 2002; 77:69-79. [PMID: 12087189 DOI: 10.1266/ggs.77.69] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 11/23/2022]
Abstract
Repetitive genomic sequences might have various structural features and properties distinct from those of the known transposable elements (TE). Here, the content and properties of the repetitive sequences present in a 200-kb region around the rice waxy locus were analyzed using the available rice genomic database. In our previous Southern blotting analysis, 70% of the segments in this region showed smeared patterns, but according to the present database analysis, the proportion of repetitive sequences in this region was only 15%. The repetitive segments in this 200-kb region comprised 75 repetitive sequences that we classified into 46 subfamilies: 21 subfamilies were known TEs or repetitive sequences and 25 subfamilies consisted of newly identified TEs or novel types of repetitive sequences. The region contains no long terminal repeat (LTR) retrotransposable elements, but miniature inverted repeat transposable elements (MITEs) constituted a major class among the elements identified. These MITEs showed remarkable structural divergence: 12 elements were found to be new members of known MITE superfamilies, while five elements had novel terminal structures, and did not belong to any known TE families. Interestingly, about 10% of the repetitive sequences, including virus-like sequences did not have any of the usual characteristics of TEs, suggesting that a certain proportion of repetitive sequences that might not share the transpositional mechanisms of known elements are dispersed in the compact rice genome.
Affiliation(s)
- Hironori Nagano
- Laboratory of Plant Breeding, Faculty of Agriculture, Hokkaido University, Sapporo, Japan
160
Chen C, Gentles AJ, Jurka J, Karlin S. Genes, pseudogenes, and Alu sequence organization across human chromosomes 21 and 22. Proc Natl Acad Sci U S A 2002; 99:2930-5. [PMID: 11867739 PMCID: PMC122450 DOI: 10.1073/pnas.052692099] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.0] [Accepted: 12/21/2001] [Indexed: 11/18/2022]
Abstract
Human chromosomes 21 and 22 (mainly the q-arms) were the first complete parts of the human genome released. Our analysis of genes, pseudogenes (Psig), and Alu repeats across these chromosomes includes the following findings: the number of gene structures containing untranslated exons exceeds 25%; the terminal exon tends to be the largest among exons, whereas the initial intron tends to be the largest among introns; single-exon gene length is approximately the mean gene exon number times the mean internal exon length; processed Psig lengths are on average approximately the same as single-exon gene length; and the G+C content and length of genes are uncorrelated. The counts and distribution of genes, Psig, and Alu sequences and G+C variation are evaluated with respect to clusters and overdispersions. Other assessments concern comparisons of intergenic lengths, properties of Psig sequences, and correlations between Alu and Psig sequences.
Affiliation(s)
- Chingfer Chen
- Department of Mathematics, Stanford University, Stanford, CA 94305-2125, USA
161
Pavlícek A, Paces J, Elleder D, Hejnar J. Processed pseudogenes of human endogenous retroviruses generated by LINEs: their integration, stability, and distribution. Genome Res 2002; 12:391-9. [PMID: 11875026 PMCID: PMC155283 DOI: 10.1101/gr.216902] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.5] [Indexed: 01/16/2023]
Abstract
We report here the presence of numerous processed pseudogenes derived from the W family of endogenous retroviruses in the human genome. These pseudogenes are structurally colinear with the retroviral mRNA followed by a poly(A) tail. Our analysis of insertion sites of HERV-W processed pseudogenes shows a strong preference for the insertion motif of long interspersed nuclear element (LINE) retrotransposons. The genomic distribution, stability during evolution, and frequent truncations at the 5' end resemble those of the pseudogenes generated by LINEs. We therefore suggest that HERV-W processed pseudogenes arose by multiple and independent LINE-mediated retrotransposition of retroviral mRNA. These data document that the majority of HERV-W copies are actually nontranscribed promoterless pseudogenes. The current search for HERV-Ws associated with several human diseases should concentrate on a small subset of transcriptionally competent elements.
Affiliation(s)
- Adam Pavlícek
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Prague 6, CZ-16637, Czech Republic
162
Du J, Fisher DE. Identification of Aim-1 as the underwhite mouse mutant and its transcriptional regulation by MITF. J Biol Chem 2002; 277:402-6. [PMID: 11700328 DOI: 10.1074/jbc.m110229200] [Citation(s) in RCA: 74] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022]
Abstract
Animal pigmentation mutants have provided rich models for the identification of genes modulating pathways from melanocyte development to melanoma. One mouse model is the underwhite locus, alleles of which manifest altered pigmentation of both eye and fur, sometimes in an age-dependent fashion. Here we show that the mouse homolog of a recently identified gene whose mutation produces Japanese gold-colored fish, medaka b, maps to the mouse underwhite locus. We identify distinct mutations of this gene, known as Aim-1, in three underwhite mouse alleles and find that structure/function differences correlate with recessive versus dominant inheritance. The human ortholog of AIM-1 was originally identified as a melanocyte-restricted antigen that is recognized by autologous T cells from a patient with melanoma. We also provide evidence that AIM-1 is transcriptionally modulated by MITF, a melanocyte-specific transcription factor essential to pigmentation and a clinical diagnostic marker in human melanoma. Although AIM-1 appears to reside downstream of MITF, chromatin immunoprecipitations do not reveal binding of MITF to a 5'-flanking region containing histone 3 acetylation, indicating that MITF either acts indirectly on AIM-1 or it binds to a remote regulatory sequence. Nevertheless, MITF links AIM-1 expression and the underwhite phenotype to a transcriptional network central to pigmentation in mammals.
Affiliation(s)
- Jinyan Du
- Division of Pediatric Hematology/Oncology, Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
163
Abstract
The human endogenous retroviruses database (HERVd) is maintained at the Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, and is accessible via the World Wide Web at http://herv.img.cas.cz. The HERVd provides complex information on and analysis of retroviral elements found in the human genome. It can be used for searches of individual HERV families, identification of HERV parts, graphical output of HERV structures, comparison of HERVs and identification of retrovirus integration sites.
Affiliation(s)
- Jan Paces
- Institute of Molecular Genetics, Academy of Sciences of the Czech Republic, Flemingovo 2, CZ-16637 Prague, Czech Republic
164
Lenoir A, Lavie L, Prieto JL, Goubely C, Coté JC, Pélissier T, Deragon JM. The evolutionary origin and genomic organization of SINEs in Arabidopsis thaliana. Mol Biol Evol 2001; 18:2315-22. [PMID: 11719581 DOI: 10.1093/oxfordjournals.molbev.a003778] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
We have characterized the two families of SINE retroposons present in Arabidopsis thaliana. The origin, distribution, organization, and evolutionary history of RAthE1 and RAthE2 elements were studied and compared to the well-characterized SINE S1 element from Brassica. Our studies show that RAthE1, RAthE2, and S1 retroposons were generated independently from three different tRNAs. The RAthE1 and RAthE2 families are older than the S1 family and are present in all tested Cruciferae species. The evolutionary history of the RAthE1 family is unusual for SINEs. The 144 RAthE1 elements of the Arabidopsis genome cannot be classified in distinct subfamilies of different evolutionary ages as is the case for S1, RAthE2, and mammalian SINEs. Instead, most RAthE1 elements were probably derived steadily from a single source gene that was maintained intact and active for at least 12-20 Myr, a result suggesting that the RAthE1 source gene was under selection. The distribution of RAthE1 and RAthE2 elements on the Arabidopsis physical map was studied. We observed that, in contrast to other Arabidopsis transposable elements, SINEs are not concentrated in the heterochromatic regions. Instead, SINEs are grouped in the euchromatic chromosome territories several hundred kilobase pairs long. In these territories, SINE elements are closely associated with genes. A retroposition partnership between Arabidopsis SINEs and LINEs is proposed.
Affiliation(s)
- A Lenoir
- Centre National de la Recherche Scientifique, Université Blaise Pascal Clermont-Ferrand II, Aubière cedex, France
165
Quesneville H, Anxolabéhère D. Genetic algorithm-based model of evolutionary dynamics of class II transposable elements. J Theor Biol 2001; 213:21-30. [PMID: 11708852 DOI: 10.1006/jtbi.2001.2401] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
We propose a new conceptual framework to study the dynamics of transposable elements. Based on a genetic algorithm, our model is designed as a self-organizing system. Our results show that transposable elements could emerge from a single endonuclease gene. The DNA repair mechanisms appear to condition the emergence success of class II TEs. Antagonist selective forces acting on transposable elements and their hosts induce by their opposition differences in the sequence evolution of the functional domains and of the copies.
Affiliation(s)
- H Quesneville
- Laboratoire de Dynamique du Génome et Evolution, Institut Jacques Monod, 2, Place Jussieu, Paris Cedex 05, 75251, France.
166
Pesole G, Mignone F, Gissi C, Grillo G, Licciulli F, Liuni S. Structural and functional features of eukaryotic mRNA untranslated regions. Gene 2001; 276:73-81. [PMID: 11591473 DOI: 10.1016/s0378-1119(01)00674-6] [Citation(s) in RCA: 292] [Impact Index Per Article: 12.2] [Indexed: 10/18/2022]
Abstract
The crucial role of the non-coding portion of genomes is now widely acknowledged. In particular, mRNA untranslated regions are involved in many post-transcriptional regulatory pathways that control mRNA localization, stability and translation efficiency. We review in this paper the major structural and compositional features of eukaryotic mRNA untranslated regions and provide some examples of bioinformatic analyses for their functional characterization.
Affiliation(s)
- G Pesole
- Dipartimento di Fisiologia e Biochimica Generali, Università di Milano, via Celoria, 26, 20133, Milan, Italy.
167
Carcedo MT, Iglesias JM, Bances P, Morgan RO, Fernandez MP. Functional analysis of the human annexin A5 gene promoter: a downstream DNA element and an upstream long terminal repeat regulate transcription. Biochem J 2001; 356:571-9. [PMID: 11368787 PMCID: PMC1221871 DOI: 10.1042/0264-6021:3560571] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]
Abstract
Human annexin A5 is a ubiquitous protein implicated in diverse signal transduction processes associated with cell growth and differentiation, and its gene regulation is an important component of this function. Promoter transcriptional activity was determined for a wide 5' portion of the human annexin A5 gene, from bp -1275 to +79 relative to the most 5' of several discrete transcription start points. Transfection experiments carried out in HeLa cells identified the segment from bp -202 to +79 as the minimal promoter conferring optimal transcriptional activity. Two canonical Sp1 sites in the immediate 5' flanking region of a CpG island were required for significant transcription. Strong repressive activity in the distal promoter region between bp -717 to -1153 was attributed to the presence of an endogenous retroviral long terminal repeat, homologous with long terminal repeat 47B. The downstream sequence from bp position +31 to +79 in untranslated exon 1 was also essential for transcription, as its deletion from any of the plasmid constructs abolished activity in transfection assays. Electrophoretic mobility-shift assays, Southwestern-blot analysis and affinity chromatography were used to identify a protein doublet of relative molecular mass 35 kDa that bound an octanucleotide palindromic sequence in exon 1. The DNA cis-element resembled an E-box, but did not bind higher molecular mass transcription factors, such as upstream stimulatory factor or activator protein 4. The discovery of a downstream element crucial for annexin A5 gene transcription, and its interaction with a potentially novel transcription factor or complex, may provide a clue to understanding the initiation of transcription by TATA-less, multiple start site promoters.
Affiliation(s)
- M T Carcedo
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Oviedo, E-33006 Oviedo, Spain
168
Kuryshev VY, Skryabin BV, Kremerskothen J, Jurka J, Brosius J. Birth of a gene: locus of neuronal BC200 snmRNA in three prosimians and human BC200 pseudogenes as archives of change in the Anthropoidea lineage. J Mol Biol 2001; 309:1049-66. [PMID: 11399078 DOI: 10.1006/jmbi.2001.4725] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.7] [Indexed: 11/22/2022]
Abstract
The gene encoding brain-specific dendritic BC200 small non-messenger RNA is limited to the primate order and arose from a monomeric Alu element. It is present and neuronally expressed in all Anthropoidea examined. By comparing the human sequence of about 13.2 kb with each of the prosimian (lemur 14.6 kb, galago 12 kb, and tarsier 13.8 kb) orthologous loci, we could establish that the BC200 RNA gene is absent from the prosimian lineages. In Strepsirhini (lemurs and lorises), a dimeric AluJ-like element integrated very close to the BC200 insertion point, while the corresponding tarsier region is devoid of any repetitive element. Consequently, insertion of the Alu monomer that gave rise to the BC200 RNA gene must have occurred after the anthropoid lineage diverged from the prosimian lineage(s). Shared insertions of other repetitive elements favor proximity of simians and tarsiers in support of their grouping into Haplorhini and the omomyid hypothesis. On the other hand, the nucleotide sequences in the segment that is available for comparison in all four species reveal fewer exchanges between Strepsirhini (lemur and galago) and human than between tarsier and human. Our data imply that the early activity of dimeric Alu sequences must have been concurrent with the activity of monomeric Alu elements that persisted longer than is usually thought. As BC200 RNA gave rise to more than 200 pseudogenes, we used their consensus sequence variations as a molecular archive recording the BC200 RNA sequence changes in the anthropoid lineage leading to Homo sapiens and timed these alterations over the past 35-55 million years.
Affiliation(s)
- V Y Kuryshev
- Institute of Experimental Pathology/Molecular Neurobiology, ZMBE, University of Münster, Von-Esmarch-Str. 56, Münster, D-48149, Germany.
|
169
|
Bogdanova N, Markoff A, Gerke V, McCluskey M, Horst J, Dworniczak B. Homologues to the first gene for autosomal dominant polycystic kidney disease are pseudogenes. Genomics 2001; 74:333-41. [PMID: 11414761 DOI: 10.1006/geno.2001.6568] [Citation(s) in RCA: 68] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
Abstract
PKD1 is the first gene identified to be causative for the condition of autosomal dominant polycystic kidney disease. There are several genes homologous to PKD1 that are located proximal to the master gene on the same chromosome. Two of these genes have been recently covered in a large sequencing work on chromosome 16, and their structure has been broadly analyzed. However, the major question whether homologous genes (HG) code for functionally active polypeptides has not been resolved so far. The current study identifies and partially characterizes four more homologues of PKD1, different from the previously published sequence, two of which were found by screening of a BAC library and the other two contained in available databases. Analysis of HG transcripts shows that they are not translated in the model cell line T98G. Taken together, these findings suggest that homologues to PKD1 form a family of pseudogenes.
Affiliation(s)
- N Bogdanova
- Institut für Humangenetik, Westfälische Wilhelms-Universität Münster, Münster, D-48149, Germany
|
170
|
Rogan PK, Cazcarro PM, Knoll JH. Sequence-based design of single-copy genomic DNA probes for fluorescence in situ hybridization. Genome Res 2001; 11:1086-94. [PMID: 11381034 PMCID: PMC311125 DOI: 10.1101/gr.171701] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.4] [Received: 11/27/2000] [Accepted: 03/02/2001] [Indexed: 11/24/2022]
Abstract
Chromosomal rearrangements are frequently monitored by fluorescence in situ hybridization (FISH) using large, recombinant DNA probes consisting of contiguous genomic intervals that are often distant from disease loci. We developed smaller, targeted, single-copy probes directly from the human genome sequence. These single-copy FISH (scFISH) probes were designed by computational sequence analysis of approximately 100-kb genomic sequences. ScFISH probes are produced by long PCR, then purified, labeled, and hybridized individually or in combination to human chromosomes. Preannealing or blocking with unlabeled, repetitive DNA is unnecessary, as scFISH probes lack repetitive DNA sequences. The hybridization results are analogous to conventional FISH, except that shorter probes can be readily visualized. Combinations of probes from the same region gave single hybridization signals on metaphase chromosomes. ScFISH probes are produced directly from genomic DNA, and thus more quickly than by recombinant DNA techniques. We developed single-copy probes for three chromosomal regions, namely the CDC2L1 (chromosome 1p36), MAGEL2 (chromosome 15q11.2), and HIRA (chromosome 22q11.2) genes, and show their utility for FISH. The smallest probe tested was 2290 bp in length. To assess the potential utility of scFISH for high-resolution analysis, we determined chromosomal distributions of such probes. Single-copy intervals of this length or greater are separated by an average of 29.2 and 22.3 kb on chromosomes 21 and 22, respectively. This indicates that abnormalities seen on metaphase chromosomes could be characterized with scFISH probes at a resolution greater than previously possible.
Affiliation(s)
- P K Rogan
- Section of Medical Genetics and Molecular Medicine, Children's Mercy Hospital and Clinics, University of Missouri-Kansas City School of Medicine, Kansas City, Missouri 64108, USA.
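As a rough illustration of the resolution estimate in the Rogan et al. abstract above (single-copy intervals of at least 2290 bp), the following Python sketch lists the gaps between annotated repeats that are long enough to host a probe. It is not the authors' method: the function name `single_copy_intervals`, the half-open coordinate convention, and the toy coordinates are assumptions made for illustration, and real input would come from a repeat annotation of the genomic sequence.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based (assumed convention)


def single_copy_intervals(
    seq_len: int, repeats: List[Interval], min_len: int = 2290
) -> List[Interval]:
    """Return repeat-free gaps of at least min_len bases.

    min_len defaults to 2290 bp, the smallest probe length quoted in the
    abstract; treating it as a hard threshold is an assumption of this sketch.
    """
    gaps: List[Interval] = []
    cursor = 0  # end of the right-most repeat seen so far
    for start, end in sorted(repeats):
        if start - cursor >= min_len:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # tolerate overlapping or nested repeats
    if seq_len - cursor >= min_len:
        gaps.append((cursor, seq_len))
    return gaps


# Hypothetical repeat annotation on a 10 kb toy sequence.
toy_repeats = [(1000, 1300), (4000, 4500), (5000, 9000)]
toy_gaps = single_copy_intervals(10_000, toy_repeats)
```

With these toy coordinates only the 2700-base gap between positions 1300 and 4000 passes the threshold; averaging the distances between such gaps over a whole chromosome would give a spacing figure of the kind the abstract reports.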
|
171
|
Gemünd C, Ramu C, Altenberg-Greulich B, Gibson TJ. Gene2EST: a BLAST2 server for searching expressed sequence tag (EST) databases with eukaryotic gene-sized queries. Nucleic Acids Res 2001; 29:1272-7. [PMID: 11238992 PMCID: PMC29756 DOI: 10.1093/nar/29.6.1272] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022] Open
Abstract
Expressed sequence tags (ESTs) are randomly sequenced cDNA clones. Currently, nearly 3 million human and 2 million mouse ESTs provide valuable resources that enable researchers to investigate the products of gene expression. The EST databases have proven to be useful tools for detecting homologous genes, for exon mapping, revealing differential splicing, etc. With the increasing availability of large amounts of poorly characterised eukaryotic (notably human) genomic sequence, ESTs have now become a vital tool for gene identification, sometimes yielding the only unambiguous evidence for the existence of a gene expression product. However, BLAST-based Web servers available to the general user have not kept pace with these developments and do not provide appropriate tools for querying EST databases with large highly spliced genes, often spanning 50 000-100 000 bases or more. Here we describe Gene2EST (http://woody.embl-heidelberg.de/gene2est/), a server that brings together a set of tools enabling efficient retrieval of ESTs matching large DNA queries and their subsequent analysis. RepeatMasker is used to mask dispersed repetitive sequences (such as Alu elements) in the query, BLAST2 for searching EST databases and Artemis for graphical display of the findings. Gene2EST combines these components into a Web resource targeted at the researcher who wishes to study one or a few genes to a high level of detail.
Affiliation(s)
- C Gemünd
- European Molecular Biology Laboratory, Postfach 10.2209, 69012 Heidelberg, Germany
|
172
|
Abstract
We calculated nucleotide distribution curves along the DNA molecules of the human chromosomes 21 and 22, their correlations in more than 10,000 equidistant positions, and subjected the correlations to cluster analysis. The cluster analysis demonstrated that both DNA molecules were composed of two types of segments exhibiting qualitatively different correlations. The segments differed most in the correlation of the distribution curves of cytosine and guanine, which was very high in type I segments but weak in type II segments. The type I and II segments also significantly differed in the correlations of the distribution curves of adenine with thymine. In addition, adenine strongly anticorrelated with cytosine but this anticorrelation was uniform along both chromosomes and, therefore, it did not contribute to the distinction of the two types of segments. The segments were up to 100 kbp long but they had nothing in common with isochores. Building blocks of the mosaic structure of the DNA molecules of the human chromosomes 21 and 22 are very similar but different in several interesting aspects from those of E. coli.
Affiliation(s)
- D Häring
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno
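The kind of analysis summarized in the abstract above (base distribution curves along a chromosome and the correlations between them) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' procedure: the helper names `base_curves` and `pearson`, the non-overlapping window scheme, and the toy sequence are all hypothetical, and the abstract's cluster analysis of local correlations is not reproduced here.

```python
from math import sqrt
from typing import Dict, List


def base_curves(seq: str, window: int) -> Dict[str, List[float]]:
    """Fraction of A, C, G and T in consecutive non-overlapping windows."""
    curves: Dict[str, List[float]] = {b: [] for b in "ACGT"}
    for i in range(0, len(seq) - window + 1, window):
        chunk = seq[i:i + window].upper()
        for b in "ACGT":
            curves[b].append(chunk.count(b) / window)
    return curves


def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation; returns NaN if either curve is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


# Toy sequence: a GC-rich block, an AT-rich block, and a mixed block.
toy_seq = "CGCGCGCG" + "ATATATAT" + "CGCGATAT"
toy_curves = base_curves(toy_seq, window=8)
cg_corr = pearson(toy_curves["C"], toy_curves["G"])
ac_corr = pearson(toy_curves["A"], toy_curves["C"])
```

In this toy case the C and G curves rise and fall together while A runs opposite to C, which is the qualitative pattern the abstract describes for its type I segments and for the adenine-cytosine anticorrelation.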
|
173
|
Rhodes DA, Stammers M, Malcherek G, Beck S, Trowsdale J. The cluster of BTN genes in the extended major histocompatibility complex. Genomics 2001; 71:351-62. [PMID: 11170752 DOI: 10.1006/geno.2000.6406] [Citation(s) in RCA: 110] [Impact Index Per Article: 4.6] [Indexed: 11/22/2022]
Abstract
We sequenced the 170-kb cluster of BTN genes in the extended major histocompatibility complex region, 4 Mb telomeric of human leukocyte antigen class I genes, at 6p22.1. The cluster consists of seven genes belonging to the expanding B7/butyrophilin-like group, a subset of the immunoglobulin gene superfamily. The main complex is composed of six genes, from two subfamilies, BTN2 and BTN3, arranged in pairs. This alternating pattern must have evolved by duplications of an original block of two genes, one from each subfamily. The sequences from the two subfamilies share approximately 50% amino acid identity. By analysis of repeat elements within each block, these duplications may be dated to approximately 100 million years ago, at about the time of the branching of the Rodentia and Primate lineages. The single BTN1A1 (butyrophilin) gene was positioned approximately 25 kb centromeric to the cluster. Each gene covers approximately 12 kb and consists of seven (BTN2 subfamily) or nine (BTN3 subfamily) coding exons. The predicted leader sequence, immunoglobulin-like IgV (variable)/IgC (constant) ectodomains, and the predicted transmembrane domain are encoded on separate exons and are separated from a B30.2 domain by a variable number of very short exons, 21 and 27 nucleotides in length. BTN transcripts were detected in all tissues examined. Alternative splicing, involving particularly the carboxyl-terminal B30.2 domain, was a notable feature. Most transcripts of BTN2 subfamily genes contained this domain, whereas BTN3 genes did not. Using immunofluorescence, we showed surface expression of BTN-green fluorescent protein fusions in mammalian cell transfectants.
MESH Headings
- Alternative Splicing
- Amino Acid Sequence
- Animals
- Blotting, Northern
- Blotting, Southern
- Butyrophilins
- CHO Cells
- Cell Membrane/metabolism
- Chromosome Mapping
- Chromosomes, Human, Pair 6
- Cricetinae
- DNA, Complementary/metabolism
- Exons
- Expressed Sequence Tags
- Genetic Markers
- Green Fluorescent Proteins
- Haplotypes
- HeLa Cells
- Homozygote
- Humans
- Luminescent Proteins/metabolism
- Major Histocompatibility Complex
- Membrane Glycoproteins/genetics
- Microscopy, Fluorescence
- Models, Genetic
- Molecular Sequence Data
- Multigene Family
- Polymorphism, Genetic
- Protein Structure, Tertiary
- RNA, Messenger/metabolism
- RNA, Spliced Leader
- Recombinant Fusion Proteins/metabolism
- Repetitive Sequences, Nucleic Acid
- Reverse Transcriptase Polymerase Chain Reaction
- Sequence Homology, Amino Acid
- Transfection
Affiliation(s)
- D A Rhodes
- Department of Immunology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QP, United Kingdom.
|
174
|
King LM, Francomano CA. Characterization of a human gene encoding nucleosomal binding protein NSBP1. Genomics 2001; 71:163-73. [PMID: 11161810 DOI: 10.1006/geno.2000.6443] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
We characterize the cDNA and genomic structure of NSBP1, and demonstrate that it is a nuclear protein and the homologue of mouse Nsbp1, which is known to encode a nucleosomal binding and transcriptional activating protein related to the HMG-14/-17 chromosomal proteins. The encoded NSBP1 protein has 86% amino acid similarity to Nsbp1, including identity in nucleosomal binding domains of the HMG-14/-17 proteins. Our radiation hybrid data localize NSBP1 and Nsbp1 to homologous regions of chromosome X, with NSBP1 in Xq13.3 between DXS983 and DXS995 and Nsbp1 in the interval DXMit65 and DXMit39. Although Nsbp1 produces one mRNA transcript, NSBP1 produces three transcripts with alternate polyadenylated sites. The 3' untranslated region (UTR) of NSBP1 mRNA also contains several AU-rich elements (AREs), which are associated with rapid mRNA turnover. Northern analysis of NSBP1/Nsbp1 shows differences in transcript abundance among adult and fetal tissues, with predominant expression in liver, kidney, trabecular bone, and bone marrow stromal cells. However, a reverse transcriptase-PCR analysis shows nearly ubiquitous expression of the three NSBP1 transcripts in all tissues examined, although the abundance of each transcript was not quantified. NSBP1 is encoded by six exons and has exon-intron boundaries identical to the HMG-14/-17 genes. The last exon and the 3' UTR of NSBP1 contain retrotransposon sequences of HAL1, HERV-H, and L1MB7, suggesting that these retrotransposons were involved in the origin of NSBP1 from an ancestral-like HMG-14/-17 gene. The similarities among NSBP1, Nsbp1, and the HMG-14/-17 proteins suggest that NSBP1 may function as a nucleosomal binding and transcriptional activating element. Further, the AREs in the 3' UTR of NSBP1 suggest that alternate poly(A) site selection may mediate the mRNA stability of this gene.
Affiliation(s)
- L M King
- Medical Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
|
175
|
Schön U, Seifarth W, Baust C, Hohenadl C, Erfle V, Leib-Mösch C. Cell type-specific expression and promoter activity of human endogenous retroviral long terminal repeats. Virology 2001; 279:280-91. [PMID: 11145909 DOI: 10.1006/viro.2000.0712] [Citation(s) in RCA: 68] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
Abstract
Evolution over millions of years has adapted several thousand copies of retrovirus-like elements and over 10 times as many solitary long terminal repeats (LTRs) to their present location in the human genome. Transcription of these human endogenous retroviruses (HERVs) has been detected in various cells and tissues, and in some cases their transcriptional control elements have been recruited by cellular genes. We used a retroviral pol-specific expression array to obtain a HERV transcription profile in a variety of human cells such as epidermal keratinocytes, liver cells, kidney cells, pancreatic cells, lymphocytes, and lung fibroblasts. This rapid screening test revealed a distinct HERV pol-expression pattern in each cell type tested so far. About 40 different U3/R regulatory sequences from the HERV-H and HERV-W families were then amplified from actively transcribed 3'HERV LTRs of various cell lines and tissues. Their promoter activities were compared with LTR sequences of other known HERV families in 12 human cell lines using a transient luciferase reporter system. Expression of the isolated HERV LTRs varied significantly in these cell lines, in some cases showing strict cell type specificity. These results suggest that endogenous retroviral LTRs may be a valuable source of transcriptional regulatory elements for the construction of targeted retroviral expression vectors.
Affiliation(s)
- U Schön
- Institute of Molecular Virology, Oberschleissheim, D-85764, Germany.
|
176
|
Stenger JE, Lobachev KS, Gordenin D, Darden TA, Jurka J, Resnick MA. Biased distribution of inverted and direct Alus in the human genome: implications for insertion, exclusion, and genome stability. Genome Res 2001; 11:12-27. [PMID: 11156612 DOI: 10.1101/gr.158801] [Citation(s) in RCA: 98] [Impact Index Per Article: 4.1] [Indexed: 11/25/2022]
Abstract
Alu sequences, the most abundant class of large dispersed DNA repeats in human chromosomes, contribute to human genome dynamics. Recently we reported that long inverted repeats, including human Alus, can be strong initiators of genetic change in yeast. We proposed that the potential for interactions between adjacent, closely related Alus would influence their stability and this would be reflected in their distribution. We have undertaken an extensive computational analysis of all Alus (the database is at http://dir.niehs.nih.gov/ALU) to better understand their distribution and circumstances under which Alu sequences might affect genome stability. Alus separated by <650 bp were categorized according to orientation, length of regions sharing high sequence identity, distance between highly identical regions, and extent of sequence identity. Nearly 50% of all Alu pairs have long alignable regions (>275 bp), corresponding to nearly full-length Alus, regardless of orientation. There are dramatic differences in the distributions and character of Alu pairs with closely spaced, nearly identical regions. For Alu pairs that are directly repetitive, approximately 30% have highly identical regions separated by <20 bp, but only when the alignments correspond to near full-size or half-size Alus. The opposite is found for the distribution of inverted repeats: Alu pairs with aligned regions separated by <20 bp are rare. Furthermore, closely spaced direct and inverted Alus differ in their truncation patterns, suggesting differences in the mechanisms of insertion. At larger distances, the direct and inverted Alu pairs have similar distributions. We propose that sequence identity, orientation, and distance are important factors determining insertion of adjacent Alus, the frequency and spectrum of Alu-associated changes in the genome, and the contribution of Alu pairs to genome instability. 
Based on results in model systems and the present analysis, closely spaced inverted Alu pairs with long regions of alignment are likely at-risk motifs (ARMs) for genome instability.
Affiliation(s)
- J E Stenger
- Laboratory of Structural Biology, National Institute for Environmental Health Sciences, NIH, Research Triangle Park, North Carolina 27709, USA
|
177
|
Pertsemlidis A, Pande A, Miller B, Schilling P, Wei MH, Lerman MI, Minna JD, Garner HR, Mittelman D. PANORAMA: an integrated Web-based sequence analysis tool and its role in gene discovery. Genomics 2000; 70:300-6. [PMID: 11161780 DOI: 10.1006/geno.2000.6359] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Abstract
As the exponential growth of DNA sequence information in databases continues, the task of converting this deposited information into knowledge becomes more dependent on integrative sequence analysis and visualization tools. PANORAMA is an Internet-accessible software package that performs a variety of informatics analyses on a given DNA sequence and returns a visual and interactive representation of the results. Its design is modular, so that further sequence analysis tools can be integrated with minimal effort. The utility of PANORAMA is demonstrated in the analysis of 650 kb of human genomic DNA from chromosome region 3p21.3, a region of potential tumor suppressor genes involved in lung cancer, breast cancer, and other forms of cancer. PANORAMA aided in the discovery of genes and alternate splice forms of known exons, in the demarcation of intron-exon boundaries, and in the identification of promoter regions and polymorphisms, all of which contributed to a better understanding of the region. PANORAMA is available on the World Wide Web at http://atlas.swmed.edu.
Affiliation(s)
- A Pertsemlidis
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
|
178
|
Miller WJ, Nagel A, Bachmann J, Bachmann L. Evolutionary dynamics of the SGM transposon family in the Drosophila obscura species group. Mol Biol Evol 2000; 17:1597-609. [PMID: 11070048 DOI: 10.1093/oxfordjournals.molbev.a026259] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
Abstract
SGM (Drosophila subobscura, Drosophila guanche, and Drosophila madeirensis) transposons are a family of transposable elements (TEs) in Drosophila with some functional and structural similarities to miniature inverted-repeat transposable elements (MITEs). These elements were recently active in D. subobscura and D. madeirensis (1-2 MYA), but in D. guanche (3-4 MYA), they gave rise to a species-specifically amplified satellite DNA making up approximately 10% of its genome. SGM elements were already active in the common ancestor of all three species, giving rise to the A-type specific promoter section of the P-related neogene cluster. SGM sequences are similar to elements found in other obscura group species, such as the ISY elements in D. miranda and the ISamb elements in Drosophila ambigua. SGM elements are composed of different sequence modules, and some of them, i.e., LS and LS-core, are found throughout the Drosophila and Sophophora radiation with similarity to more distantly related TEs. The LS-core module is highly enriched in the noncoding sections of the Drosophila melanogaster genome, suggesting potential regulatory host gene functions. The SGM elements can be considered as a model system elucidating the evolutionary dynamics of mobile elements in their arms race with host-directed silencing mechanisms and their evolutionary impact on the structure and composition of their respective host genomes.
Affiliation(s)
- W J Miller
- Institute of Medical Biology, General Genetics, University of Vienna, Austria.
|
179
|
Bances P, Fernandez MR, Rodriguez-Garcia MI, Morgan RO, Fernandez MP. Annexin A11 (ANXA11) gene structure as the progenitor of paralogous annexins and source of orthologous cDNA isoforms. Genomics 2000; 69:95-103. [PMID: 11013079 DOI: 10.1006/geno.2000.6309] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
The genomic organization of the annexin A11 gene was determined in mouse and human to assess its congruity with other family members and to examine the species variation in alternative splicing patterns. Mouse annexin A11 genomic clones were characterized by restriction analysis, Southern blotting, and DNA sequencing, and the homologous human gene (HGMW-approved gene symbol ANXA11) was deciphered from high-throughput genomic sequence with coanalysis of expressed sequence tags. Exons 6-15 of the tetrad core repeat region differ from annexins A7 and A13 but are spliced identically to other phylogenetic descendents, making annexin A11 the putative primary progenitor of up to nine paralogous human annexins. The 5' regions consist of untranslated exon 1, followed by an extensive intron 1 comprising almost half the total gene length of >40 kb, and additional GC-rich exons 2-5 encoding the proline- and glycine-rich amino-terminus. Distinct cDNA isoforms in cow and human were determined to be unique to each species and hence of dubious general significance for this gene's function. Multiple transcription start sites were revealed by primer extension analysis of the mouse gene, and transfection constructs containing the prospective promoter generated transcriptional activity comparable to that of the SV40 promoter. Internal repetitive elements and vicinal gene markers were mapped for the complete human annexin A11 gene sequence to characterize the surrounding genomic environment.
Affiliation(s)
- P Bances
- Department of Biochemistry and Molecular Biology, Edificio Santiago Gascon, University of Oviedo, Oviedo, E-33006, Spain
|
180
|
Kawasaki K, Minoshima S, Shimizu N. Propagation and maintenance of the 119 human immunoglobulin Vlambda genes and pseudogenes during evolution. J Exp Zool 2000; 288:120-34. [PMID: 10931496 DOI: 10.1002/1097-010x(20000815)288:2<120::aid-jez4>3.0.co;2-i] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
We previously determined a contiguous 1,025,415-nucleotide sequence of the entire human immunoglobulin lambda gene locus, in which a total of 36 potentially functional Vlambda genes and 33 pseudogenes were localized. We also identified many more incomplete Vlambda genes to be characterized further. Some of these possessed only a slight sequence homology with the known Vlambda genes, and others possessed a high homology but had severely truncated coding regions. Here, we made extensive characterization of 50 new Vlambda pseudogenes, totaling 119 gene segments in the Vlambda gene locus. Of these 119 Vlambda genes, 118 were localized within the five Vlambda gene-rich clusters that we previously defined. Two of these novel Vlambda pseudogenes possessed the opposite transcriptional polarity to all the other Vlambda genes. The present comprehensive analysis of 119 Vlambda genes validated our previous classification of Vlambda genes and provided a basis for a possible mechanism by which a large number of Vlambda pseudogenes were propagated and maintained as a particular locus during evolution.
Affiliation(s)
- K Kawasaki
- Department of Molecular Biology, Keio University School of Medicine, Shinjuku-ku, Tokyo 160-8582, Japan
|
181
|
Hudson LD. Breaking away from home. Am J Hum Genet 2000; 67:1-3. [PMID: 10827110 PMCID: PMC1287066 DOI: 10.1086/302982] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/08/2000] [Accepted: 05/08/2000] [Indexed: 11/03/2022] Open
|
182
|
Abstract
Recent availability of extensive genome sequence information offers new opportunities to analyze genome organization, including transposon diversity and accumulation, at a level of resolution that was previously unattainable. In this report, we used sequence similarity search and analysis protocols to perform a fine-scale analysis of a large sample (approximately 17.2 Mb) of the Arabidopsis thaliana (Columbia) genome for transposons. Consistent with previous studies, we report that the A. thaliana genome harbors diverse representatives of most known superfamilies of transposons. However, our survey reveals a higher density of transposons of which over one-fourth could be classified into a single novel transposon family designated as Basho, which appears unrelated to any previously known superfamily. We have also identified putative transposase-coding ORFs for miniature inverted-repeat transposable elements (MITEs), providing clues into the mechanism of mobility and origins of the most abundant transposons associated with plant genes. In addition, we provide evidence that most mined transposons have a clear distribution preference for A + T-rich sequences and show that structural variation for many mined transposons is partly due to interelement recombination. Taken together, these findings further underscore the complexity of transposons within the compact genome of A. thaliana.
Affiliation(s)
- Q H Le
- Department of Biology, McGill University, 1205 Docteur Penfield Avenue, Montreal, Quebec H3A 1B1, Canada
|
183
|
Pittoggi C, Zaccagnini G, Giordano R, Magnano AR, Baccetti B, Lorenzini R, Spadafora C. Nucleosomal domains of mouse spermatozoa chromatin as potential sites for retroposition and foreign DNA integration. Mol Reprod Dev 2000; 56:248-51. [PMID: 10824977 DOI: 10.1002/(sici)1098-2795(200006)56:2+<248::aid-mrd7>3.0.co;2-v] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
Exogenous DNA molecules are spontaneously taken up by sperm cells, internalized in nuclei, and eventually integrated in the sperm genome. The actual occurrence of the integration suggests that the sperm chromosomal DNA is not uniformly and tightly packed with protamines, implying the existence of genomic sites where the chromosomal DNA is accessible to foreign molecules. We have characterized a hypersensitive, nucleosomal subfraction of mouse sperm chromatin that is highly enriched in unmethylated retroposon DNA from a variety of families. Here we propose that both the integration of exogenous DNA molecules, and the endogenous retroposition activity, occur in the same site(s) of sperm chromatin.
Affiliation(s)
- C Pittoggi
- Institute of General Biology, University of Siena, Italy
|
184
|
Baust C, Seifarth W, Germaier H, Hehlmann R, Leib-Mösch C. HERV-K-T47D-Related long terminal repeats mediate polyadenylation of cellular transcripts. Genomics 2000; 66:98-103. [PMID: 10843810 DOI: 10.1006/geno.2000.6175] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
The human genome harbors thousands of long terminal repeats (LTRs) that are derived from endogenous retroviruses and contain elements able to regulate the expression of neighboring cellular genes. We have investigated the ability of human endogenous retroviral (HERV)-K LTRs to provide transcriptional processing signals for nonviral sequences. Four chimeric cDNA clones isolated from a cDNA library derived from the human cell line T47D were found to be polyadenylated within an HERV-K-T47D-related LTR. Two transcripts containing an as yet unknown cellular sequence were probably derived from the same genomic locus but their 3' ends were processed at different positions of the LTR. Structural analysis of the polyadenylation site suggests RNA stem-loop structures similar to the HTLV-1 Rex responsive element that bring the two remote AAUAAA and GU-rich elements into the spatial juxtaposition necessary for correct 3' end processing. The cellular part of the third chimeric clone shows significant homology to an exon of the human tyrosine phosphatase 1 gene, although oriented in the antisense direction compared to the adjacent LTR. Furthermore, we found that the 3' untranslated region of the human transmembrane tyrosine kinase gene FLT4 is probably derived from a partial HERV-K-T47D LTR sequence. Taken together, our data suggest that LTRs of the HERV-K-T47D family display biological function by mediating polyadenylation of cellular sequences.
Affiliation(s)
- C Baust
- Medical Clinic III, Faculty of Clinical Medicine Mannheim, University of Heidelberg, Mannheim, D-68305, Germany.
|
185
|
Martusewitsch E, Sensen CW, Schleper C. High spontaneous mutation rate in the hyperthermophilic archaeon Sulfolobus solfataricus is mediated by transposable elements. J Bacteriol 2000; 182:2574-81. [PMID: 10762261 PMCID: PMC111323 DOI: 10.1128/jb.182.9.2574-2581.2000] [Citation(s) in RCA: 102] [Impact Index Per Article: 4.1] [Indexed: 11/20/2022] Open
Abstract
We have isolated uracil-auxotrophic mutants of the hyperthermophilic archaeon Sulfolobus solfataricus in order to explore the genomic stability and mutational frequencies of this organism and to identify complementable recipients for a selectable genetic transformation system. Positive selection of spontaneous mutants resistant to 5-fluoroorotate yielded uracil auxotrophs with frequencies of between 10⁻⁴ and 10⁻⁵ per sensitive, viable cell. Four different, nonhomologous insertion sequences (ISs) were identified at different positions within the chromosomal pyrEF locus of these mutants. They ranged in size from 1,058 to 1,439 bp and possessed properties typical of known transposable elements, i.e., terminal inverted repeats, flanking duplicated target sequences, and putative transposase genes encoding motifs that are indicative of the IS4-IS5 IS element families. Between 12 and 25 copies of each IS element were found in chromosomal DNAs by Southern analyses. While characteristic fingerprint patterns created by IS element-specific probes were observed with genomic DNA of different S. solfataricus strains, no homologous sequences were identified in DNA of other well-characterized strains of the order Sulfolobales.
Affiliation(s)
- E Martusewitsch
- Institute of Microbiology, Darmstadt University of Technology, 64287 Darmstadt, Germany
|
186
|
Abstract
Ab initio gene identification in the genomic sequence of Drosophila melanogaster was obtained using (human gene predictor) and Fgenesh programs that have organism-specific parameters for human, Drosophila, plants, yeast, and nematode. We did not use information about cDNA/EST in most predictions to model a real situation for finding new genes because information about complete cDNA is often absent or based on very small partial fragments. We investigated the accuracy of gene prediction on different levels and designed several schemes to predict an unambiguous set of genes (annotation CGG1), a set of reliable exons (annotation CGG2), and the most complete set of exons (annotation CGG3). For 49 genes, protein products of which have clear homologs in protein databases, predictions were recomputed by the Fgenesh+ program. The first annotation serves as the optimal computational description of new sequence to be presented in a database. Reliable exons from the second annotation serve as good candidates for selecting the PCR primers for experimental work for gene structure verification. Our results show that we can identify approximately 90% of coding nucleotides with 20% false positives. At the exon level we accurately predicted 65% of exons and 89% including overlapping exons with 49% false positives. Optimizing accuracy of prediction, we designed a gene identification scheme using Fgenesh, which provided sensitivity (Sn) = 98% and specificity (Sp) = 86% at the base level, Sn = 81% (97% including overlapping exons) and Sp = 58% at the exon level and Sn = 72% and Sp = 39% at the gene level (estimating sensitivity on std1 set and specificity on std3 set). In general, these results showed that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives. However, exact gene prediction (especially at the gene level) needs additional improvement using gene prediction algorithms. The program was also tested for predicting genes of human Chromosome 22 (the last variant of Fgenesh can analyze the whole chromosome sequence). This analysis has demonstrated that 88% of the manually annotated exons in Chromosome 22 were among the ab initio predicted exons. The suite of gene identification programs is available through the WWW server of Computational Genomics Group at http://genomic.sanger.ac.uk/gf.html.
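The sensitivity and specificity figures quoted in this abstract can be illustrated with a minimal sketch. The abstract does not define Sn and Sp, so the code assumes the convention common in gene-finder evaluation: Sn = TP / (TP + FN) and Sp = TP / (TP + FP), which fits the abstract's pairing of Sp = 86% with 14% false positives. The function names and exon coordinates are hypothetical and invented for illustration only.

```python
def accuracy(predicted, annotated):
    """Return (Sn, Sp) for two sets of items (base positions or exons).

    Assumed definitions: Sn = TP / |annotated|, Sp = TP / |predicted|.
    """
    true_pos = len(predicted & annotated)
    sn = true_pos / len(annotated) if annotated else 0.0
    sp = true_pos / len(predicted) if predicted else 0.0
    return sn, sp


def coding_bases(exons):
    """Expand inclusive (start, end) exon intervals into a set of base positions."""
    return {pos for start, end in exons for pos in range(start, end + 1)}


# Toy coordinates, not taken from any real annotation.
annotated_exons = {(100, 199), (300, 399)}
predicted_exons = {(100, 199), (310, 399), (500, 549)}

# Base level: every correctly predicted coding nucleotide counts.
base_sn, base_sp = accuracy(coding_bases(predicted_exons), coding_bases(annotated_exons))

# Exon level: an exon counts only if both boundaries match exactly.
exon_sn, exon_sp = accuracy(predicted_exons, annotated_exons)

print(f"base level: Sn={base_sn:.2f} Sp={base_sp:.2f}")
print(f"exon level: Sn={exon_sn:.2f} Sp={exon_sp:.2f}")
```

In this toy case the second predicted exon misses its start by ten bases, so it still contributes at the base level but not at the exact-match exon level. This is one reason exon-level and gene-level scores tend to be lower than base-level scores.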
Affiliation(s)
- A A Salamov
- The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
|
187
|
Reese MG, Hartzell G, Harris NL, Ohler U, Abril JF, Lewis SE. Genome annotation assessment in Drosophila melanogaster. Genome Res 2000; 10:483-501. [PMID: 10779488 PMCID: PMC310877 DOI: 10.1101/gr.10.4.483] [Citation(s) in RCA: 125] [Impact Index Per Article: 5.0] [Received: 02/09/2000] [Accepted: 02/29/2000] [Indexed: 11/24/2022]
Abstract
Computational methods for automated genome annotation are critical to our community's ability to make full use of the large volume of genomic sequence being generated and released. To explore the accuracy of these automated feature prediction tools in the genomes of higher organisms, we evaluated their performance on a large, well-characterized sequence contig from the Adh region of Drosophila melanogaster. This experiment, known as the Genome Annotation Assessment Project (GASP), was launched in May 1999. Twelve groups, applying state-of-the-art tools, contributed predictions for features including gene structure, protein homologies, promoter sites, and repeat elements. We evaluated these predictions using two standards, one based on previously unreleased high-quality full-length cDNA sequences and a second based on the set of annotations generated as part of an in-depth study of the region by a group of Drosophila experts. Although these standard sets only approximate the unknown distribution of features in this region, we believe that when taken in context the results of an evaluation based on them are meaningful. The results were presented as a tutorial at the conference on Intelligent Systems in Molecular Biology (ISMB-99) in August 1999. Over 95% of the coding nucleotides in the region were correctly identified by the majority of the gene finders, and the correct intron/exon structures were predicted for >40% of the genes. Homology-based annotation techniques recognized and associated functions with almost half of the genes in the region; the remainder were only identified by the ab initio techniques. This experiment also presents the first assessment of promoter prediction techniques for a significant number of genes in a large contiguous region. We discovered that the promoter predictors' high false-positive rates make their predictions difficult to use. 
Integrating gene finding and cDNA/EST alignments with promoter predictions decreases the number of false-positive classifications but discovers less than one-third of the promoters in the region. We believe that by establishing standards for evaluating genomic annotations and by assessing the performance of existing automated genome annotation tools, this experiment establishes a baseline that contributes to the value of ongoing large-scale annotation projects and should guide further research in genome informatics.
Affiliation(s)
- M G Reese
- Berkeley Drosophila Genome Project, Department of Molecular and Cell Biology, University of California, Berkeley 94720-3200, USA.
|
188
|
Thomas MC, Olivares M, Escalante M, Marañón C, Montilla M, Nicholls S, López MC, Puerta C. Plasticity of the histone H2A genes in a Brazilian and six Colombian strains of Trypanosoma cruzi. Acta Trop 2000; 75:203-10. [PMID: 10708660 DOI: 10.1016/s0001-706x(00)00061-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
The analysis of three recombinant clones containing the histone H2A locus isolated from a genomic library of Trypanosoma cruzi DNA shows that the H2A gene loci are formed by 1.2 and 0.76 kb long intercalated units organized in a head-to-tail tandem array. The difference in length between the two gene units is due to the presence of a short interspersed nucleotide element (SINE)-like DNA sequence inserted at the 3' end of some of these units. Southern, northern and chromosomal blot analysis of a Brazilian Y strain and six Colombian strains demonstrated the existence of polymorphisms regarding the relative copy number of the H2A gene units, the relative abundance of the H2A transcripts and their chromosomal location. These results show the existence of a dynamic organization in the H2A loci among T. cruzi strains in which a SINE-like sequence may be involved and support the fact that T. cruzi has a high degree of plasticity in its genome.
MESH Headings
- Animals
- Blotting, Northern
- Blotting, Southern
- Brazil
- Cloning, Molecular
- Colombia
- DNA, Protozoan/analysis
- Electrophoresis, Gel, Pulsed-Field
- Escherichia coli/metabolism
- Gene Dosage
- Genes, Protozoan
- Genetic Vectors
- Genome, Protozoan
- Histones/biosynthesis
- Histones/genetics
- Humans
- Polymorphism, Genetic
- RNA, Protozoan/analysis
- Recombinant Proteins/biosynthesis
- Short Interspersed Nucleotide Elements
- Trypanosoma cruzi/genetics
Affiliation(s)
- M C Thomas
- Instituto de Parasitología y Biomedicina 'López Neyra', Consejo Superior de Investigaciones Científicas, Calle Ventanilla 11, 18001, Granada, Spain
|
189
|
Pesole G, Liuni S, Grillo G, Licciulli F, Larizza A, Makalowski W, Saccone C. UTRdb and UTRsite: specialized databases of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs. Nucleic Acids Res 2000; 28:193-6. [PMID: 10592223 PMCID: PMC102415 DOI: 10.1093/nar/28.1.193] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.6] [Received: 09/30/1999] [Accepted: 10/04/1999] [Indexed: 11/12/2022]
Abstract
The 5' and 3' untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb, a specialized database of 5' and 3' untranslated sequences of eukaryotic mRNAs cleaned from redundancy. UTRdb entries are enriched with specialized information not present in the primary databases including the presence of nucleotide sequence patterns already demonstrated by experimental analysis to have some functional role. All these patterns have been collected in the UTRsite database so that it is possible to search any input sequence for the presence of annotated functional motifs. Furthermore, UTRdb entries have been annotated for the presence of repetitive elements. All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://bigarea.area.ba.cnr.it:8000/EmbIT/UTRHome/
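Searching an input sequence for annotated motifs, as this abstract describes, might look roughly like the following sketch. It is not the UTRsite pattern syntax or search engine. The motif table and the `scan_utr` function are hypothetical, and the only entry is the canonical AAUAAA polyadenylation signal, used here as a stand-in for a curated pattern collection.

```python
import re

# Hypothetical motif table: name -> regular expression over the RNA alphabet.
# A real collection such as UTRsite holds many curated patterns; this has one.
MOTIFS = {"polyadenylation_signal": "AAUAAA"}


def scan_utr(sequence, motifs=MOTIFS):
    """Return (motif_name, 0-based start) for every motif occurrence.

    The input may be DNA or RNA in any case; it is normalised to upper-case RNA.
    A zero-width lookahead is used so that overlapping occurrences are reported.
    """
    rna = sequence.upper().replace("T", "U")
    hits = []
    for name, pattern in motifs.items():
        for match in re.finditer(f"(?={pattern})", rna):
            hits.append((name, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


# Toy 3' UTR fragment, invented for illustration.
hits = scan_utr("ccgaataaagtgtgttgaauaaac")
for name, start in hits:
    print(name, start)
```

Positions are 0-based offsets into the normalised sequence. A production tool would also need to handle ambiguity codes and structured patterns, which this sketch ignores.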
Affiliation(s)
- G Pesole
- Dipartimento di Fisiologia e Biochimica Generali, Università di Milano, via Celoria 26, 20133 Milano, Italy.
|
190
|
Hishiki T, Kawamoto S, Morishita S, Okubo K. BodyMap: a human and mouse gene expression database. Nucleic Acids Res 2000; 28:136-8. [PMID: 10592203 PMCID: PMC102396 DOI: 10.1093/nar/28.1.136] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
BodyMap is a human and mouse gene expression database that has been maintained since 1993. It is based on site-directed 3'-ESTs collected from non-biased cDNA libraries constructed at Osaka University and contains >270 000 sequences from 60 human and 38 mouse tissues. The site-directed nature of the sequence tags allows unequivocal grouping of tags representing the same transcript and provides abundance information for each transcript in different parts of the body. Our collection of ESTs was compared periodically with other public databases for cross referencing. The histological resolution of source tissues and unique cloning strategy that minimized cloning bias enabled BodyMap to support three unique mRNA based experiments in silico. First, the recurrence information for clones in each library provides a rough estimate of the mRNA composition of each source tissue. Second, a user can search the entire data set with nucleotide sequences or keywords to assess expression patterns of particular genes. Third, and most important, BodyMap allows a user to select genes that have a desired expression pattern in humans and mice. BodyMap is accessible through the WWW at http://bodymap.ims.u-tokyo.ac.jp
Affiliation(s)
- T Hishiki
- Institute for Molecular and Cellular Biology, Osaka University, 1-3 Yamadaoka, Suita, Osaka 565-0871, Japan
|
191
|
Waldman AS, Tran H, Goldsmith EC, Resnick MA. Long inverted repeats are an at-risk motif for recombination in mammalian cells. Genetics 1999; 153:1873-83. [PMID: 10581292 PMCID: PMC1460879 DOI: 10.1093/genetics/153.4.1873] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
Abstract
Certain DNA sequence motifs and structures can promote genomic instability. We have explored instability induced in mouse cells by long inverted repeats (LIRs). A cassette was constructed containing a herpes simplex virus thymidine kinase (tk) gene into which was inserted an LIR composed of two inverted copies of a 1.1-kb yeast URA3 gene sequence separated by a 200-bp spacer sequence. The tk gene was introduced into the genome of mouse Ltk(-) fibroblasts either by itself or in conjunction with a closely linked tk gene that was disrupted by an 8-bp XhoI linker insertion; rates of intrachromosomal homologous recombination between the markers were determined. Recombination between the two tk alleles was stimulated 5-fold by the LIR, as compared to a long direct repeat (LDR) insert, resulting in nearly 10(-5) events per cell per generation. Of the tk(+) segregants recovered from LIR-containing cell lines, 14% arose from gene conversions that eliminated the LIR, as compared to 3% of the tk(+) segregants from LDR cell lines, corresponding to a >20-fold increase in deletions at the LIR hotspot. Thus, an LIR, which is a common motif in mammalian genomes, is at risk for the stimulation of homologous recombination and possibly other genetic rearrangements.
Affiliation(s)
- A S Waldman
- Department of Biological Sciences, University of South Carolina, Columbia, South Carolina 29208, USA.
|
192
|
Miller RT, Christoffels AG, Gopalakrishnan C, Burke J, Ptitsyn AA, Broveak TR, Hide WA. A comprehensive approach to clustering of expressed human gene sequence: the sequence tag alignment and consensus knowledge base. Genome Res 1999; 9:1143-55. [PMID: 10568754 PMCID: PMC310831 DOI: 10.1101/gr.9.11.1143] [Citation(s) in RCA: 142] [Impact Index Per Article: 5.5] [Received: 03/10/1999] [Accepted: 09/20/1999] [Indexed: 11/24/2022]
Abstract
The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence AC004106 demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented.
Affiliation(s)
- R T Miller
- South African National Bioinformatics Institute, Private Bag X17, Bellville 7535, University of the Western Cape, South Africa
|
193
|
Mager DL. Human endogenous retroviruses and pathogenicity: genomic considerations. Trends Microbiol 1999; 7:431; author reply 431-2. [PMID: 10542420 DOI: 10.1016/s0966-842x(99)01615-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
|
194
|
Brosius J. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene 1999; 238:115-34. [PMID: 10570990 DOI: 10.1016/s0378-1119(99)00227-9] [Citation(s) in RCA: 275] [Impact Index Per Article: 10.6] [Indexed: 11/21/2022]
Abstract
While the significance of middle repetitive elements had been neglected for a long time, there are again tendencies to ascribe most members of a given middle repetitive sequence family a functional role--as if the discussion of SINE (short interspersed repetitive elements) function only can occupy extreme positions. In this article, I argue that differences between the various classes of retrosequences concern mainly their copy numbers. Consequently, the function of SINEs should be viewed as pragmatic such as, for example, mRNA-derived retrosequences, without underestimating the impact of retroposition for generation of novel protein coding genes or parts thereof (exon shuffling by retroposition) and in particular of SINEs (and retroelements) in modulating genes and their expression. Rapid genomic change by accumulating retrosequences may even facilitate speciation [McDonald, J.F., 1995. Transposable elements: possible catalysts of organismic evolution. Trends Ecol. Evol. 10, 123-126.] In addition to providing mobile regulatory elements, small RNA-derived retrosequences including SINEs can, in analogy to mRNA-derived retrosequences, also give rise to novel small RNA genes. Perhaps not representative for all SINE/master gene relationships, we gained significant knowledge by studying the small neuronal non-messenger RNAs, namely BC1 RNA in rodents and BC200 RNA in primates. BC1 is the first identified master gene generating a subclass of ID repetitive elements, and BC200 is the only known Alu element (monomeric) that was exapted as a novel small RNA encoding gene.
Affiliation(s)
- J Brosius
- Institute of Experimental Pathology/Molecular Neurobiology, ZMBE, University of Münster, Germany.
|
195
|
Mager DL, Hunter DG, Schertzer M, Freeman JD. Endogenous retroviruses provide the primary polyadenylation signal for two new human genes (HHLA2 and HHLA3). Genomics 1999; 59:255-63. [PMID: 10444326 DOI: 10.1006/geno.1999.5877] [Citation(s) in RCA: 94] [Impact Index Per Article: 3.6] [Indexed: 11/22/2022]
Abstract
By screening the expressed sequence tag (EST) database, we identified transcripts of two new human genes that are polyadenylated within a long terminal repeat (LTR) of the HERV-H endogenous retrovirus family. The first gene, termed HHLA2, is represented by two EST clones and one cDNA clone, all of which have a polyadenylated LTR as their 3' end. The gene has an open reading frame (ORF) of 414 amino acids with three immunoglobulin-like domains and is expressed primarily in intestinal tissues, kidney, and lung. Seven small EST clones from several different tissues were found for the second gene, termed HHLA3. As with HHLA2, all HHLA3 ESTs utilized a HERV-H LTR as the polyadenylation signal. Three types of alternatively spliced HHLA3 transcripts that could encode proteins of 76, 121, or 153 amino acids were detected. Interestingly, the ORF for two of these transcripts continues into the LTR. For both HHLA2 and 3, no major human transcripts that utilized a non-LTR polyadenylation signal were detected. Analysis of RNA from baboon, which lacks the LTRs at these genomic loci, showed that the baboon HHLA2 and 3 genes use other polyadenylation signals. This study demonstrates that ancient retroviral insertions have assumed gene regulatory functions during the course of human evolution.
Affiliation(s)
- D L Mager
- British Columbia Cancer Agency and Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada.
|
196
|
|
197
|
Pesole G, Liuni S, Grillo G, Ippedico M, Larizza A, Makalowski W, Saccone C. UTRdb: a specialized database of 5' and 3' untranslated regions of eukaryotic mRNAs. Nucleic Acids Res 1999; 27:188-91. [PMID: 9847176 PMCID: PMC148131 DOI: 10.1093/nar/27.1.188] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
The 5' and 3' untranslated regions of eukaryotic mRNAs may play a crucial role in the regulation of gene expression controlling mRNA localization, stability and translational efficiency. For this reason we developed UTRdb (http://bigarea.area.ba.cnr.it:8000/BioWWW/#UTRdb), a specialized database of 5' and 3' untranslated sequences of eukaryotic mRNAs cleaned from redundancy. UTRdb entries are enriched with specialized information not present in the primary databases including the presence of nucleotide sequence patterns already demonstrated by experimental analysis to have some functional role. All these patterns have been collected in the UTRsite database so that it is possible to search any input sequence for the presence of annotated functional motifs. Furthermore, UTRdb entries have been annotated for the presence of repetitive elements.
Affiliation(s)
- G Pesole
- Dipartimento di Biologia, D.B.A.F., Università della Basilicata, via Anzio 10, 85100 Potenza, Italy.
|