101
|
Revisiting the relationship between compositional sequence complexity and periodicity. Comput Biol Chem 2007; 32:17-28. [PMID: 17983838 DOI: 10.1016/j.compbiolchem.2007.09.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 12/01/2006] [Revised: 05/21/2007] [Accepted: 09/03/2007] [Indexed: 11/20/2022]
Abstract
BACKGROUND Given a large sequence fragment or a set of functionally related sequences, we consider two sequence-analysis problems. The first is to measure sequence complexity (repetitiveness, compactness) in order to estimate how informative the set is as a whole. An obtained measure is usually compared with an appropriate random background calculated by permuting the given sequences. We propose a novel and effective approach to measuring background information that replaces the usual sequence reshuffling. The second problem is to detect a periodic bias and determine whether it is a feature of the set. Sequence periodicity, including so-called hidden periodicity, is a very basic genomic property. A sequence period of 3, which is considered to characterize coding sequences, and a period of 10-11, which may be due to the alternation of hydrophobic and hydrophilic amino acids, DNA curvature, and bendability, have been discovered and described. Searching for periodic biases has brought significant results in the study of sequence-dependent nucleosome positioning: nucleosomal sites carry a hidden period of about 10.4 bases. RESULTS Calculated differences between genomic sequences and background showed the high biological relevance of the method proposed in this study. Our algorithm was applied to a few natural and artificial datasets. We constructed a simple "periodic" dataset by replacing every tenth dinucleotide in each sequence of a trial set with the same dinucleotide, "CC". We showed that the method reveals the introduced periodicity and that this periodic pattern carries higher information than uninterrupted subsequences. Applying the method to the nucleosomal dataset revealed a weak pseudo-periodicity of 10.4 nucleotides, confirming previous knowledge. Applying the method to Escherichia coli datasets revealed the well-known periodicity of 3 bp as a genic attribute, a secondary genic period slightly larger than 11 bp, and an intergenic period slightly smaller than 11 bp. CONCLUSIONS We report a novel compositional complexity-based method for sequence analysis. We found that the difference between the sequence complexity of a natural sequence and that of the background is especially high for a set consisting exclusively of coding sequences. Hidden periodicities were found without any preliminary assumptions about the composition of the periodic elements. We illustrated the power of the method by studying sets with known weak periodic properties: a nucleosomal database and sets of different regions of E. coli. We showed that the method conveniently indicated all kinds of periodicity and related features in these sets of DNA sequences.
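The artificial "periodic" dataset described in this abstract can be illustrated with a minimal Python sketch. It assumes that "every tenth dinucleotide" means a "CC" starting every 10 bases, and it looks for the planted period with a plain histogram of distances between "CC" occurrences. That histogram is a simple stand-in, not the complexity-based measure of the paper. The function names, the random background, and the 2000-base length are illustrative assumptions.

```python
import random
from collections import Counter


def plant_periodic_dinucleotide(seq, period=10, dinuc="CC"):
    """Overwrite the dinucleotide that starts at every `period`-th position."""
    chars = list(seq)
    for start in range(0, len(chars) - len(dinuc) + 1, period):
        chars[start:start + len(dinuc)] = dinuc
    return "".join(chars)


def distance_counts(seq, dinuc="CC", max_dist=30):
    """Count pairs of `dinuc` start positions separated by 1..max_dist bases."""
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    present = set(positions)
    counts = Counter()
    for p in positions:
        for d in range(1, max_dist + 1):
            if p + d in present:
                counts[d] += 1
    return counts


# Hypothetical trial sequence: uniform random bases with a fixed seed.
rng = random.Random(0)
background = "".join(rng.choice("ACGT") for _ in range(2000))
periodic = plant_periodic_dinucleotide(background)

counts = distance_counts(periodic)
best = max(counts, key=counts.get)  # most frequent spacing between CC starts
print(best, counts[10], counts[7])
```

In this toy setting the most frequent spacing should be a multiple of the planted period, because every planted "CC" has partners 10, 20, and 30 bases downstream, whereas chance occurrences are spread over all distances.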
|
102
|
Abstract
INTRODUCTION The BLAST algorithm was developed as a way to perform DNA and protein sequence similarity searches with an algorithm that is faster than FASTA but considered to be equally sensitive. Both of these methods follow a heuristic (tried-and-true) approach that almost always works to find related sequences in a database search, but does not carry the underlying guarantee of an optimal solution that the dynamic programming algorithm provides. FASTA finds short common patterns in query and database sequences and joins these into an alignment. BLAST is similar to FASTA, but gains a further increase in speed by searching only for rarer, more significant patterns in nucleic acid and protein sequences. BLAST is very popular due to its availability on the World Wide Web through a large server at the National Center for Biotechnology Information (NCBI) and at many other sites. The BLAST algorithm has evolved to provide molecular biologists with a set of very powerful search tools that are freely available to run on many computer platforms. This article is intended to be a "user's guide" to the principles underlying BLAST.
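The idea of finding short common patterns and joining them into an alignment can be sketched in a few lines of Python. This is only the exact word-matching (seeding) step shared in spirit by both heuristics. It omits scoring, BLAST's selection of significant words, and hit extension, and all function names and the word length `k` are assumptions made for illustration.

```python
from collections import defaultdict


def build_word_index(sequence, k):
    """Map every length-k word in `sequence` to its start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = build_word_index(subject, k)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


def group_by_diagonal(seeds):
    """Group seeds by diagonal (subject_pos - query_pos); seeds that share a
    diagonal are candidates for joining into one ungapped alignment."""
    diagonals = defaultdict(list)
    for q, s in seeds:
        diagonals[s - q].append((q, s))
    return dict(diagonals)


seeds = find_seeds("ACGTAC", "TTACGTACGG", k=4)
print(group_by_diagonal(seeds))
```

Indexing the subject once and looking up query words is what makes this kind of search fast compared with full dynamic programming, at the cost of missing alignments that contain no exact word match.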
|
103
|
Kim TM, Chung YJ, Rhyu MG, Jung MH. Germline methylation patterns inferred from local nucleotide frequency of repetitive sequences in the human genome. Mamm Genome 2007; 18:277-85. [PMID: 17514347 DOI: 10.1007/s00335-007-9016-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/01/2007] [Accepted: 03/12/2007] [Indexed: 12/31/2022]
Abstract
Given their genomic abundance and susceptibility to DNA methylation, interspersed repetitive sequences in the human genome can be exploited as valuable resources in genome-wide methylation studies. To learn about the relationships between DNA methylation and repeat sequences, we performed a global measurement of CpG dinucleotide frequencies for interspersed repetitive sequences and inferred germline methylation patterns in the human genome. Although extensive CpG depletion was observed for most repeat sequences, those in proximity to CpG islands have been relatively removed from germline methylation, being a potential source of germline activation. We also investigated the CpG depletion patterns of Alu pairs to see whether they might play an active role in germline methylation. Two kinds of Alu pairs, direct and inverted, classified according to orientation, showed contrasting CpG depletion patterns with respect to the distance separating the Alus: as two Alu elements are more closely spaced in a pair, a higher extent of CpG depletion was observed in the inverted orientation, and vice versa for directly repeated Alu pairs. This suggests that a specific organization of repetitive sequences, such as inverted Alu pairs, might play a role in triggering DNA methylation, consistent with a homology-dependent methylation hypothesis.
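CpG depletion is commonly quantified as an observed/expected CpG ratio. The abstract does not state which formula the authors used, so the Python sketch below assumes the widely used definition (number of CpG dinucleotides times sequence length, divided by the product of the C and G counts). The function name and the example sequences are illustrative.

```python
def cpg_observed_expected(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G).

    Values well below 1 indicate CpG depletion. Returns 0.0 when the
    sequence has no C or no G, since the expectation is then undefined.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0
    return cpg * n / (c * g)


print(cpg_observed_expected("CGCGCGCG"))  # CpG-rich toy sequence
print(cpg_observed_expected("CATGCATG"))  # contains C and G but no CpG
```

Because methylated cytosines in CpG context tend to mutate over evolutionary time, a low ratio in a repeat family is read as a footprint of past germline methylation, which is the inference the abstract describes.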
Affiliation(s)
- Tae-Min Kim
- Division of Metabolic Disease, Center for Biomedical Science, National Institute of Health, Nokbun-dong 5, Eunpyung-gu, Seoul 122-701, Korea
|
104
|
Deragon JM, Zhang X. Short interspersed elements (SINEs) in plants: origin, classification, and use as phylogenetic markers. Syst Biol 2007; 55:949-56. [PMID: 17345676 DOI: 10.1080/10635150601047843] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 10/23/2022] Open
Abstract
Short interspersed elements (SINEs) are a class of dispersed mobile sequences that use RNA as an intermediate in an amplification process called retroposition. The presence or absence of a SINE at a given locus has been used as a meaningful classification criterion to evaluate phylogenetic relations among species. We review here recent developments in the characterisation of plant SINEs and their use as molecular markers to retrace phylogenetic relations among wild and cultivated Oryza and Brassica species. In Brassicaceae, further use of SINE markers is limited by our partial knowledge of endogenous SINE families (their origin and evolutionary histories) and by the absence of a clear classification. To solve this problem, phylogenetic relations among all known Brassicaceae SINEs were analyzed and a new classification, grouping SINEs into 15 different families, is proposed. The relative age and size of each Brassicaceae SINE family were evaluated and new phylogenetically supported subfamilies were described. We also present evidence suggesting that new potentially active SINEs recently emerged in Brassica oleracea from the shuffling of preexisting SINE portions. Finally, the comparative evolutionary history of SINE families present in Arabidopsis thaliana and Brassica oleracea revealed that SINEs were in general more active in the Brassica lineage. The importance of these new data for the use of Brassicaceae SINEs as molecular markers in future applications is discussed.
Affiliation(s)
- Jean-Marc Deragon
- CNRS UMR6547, GDR2157 Biomove, Université Blaise Pascal, 24 Avenue des Landais, 63177, Aubière, France.
|
105
|
Sabot F, Sourdille P, Chantret N, Bernard M. Morgane, a new LTR retrotransposon group, and its subfamilies in wheats. Genetica 2007; 128:439-47. [PMID: 17028971 DOI: 10.1007/s10709-006-7725-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 10/06/2005] [Accepted: 03/01/2006] [Indexed: 11/26/2022]
Abstract
Transposable elements are the main components of grass genomes, especially in Triticeae species. In a previous analysis, we identified a very short element, Morgane_CR626934-1; here we describe this unusual element more precisely. Morgane_CR626934-1 shows high sequence identity (up to 98%) with ESTs belonging to other possible small elements, expressed under abiotic and biotic stress conditions. No putative functional polyprotein could be identified in any of these different Morgane-like sequences. Moreover, elements from the Morgane_CR626934-1 subfamily are found only in wheat and Agropyrum genomes, and among these species only Ae. tauschii and T. aestivum present a high copy number of these elements. They are highly conserved in wheat genomes (95.5%). Based on the uncommon characteristics of the described Morgane-like elements, we propose to classify them in a new group within the Class I LTR retrotransposons, the Morgane group.
Affiliation(s)
- François Sabot
- UMR INRA/UBP 1095 Amélioration & Santé des Plantes, 234 Avenue du Brézet, F-63039, Clermont-Ferrand Cedex, France
|
106
|
Sun FJ, Fleurdépine S, Bousquet-Antonelli C, Caetano-Anollés G, Deragon JM. Common evolutionary trends for SINE RNA structures. Trends Genet 2006; 23:26-33. [PMID: 17126948 DOI: 10.1016/j.tig.2006.11.005] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.4] [Received: 07/13/2006] [Revised: 10/10/2006] [Accepted: 11/10/2006] [Indexed: 10/23/2022]
Abstract
Short interspersed elements (SINEs) and long interspersed elements (LINEs) are transposable elements in eukaryotic genomes that mobilize through an RNA intermediate. Understanding their evolution is important because of their impact on the host genome. Most eukaryotic SINEs are ancestrally related to tRNA genes, although the typical tRNA cloverleaf structure is not apparent for most SINE consensus RNAs. Using a cladistic method where RNA structural components were coded as polarized and ordered multistate characters, we showed that related structural motifs are present in most SINE RNAs from mammals, fishes and plants, suggesting common selective constraints imposed at the SINE RNA structural level. Based on these results, we propose a general multistep model for the evolution of tRNA-related SINEs in eukaryotes.
Affiliation(s)
- Feng-Jie Sun
- Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
107
|
Durand P, Mahé F, Valin AS, Nicolas J. Browsing repeats in genomes: Pygram and an application to non-coding region analysis. BMC Bioinformatics 2006; 7:477. [PMID: 17067389 PMCID: PMC1635066 DOI: 10.1186/1471-2105-7-477] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 06/23/2006] [Accepted: 10/26/2006] [Indexed: 01/06/2023] Open
Abstract
Background A large number of studies on genome sequences have revealed the major role played by repeated sequences in the structure, function, dynamics and evolution of genomes. In-depth repeat analysis requires specialized methods, including visualization techniques, to achieve optimum exploratory power. Results This article presents Pygram, a new visualization application for investigating the organization of repeated sequences in complete genome sequences. The application projects data from a repeat index file on the analysed sequences, and by combining this principle with a query system, is capable of locating repeated sequences with specific properties. In short, Pygram provides an efficient, graphical browser for studying repeats. Implementation of the complete configuration is illustrated in an analysis of CRISPR structures in Archaea genomes and the detection of horizontal transfer between Archaea and Viruses. Conclusion By proposing a new visualization environment to analyse repeated sequences, this application aims to increase the efficiency of laboratories involved in investigating repeat organization in single genomes or across several genomes.
Affiliation(s)
- Patrick Durand
- IRISA/INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
- Frédéric Mahé
- ECOBIO, CNRS UMR 6553, Campus de Beaulieu, 35042 Rennes Cedex, France
- Anne-Sophie Valin
- IRISA/INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
- Institut Curie, Dept transfert, Quadrilatère historique Hôpital Saint Louis, Porte 13, 1 rue Claude Vellefaux, 75010 Paris, France
|
108
|
Tóth G, Deák G, Barta E, Kiss GB. PLOTREP: a web tool for defragmentation and visual analysis of dispersed genomic repeats. Nucleic Acids Res 2006; 34:W708-13. [PMID: 16845104 PMCID: PMC1538846 DOI: 10.1093/nar/gkl263] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open
Abstract
Identification of dispersed or interspersed repeats, most of which are derived from transposons, retrotransposons or retrovirus-like elements, is an important step in genome annotation. Software tools that compare genomic sequences with precompiled repeat reference libraries using sensitive similarity-based methods provide reliable means of finding the positions of fragments homologous to known repeats. However, their output is often incomplete and fragmented owing to the mutations (nucleotide substitutions, deletions or insertions) that can result in considerable divergence from the reference sequence. Merging these fragments to identify the whole region that represents an ancient copy of a mobile element is challenging, particularly if the element is large and suffered multiple deletions or insertions. Here we report PLOTREP, a tool designed to post-process results obtained by sequence similarity search and merge fragments belonging to the same copy of a repeat. The software allows rapid visual inspection of the results using a dot-plot like graphical output. The web implementation of PLOTREP is available at .
Affiliation(s)
- Gábor Tóth
- Agricultural Biotechnology Center, Gödöllő, Szent-Györgyi Albert u. 4, H-2100, Hungary.
|
109
|
Wang W, Zheng H, Fan C, Li J, Shi J, Cai Z, Zhang G, Liu D, Zhang J, Vang S, Lu Z, Wong GKS, Long M, Wang J. High rate of chimeric gene origination by retroposition in plant genomes. THE PLANT CELL 2006; 18:1791-802. [PMID: 16829590 PMCID: PMC1533979 DOI: 10.1105/tpc.106.041905] [Citation(s) in RCA: 147] [Impact Index Per Article: 7.7] [Received: 02/13/2006] [Revised: 04/15/2006] [Accepted: 06/08/2006] [Indexed: 05/10/2023]
Abstract
Retroposition is widely found to play essential roles in origination of new mammalian and other animal genes. However, the scarcity of retrogenes in plants has led to the assumption that plant genomes rarely evolve new gene duplicates by retroposition, despite abundant retrotransposons in plants and a reported long terminal repeat (LTR) retrotransposon-mediated mechanism of retroposing cellular genes in maize (Zea mays). We show extensive retropositions in the rice (Oryza sativa) genome, with 1235 identified primary retrogenes. We identified 27 of these primary retrogenes within LTR retrotransposons, confirming a previously observed role of retroelements in generating plant retrogenes. Substitution analyses revealed that the vast majority are subject to negative selection, suggesting, along with expression data and evidence of age, that they are likely functional retrogenes. In addition, 42% of these retrosequences have recruited new exons from flanking regions, generating a large number of chimerical genes. We also identified young chimerical genes, suggesting that gene origination through retroposition is ongoing, with a rate an order of magnitude higher than the rate in primates. Finally, we observed that retropositions have followed an unexpected spatial pattern in which functional retrogenes avoid centromeric regions, while retropseudogenes are randomly distributed. These observations suggest that retroposition is an important mechanism that governs gene evolution in rice and other grass species.
Affiliation(s)
- Wen Wang
- CAS-Max-Planck Junior Research Group, Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China.
|
110
|
Umetani N, Kim J, Hiramatsu S, Reber HA, Hines OJ, Bilchik AJ, Hoon DSB. Increased integrity of free circulating DNA in sera of patients with colorectal or periampullary cancer: direct quantitative PCR for ALU repeats. Clin Chem 2006; 52:1062-9. [PMID: 16723681 DOI: 10.1373/clinchem.2006.068577] [Citation(s) in RCA: 242] [Impact Index Per Article: 12.7] [Indexed: 02/07/2023]
Abstract
BACKGROUND Cell-free DNA circulating in blood is a candidate biomarker for malignant tumors. Unlike uniformly truncated DNA released from apoptotic nondiseased cells, DNA released from dead cancer cells varies in size. We developed a novel method to measure the ratio of longer to shorter DNA fragments (DNA integrity) in serum as a potential biomarker for patients with colorectal cancer (CRC) or periampullary cancers (PACs). METHODS Sera from 32 patients with CRC (3 stage I, 14 stage II, 6 stage III, and 9 stage IV patients), 19 patients with PACs (2 stage I, 9 stage II, 1 stage III, and 7 stage IV patients), and 51 healthy volunteers were assessed by quantitative real-time PCR of ALU repeats (ALU-qPCR) with 2 sets of primers (115 and 247 bp) amplifying different lengths of DNA. We used serum directly as a template for ALU-qPCR without DNA purification. DNA integrity was determined as ratio of qPCR results of 247-bp ALU over 115-bp ALU. RESULTS ALU-qPCR had a detection limit of 0.01 pg of DNA. Eliminating DNA purification reduced technical artifacts and reagent/labor costs. Serum DNA integrity was significantly increased for stage I/II and III/IV CRC and stage I/II and III/IV PACs (P = 0.002, P = 0.006, P = 0.022, and P <0.0001, respectively). ROC curves for detecting CRC and PACs had areas under the curves of 0.78 and 0.80, respectively. CONCLUSIONS Direct ALU-qPCR is a robust, highly sensitive, and high-throughput method to measure serum DNA integrity. DNA integrity is a potential serum biomarker for detection and evaluation of CRC and PACs.
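The DNA integrity index defined in this abstract is a plain ratio of two qPCR results, which a short Python sketch makes explicit. The function name, the input units, and the example quantities are hypothetical and are not data from the study.

```python
def dna_integrity(q247, q115):
    """DNA integrity as defined in the abstract: the qPCR result for the
    247-bp ALU amplicon divided by the result for the 115-bp amplicon.

    Both inputs are assumed to be quantities in the same units.
    """
    if q247 < 0:
        raise ValueError("247-bp ALU quantity must not be negative")
    if q115 <= 0:
        raise ValueError("115-bp ALU quantity must be positive")
    return q247 / q115


# Hypothetical serum measurements, for illustration only.
print(dna_integrity(0.5, 2.0))
```

The shorter primer set amplifies both short and long fragments, while the longer one amplifies only longer fragments, so the ratio rises as the share of longer DNA in the serum increases.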
Affiliation(s)
- Naoyuki Umetani
- Department of Molecular Oncology and Division of Surgical Oncology, John Wayne Cancer Institute, Santa Monica, CA 90404, USA
|
111
|
Saba L, Bhave SV, Grahame N, Bice P, Lapadat R, Belknap J, Hoffman PL, Tabakoff B. Candidate genes and their regulatory elements: alcohol preference and tolerance. Mamm Genome 2006; 17:669-88. [PMID: 16783646 DOI: 10.1007/s00335-005-0190-0] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.3] [Received: 12/23/2005] [Accepted: 03/14/2006] [Indexed: 01/10/2023]
Abstract
QTL analysis of behavioral traits and mouse brain gene expression studies were combined to identify candidate genes involved in the traits of alcohol preference and acute functional alcohol tolerance. The systematic application of normalization and statistical analysis of differential gene expression, behavioral and expression QTL location, and informatics methodologies resulted in identification of 8 candidate genes for the trait of alcohol preference and 22 candidate genes for acute functional tolerance. Pathway analysis, combined with clustering by ontology, indicated the importance of transcriptional regulation and DNA and protein binding elements in the acute functional tolerance trait, and protein kinases and intracellular signal transduction elements in the alcohol preference trait. A rudimentary search for transcription control elements that could indicate coregulation of the panels of candidate genes produced modest results, implicating SMAD-3 in the regulation of four of the eight candidate genes for alcohol preference. However, the realization of the many caveats related to transcription factor binding site analysis, and attempts to correlate between transcription factor binding and function, forestalled any definitive global analysis of transcriptional control of differentially expressed candidate genes.
Affiliation(s)
- Laura Saba
- Department of Pharmacology, University of Colorado at Denver and Health Sciences Center, 12801 East 17th Avenue, Aurora, CO 80045, USA
|
112
|
Hutter B, Helms V, Paulsen M. Tandem repeats in the CpG islands of imprinted genes. Genomics 2006; 88:323-32. [PMID: 16690248 DOI: 10.1016/j.ygeno.2006.03.019] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Received: 12/21/2005] [Revised: 03/24/2006] [Accepted: 03/30/2006] [Indexed: 11/26/2022]
Abstract
In contrast to most genes in mammalian genomes, imprinted genes are monoallelically expressed depending on the parental origin of the alleles. Imprinted gene expression is regulated by distinct DNA elements that exhibit allele-specific epigenetic modifications, such as DNA methylation. These so-called differentially methylated regions frequently overlap with CpG islands. Thus, CpG islands of imprinted genes may contain special DNA elements that distinguish them from CpG islands of biallelically expressed genes. Here, we present a detailed study of CpG islands of imprinted genes in mouse and in human. Our study shows that imprinted genes more frequently contain tandem repeat arrays in their CpG islands than randomly selected genes in both species. In addition, mouse imprinted genes more frequently possess intragenic CpG islands that may serve as promoters of allele-specific antisense transcripts. This feature is much less pronounced in human, indicating an interspecies variability in the evolution of imprinting control elements.
Affiliation(s)
- Barbara Hutter
- Bioinformatik, FR 8.3 Biowissenschaften, Universität des Saarlandes, Postfach 151150, D-66041 Saarbrücken, Germany
|
113
|
Xu JH, Osawa I, Tsuchimoto S, Ohtsubo E, Ohtsubo H. Two new SINE elements, p-SINE2 and p-SINE3, from rice. Genes Genet Syst 2006; 80:161-71. [PMID: 16172529 DOI: 10.1266/ggs.80.161] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022] Open
Abstract
p-SINE1 was the first plant SINE element identified in the Waxy gene in Oryza sativa, and since then a large number of p-SINE1-family members have been identified from rice species with the AA or non-AA genome. In this paper, we report two new rice SINE elements, designated p-SINE2 and p-SINE3, which form families distinct from that of p-SINE1. Each of the two new elements is significantly homologous to p-SINE1 in its 5'-end region containing the polymerase III promoter (A box and B box), but not significantly homologous in the 3'-end region, although they all have a T-rich tail at the 3' terminus. Despite the three elements sharing minimal homology in their 3'-end regions, the deduced RNA secondary structures of p-SINE1, p-SINE2 and p-SINE3 were found to be similar to one another, such that a stem-loop structure seen in the 3'-end region of each element is well conserved, suggesting that the structure has an important role in p-SINE retroposition. These findings suggest that the three p-SINE elements originated from a common ancestor. Similar to members of the p-SINE1 family, the members of p-SINE2 or p-SINE3 are almost randomly dispersed in each of the 12 rice chromosomes, but appear to be preferentially inserted into gene-rich regions. The p-SINE2 members were present at respective loci not only in the strains of the species with the AA genome in the O. sativa complex, but also in those of other species with the BB, CC, DD, or EE genome in the O. officinalis complex. The p-SINE3 members were, however, only present in strains of species in the O. sativa complex. These findings suggest that p-SINE2 originated in an ancestral species of those with the AA, BB, CC, DD and EE genomes, like p-SINE1, whereas p-SINE3 originated in an ancestral strain of the species with the AA genome. The nucleotide sequences of p-SINE1 members are more divergent than those of p-SINE2 or p-SINE3, indicating that p-SINE1 is likely to be older than p-SINE2 and p-SINE3. This suggests that p-SINE2 and p-SINE3 have been derived from p-SINE1.
Affiliation(s)
- Jian-Hong Xu
- Institute of Molecular and Cellular Biosciences, the University of Tokyo, Japan
|
114
|
Durand D, Hoberman R. Diagnosing duplications – can it be done? Trends Genet 2006; 22:156-64. [PMID: 16442663 DOI: 10.1016/j.tig.2006.01.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 09/16/2005] [Revised: 11/30/2005] [Accepted: 01/11/2006] [Indexed: 01/10/2023]
Abstract
New genes arise through duplication and modification of DNA sequences on a range of scales: single gene duplication, duplication of large chromosomal fragments and whole-genome duplication. Each duplication mechanism has specific characteristics that influence the fate of the resulting duplicates, such as the size of the duplicated fragment, the potential for dosage imbalance, the preservation or disruption of regulatory control and genomic context. The ability to diagnose or identify the mechanism that produced a pair of paralogs has the potential to increase our ability to reconstruct evolutionary history, to understand the processes that govern genome evolution and to make functional predictions based on paralogy. The recent availability of large amounts of whole-genome sequence, often from several closely related species, has stimulated a wealth of new computational methods to diagnose gene duplications.
Affiliation(s)
- Dannie Durand
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
|
115
|
Slawson EE, Shaffer CD, Malone CD, Leung W, Kellmann E, Shevchek RB, Craig CA, Bloom SM, Bogenpohl J, Dee J, Morimoto ETA, Myoung J, Nett AS, Ozsolak F, Tittiger ME, Zeug A, Pardue ML, Buhler J, Mardis ER, Elgin SCR. Comparison of dot chromosome sequences from D. melanogaster and D. virilis reveals an enrichment of DNA transposon sequences in heterochromatic domains. Genome Biol 2006; 7:R15. [PMID: 16507169 PMCID: PMC1431729 DOI: 10.1186/gb-2006-7-2-r15] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Received: 08/01/2005] [Revised: 09/15/2005] [Accepted: 01/25/2006] [Indexed: 11/10/2022] Open
Abstract
Sequencing and analysis of fosmid hybridization to the dot chromosomes of Drosophila virilis and D. melanogaster suggest that repetitive elements and density are important in determining higher-order chromatin packaging. Background Chromosome four of Drosophila melanogaster, known as the dot chromosome, is largely heterochromatic, as shown by immunofluorescent staining with antibodies to heterochromatin protein 1 (HP1) and histone H3K9me. In contrast, the absence of HP1 and H3K9me from the dot chromosome in D. virilis suggests that this region is euchromatic. D. virilis diverged from D. melanogaster 40 to 60 million years ago. Results Here we describe finished sequencing and analysis of 11 fosmids hybridizing to the dot chromosome of D. virilis (372,650 base-pairs) and seven fosmids from major euchromatic chromosome arms (273,110 base-pairs). Most genes from the dot chromosome of D. melanogaster remain on the dot chromosome in D. virilis, but many inversions have occurred. The dot chromosomes of both species are similar to the major chromosome arms in gene density and coding density, but the dot chromosome genes of both species have larger introns. The D. virilis dot chromosome fosmids have a high repeat density (22.8%), similar to homologous regions of D. melanogaster (26.5%). There are, however, major differences in the representation of repetitive elements. Remnants of DNA transposons make up only 6.3% of the D. virilis dot chromosome fosmids, but 18.4% of the homologous regions from D. melanogaster; DINE-1 and 1360 elements are particularly enriched in D. melanogaster. Euchromatic domains on the major chromosomes in both species have very few DNA transposons (less than 0.4 %). Conclusion Combining these results with recent findings about RNAi, we suggest that specific repetitive elements, as well as density, play a role in determining higher-order chromatin packaging.
Affiliation(s)
| | | | - Colin D Malone
- Biology Department, Washington University, St Louis, MO 63130, USA
| | - Wilson Leung
- Biology Department, Washington University, St Louis, MO 63130, USA
| | - Elmer Kellmann
- Biology Department, Washington University, St Louis, MO 63130, USA
| | | | - Carolyn A Craig
- Biology Department, Washington University, St Louis, MO 63130, USA
| | - Seth M Bloom
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - James Bogenpohl
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - James Dee
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - Emiko TA Morimoto
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - Jenny Myoung
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - Andrew S Nett
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - Fatih Ozsolak
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - Mindy E Tittiger
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - Andrea Zeug
- Member, Bio 4342 class, Washington University, St Louis, MO 63130, USA
| | - Mary-Lou Pardue
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Jeremy Buhler
- Computer Science and Engineering, Washington University, St Louis, MO 63130, USA
| | - Elaine R Mardis
- Genome Sequencing Center and Department of Genetics, Washington University, St Louis, MO 63108, USA
| | - Sarah CR Elgin
- Biology Department, Washington University, St Louis, MO 63130, USA
|
116
|
Cao Y, Tung WW, Gao JB. Recurrence time statistics: versatile tools for genomic DNA sequence analysis. PROCEEDINGS. IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE 2006:40-51. [PMID: 16447998 DOI: 10.1109/csb.2004.1332415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Repeat-related structures are among the more important structures in a DNA sequence. They often have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
|
117
|
Zhi D, Raphael BJ, Price AL, Tang H, Pevzner PA. Identifying repeat domains in large genomes. Genome Biol 2006; 7:R7. [PMID: 16507140 PMCID: PMC1431705 DOI: 10.1186/gb-2006-7-1-r7] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 06/13/2005] [Revised: 09/26/2005] [Accepted: 01/05/2006] [Indexed: 11/29/2022] Open
Abstract
A graph-based method for the analysis of repeat families in a repeat library is presented that helps elucidate the evolutionary history of repeats. We present a graph-based method for the analysis of repeat families in a repeat library. We build a repeat domain graph that decomposes a repeat library into repeat domains, short subsequences shared by multiple repeat families, and reveals the mosaic structure of repeat families. Our method recovers documented mosaic repeat structures and suggests additional putative ones. Our method is useful for elucidating the evolutionary history of repeats and annotating de novo generated repeat libraries.
Affiliation(s)
- Degui Zhi
- Bioinformatics Program, University of California, San Diego, CA 92093-0419, USA
| | - Benjamin J Raphael
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093-0114, USA
| | - Alkes L Price
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Haixu Tang
- School of Informatics and Center for Genomics and Bioinformatics, Indiana University, Bloomington, IN 47408, USA
| | - Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, CA 92093-0114, USA
|
118
|
Jeffries C, Perkins DO, Jarstfer M. Systematic discovery of the grammar of translational inhibition by RNA hairpins. J Theor Biol 2006; 241:205-15. [PMID: 16403535 DOI: 10.1016/j.jtbi.2005.11.027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 08/19/2005] [Revised: 10/16/2005] [Accepted: 11/14/2005] [Indexed: 11/25/2022]
Abstract
Recent discovery of gene expression mechanisms has propelled molecular genetics to a state of rapid development, a state likely to persist due to continuing advances in understanding control systems of fundamental cellular processes. An algorithm for that advancement starts in this paper with a gene of interest and a characteristic function of that gene. The set of all genes with counteracting function is identified by pathway searches. Also associated with the first gene is the set of the genes which byproducts of its transcription might downregulate, identified relative to searches involving sequence alignments. Our focus is the intersection of the counteracting gene set and the downregulated gene set. The result is hypothesis generation. Examples of and predictions from this approach are given in the context of apoptosis. Also discussed is application of the algorithm to rational drug design from a new development platform.
Affiliation(s)
- Clark Jeffries
- Renaissance Computing Institute and School of Pharmacy, Campus Box 7360, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7360, USA.
|
119
|
Abstract
MOTIVATION Obtaining high quality alignments of divergent homologous sequences for cross-species sequence comparison remains a challenge. RESULTS We propose a novel pairwise sequence alignment algorithm, ACANA (ACcurate ANchoring Alignment), for aligning biological sequences at both local and global levels. Like many fast heuristic methods, ACANA uses an anchoring strategy. However, unlike others, ACANA uses a Smith-Waterman-like dynamic programming algorithm to recursively identify near-optimal regions as anchors for a global alignment. Performance evaluations using a simulated benchmark dataset and real promoter sequences suggest that ACANA is accurate and consistent, especially for divergent sequences. Specifically, we use a simulated benchmark dataset to show that ACANA has the highest sensitivity to align constrained functional sites compared to BLASTZ, CHAOS and DIALIGN for local alignment and compared to AVID, ClustalW, DIALIGN and LAGAN for global alignment. Applied to 6007 pairs of human-mouse orthologous promoter sequences, ACANA identified the largest number of conserved regions (defined as over 70% identity over 100 bp) compared to AVID, ClustalW, DIALIGN and LAGAN. In addition, the average length of conserved region identified by ACANA was the longest. Thus, we suggest that ACANA is a useful tool for identifying functional elements in cross-species sequence analysis, such as predicting transcription factor binding sites in non-coding DNA. AVAILABILITY ACANA software and test sequence data are publicly available at http://BioMedEmpire.org/
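The abstract above describes ACANA as relying on a Smith-Waterman-like dynamic programming step to find anchors. As a rough illustration of what that kind of recurrence looks like, here is a minimal sketch of the classic Smith-Waterman local alignment score with a linear gap penalty. It is not the ACANA algorithm: the recursive anchoring is not reproduced, and the function name and the scoring values (`match`, `mismatch`, `gap`) are assumptions chosen only for the example.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Classic Smith-Waterman with a linear gap penalty. The scoring
    values are illustrative assumptions, not ACANA's parameters.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score for aligning a[i-1] with b[j-1]
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Only two rows of the matrix are kept because the sketch reports a score rather than the alignment itself; recovering the aligned region would require storing the full matrix or traceback pointers.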
Affiliation(s)
- Weichun Huang
- Biostatistics Branch, The National Institute of Environmental Health Sciences/NIH, Research Triangle Park, NC 27709, USA
|
120
|
Greenwood AD, Leib-Mösch C, Seifarth W. Abyss1: a novel L2-like non-LTR retroelement of the snakelocks anemone (Anemonia sulcata). Cytogenet Genome Res 2005; 110:553-8. [PMID: 16093708 DOI: 10.1159/000084988] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 12/05/2003] [Accepted: 02/11/2004] [Indexed: 11/19/2022] Open
Abstract
Non-LTR retrotransposons are a diverse and taxonomically widely dispersed group of retroelements that can be divided into at least 14 distinguishable clades. Basal metazoans have not been examined in great detail for their retrotransposon content. In order to screen for the presence of reverse transcriptase (RT) related sequences in Cnidaria and Ctenophora, basal phyla of metazoans, PCR with highly degenerate oligonucleotides was performed and an RT-like sequence was identified from the sea anemone species Anemonia sulcata. Further screening identified a related element in another anemone species Actinia equina. Significant homology to non-LTR retrotransposon RTs was observed, particularly to L2-like elements of fish such as Maui. The sequence was not detected among other cnidarians and we have designated the A. sulcata and A. equina elements Abyss1 and Abyss2 respectively. Phylogenetic analysis of Abyss1 compared with members of 14 known non-LTR retroelement clades suggests that the sequence represents a novel L2 element.
Affiliation(s)
- A D Greenwood
- Technical University Munich, Institute of Virology, Munich, Germany.
|
121
|
Brosius J. Echoes from the past--are we still in an RNP world? Cytogenet Genome Res 2005; 110:8-24. [PMID: 16093654 DOI: 10.1159/000084934] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 02/02/2004] [Accepted: 05/04/2004] [Indexed: 11/19/2022] Open
Abstract
The availability of the human genome sequence and those of other species is of immeasurable value for a comprehensive understanding of the architecture, function and evolution of genomes and cells. Various mechanisms keep genomes in flux and generate intra- and interspecies variation. The conversion of RNA modules into DNA and their more or less random integration into chromosomes (retroposition) is, in many lineages including our own, the most pervasive and perhaps the most enigmatic. The proclivity of such events in extant multicellular eukaryotes, even in more recent evolutionary times, gives the impression that the transition period from the RNP (ribonucleoprotein) world to the emergence of modern cells, where DNA became the predominant carrier of genetic information, has lasted billions of years and is an endlessly drawn-out process rather than the punctuated event one might expect. Apart from the impact of such RNA-mediated processes as retroposition, the role of RNA in a wide variety of cellular functions has only recently become more widely appreciated.
Affiliation(s)
- J Brosius
- Institute of Experimental Pathology, ZMBE, University of Munster, Munster, Germany.
|
122
|
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 2005; 110:462-7. [PMID: 16093699 DOI: 10.1159/000084979] [Citation(s) in RCA: 2376] [Impact Index Per Article: 118.8] [Received: 10/16/2003] [Accepted: 04/06/2004] [Indexed: 12/13/2022] Open
Abstract
Repbase Update is a comprehensive database of repetitive elements from diverse eukaryotic organisms. Currently, it contains over 3600 annotated sequences representing different families and subfamilies of repeats, many of which are unreported anywhere else. Each sequence is accompanied by a short description and references to the original contributors. Repbase Update includes Repbase Reports, an electronic journal publishing newly discovered transposable elements, and the Transposon Pub, a web-based browser of selected chromosomal maps of transposable elements. Sequences from Repbase Update are used to screen and annotate repetitive elements using programs such as Censor and RepeatMasker. Repbase Update is available on the worldwide web at http://www.girinst.org/Repbase_Update.html.
Affiliation(s)
- J Jurka
- Genetic Information Research Institute, Mountain View, CA 94043, USA.
|
123
|
Lenoir A, Pélissier T, Bousquet-Antonelli C, Deragon JM. Comparative evolution history of SINEs in Arabidopsis thaliana and Brassica oleracea: evidence for a high rate of SINE loss. Cytogenet Genome Res 2005; 110:441-7. [PMID: 16093696 DOI: 10.1159/000084976] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 09/17/2003] [Accepted: 10/16/2003] [Indexed: 11/19/2022] Open
Abstract
Brassica oleracea and Arabidopsis thaliana belong to the Brassicaceae (Cruciferae) family and diverged 16 to 19 million years ago. Although the genome size of B. oleracea (approximately 600 million base pairs) is more than four times that of A. thaliana (approximately 130 million base pairs), their gene content is believed to be very similar with more than 85% sequence identity in the coding region. Therefore, this important difference in genome size is likely to reflect a different rate of non-coding DNA accumulation. Transposable elements (TEs) constitute a major fraction of non-coding DNA in plant species. A different rate in TE accumulation between two closely related species can result in significant genome size variations in a short evolutionary period. Short interspersed elements (SINEs) are non-autonomous retroposons that have invaded the genome of most eukaryote species. Several SINE families are present in B. oleracea and A. thaliana and we found that two of them (called RathE1 and RathE2) are present in both species. In this study, the tempo of evolution of RathE1 and RathE2 SINE families in both species was compared. We observed that most B. oleracea RathE2 SINEs are "young" (close to the consensus sequence) and abundant while elements from this family are more degenerated and much less abundant in A. thaliana. However, the situation is different for the RathE1 SINE family for which the youngest elements are found in A. thaliana. Surprisingly, no SINE was found to occupy the same (orthologous) genomic locus in both species suggesting that either these SINE families were not amplified at a significant rate in the common ancestor of the two species or that older elements were lost and only the recent (lineage-specific) insertions remain. To test this latter hypothesis, loci containing a recently inserted SINE in the A. thaliana col-0 ecotype were selected and characterized in several other A. thaliana ecotypes. In addition to the expected SINE containing allele and the pre-integrative allele (i.e. the "empty" allele), we observed in the different ecotypes, alleles with truncated portions of the SINE (up to the complete loss of the element) and of the immediate genomic flanking sequences. The absence of SINEs in orthologous positions between B. oleracea and A. thaliana and the presence in recently diverged A. thaliana ecotypes of alleles containing severely truncated SINEs suggest a very high rate of SINE loss in these species.
Affiliation(s)
- A Lenoir
- CNRS UMR6547 Biomove, Université Blaise Pascal, Aubière, France
|
124
|
Rinehart TA, Grahn RA, Wichman HA. SINE extinction preceded LINE extinction in sigmodontine rodents: implications for retrotranspositional dynamics and mechanisms. Cytogenet Genome Res 2005; 110:416-25. [PMID: 16093694 DOI: 10.1159/000084974] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.6] [Received: 02/03/2004] [Accepted: 03/07/2004] [Indexed: 11/19/2022] Open
Abstract
Short Interspersed Nuclear Elements, or SINEs, retrotranspose despite lacking protein-coding capability. It has been proposed that SINEs utilize enzymes produced in trans by Long Interspersed Nuclear Elements, or LINEs. Strong support for this hypothesis is found in LINE and SINE pairs that share sequence homology; however, LINEs and SINEs in primates and rodents are only linked by an insertion site motif. We have now profiled L1 LINE and B1 SINE activity in 24 rodent species including candidate taxa for the first documented L1 extinction. As expected, there was no evidence for recent activity of B1s in species that also lack L1 activity. However, B1 silencing appears to have preceded L1 extinction, since B1 activity is also lacking in the genus most closely related to those lacking active L1s despite the presence of active L1s in this genus. A second genus with active L1s but inactive B1s was also identified.
Affiliation(s)
- T A Rinehart
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844-3051, USA
|
125
|
Cao Y, Tung WW, Gao JB, Qi Y. Recurrence time statistics: versatile tools for genomic DNA sequence analysis. J Bioinform Comput Biol 2005; 3:677-96. [PMID: 16108089 DOI: 10.1142/s0219720005001235] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 08/25/2004] [Revised: 11/05/2004] [Accepted: 12/10/2004] [Indexed: 11/18/2022]
Abstract
With the completion of the human and a few model organisms' genomes, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Repeat-related structures are among the more important structures in a DNA sequence. They often have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires approximately 6N bytes of memory and a computational time of N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N without any prior knowledge of the consensus sequence of those features, and hence enables us to carry out sequence analysis on the whole genomic scale on a PC.
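To give a concrete sense of what a recurrence time is, the following is a minimal sketch that records, for every word of length k, the distance between successive occurrences in a sequence and tallies those distances in a histogram. It is a simplified illustration of the general idea only, not the authors' algorithm or codon index; the function name and the dictionary-based bookkeeping are assumptions made for the example. A strictly periodic sequence puts all of its counts on the period.

```python
from collections import Counter


def recurrence_time_histogram(seq, k):
    """Histogram of gaps between successive occurrences of each k-mer.

    Illustrative assumption: a recurrence time is taken here to be the
    distance between one occurrence of a word and its next occurrence.
    """
    last_seen = {}    # k-mer -> start position of its most recent occurrence
    hist = Counter()  # recurrence time -> number of times it was observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            hist[i - last_seen[word]] += 1
        last_seen[word] = i
    return hist


if __name__ == "__main__":
    # A sequence with exact period 3: every recurrence time equals 3.
    print(recurrence_time_histogram("ACG" * 10, 3))
```

This sketch makes a single pass over the sequence with hash lookups; it does not attempt to match the memory or running-time figures quoted in the abstract.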
Affiliation(s)
- Yinhe Cao
- Biosieve, 1026 Springfield Drive, Campbell, CA 95008, USA.
|
126
|
Li R, Ye J, Li S, Wang J, Han Y, Ye C, Wang J, Yang H, Yu J, Wong GKS, Wang J. ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun. PLoS Comput Biol 2005; 1:e43. [PMID: 16184192 PMCID: PMC1232128 DOI: 10.1371/journal.pcbi.0010043] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.4] [Received: 03/11/2005] [Accepted: 08/23/2005] [Indexed: 01/30/2023] Open
Abstract
We describe an algorithm, ReAS, to recover ancestral sequences for transposable elements (TEs) from the unassembled reads of a whole genome shotgun. The main assumptions are that these TEs must exist at high copy numbers across the genome and must not be so old that they are no longer recognizable in comparison to their ancestral sequences. Tested on the japonica rice genome, ReAS was able to reconstruct all of the high copy sequences in the Repbase repository of known TEs, and increase the effectiveness of RepeatMasker in identifying TEs from genome sequences. Transposable elements (TEs) are a major component of the genomes of multicellular organisms. They are parasitic creatures that invade the genome, insert multiple copies of themselves, and then die. All we see now are the decayed remnants of their ancestral sequences. Reconstruction of these ancestral sequences can bring dead TEs back to life. Algorithms for detecting TEs compare present-day sequences to a library of ancestral sequences. Unknown to many, pervasive use of whole genome shotgun (WGS) methods in large-scale sequencing has made TE reconstructions increasingly problematic. To minimize assembly errors, WGS methods must reject the highly repetitive sequences that characterize most TEs, especially the most recent TEs, which are the least diverged from their ancestral sequences (and most informative for reconstruction). This is acceptable to many, because the most important parts of the genes are not repetitive, but for the TE aficionados, it is a problem. ReAS is a novel algorithm that does TE reconstruction using only the unassembled reads of a WGS. Tested against the WGS for japonica rice, it is shown to produce a library that is superior to the manually curated Repbase database of known ancestral TEs.
Affiliation(s)
- Ruiqiang Li
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou, China
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
| | - Jia Ye
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou, China
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
| | - Songgang Li
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
- College of Life Sciences, Peking University, Beijing, China
| | - Jing Wang
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
| | - Yujun Han
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
| | - Chen Ye
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
| | - Jian Wang
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou, China
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
| | - Huanming Yang
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou, China
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
| | - Jun Yu
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou, China
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
| | - Gane Ka-Shu Wong
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou, China
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
- UW Genome Center, Department of Medicine, University of Washington, Seattle, Washington, United States of America
- * To whom correspondence should be addressed. E-mail: (GKW), (JW)
| | - Jun Wang
- James D. Watson Institute of Genome Sciences of Zhejiang University, Hangzhou, China
- Beijing Institute of Genomics of Chinese Academy of Sciences, Beijing Genomics Institute, Beijing, China
- The Institute of Human Genetics, University of Aarhus, Aarhus, Denmark
- Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, Denmark
- * To whom correspondence should be addressed. E-mail: (GKW), (JW)
|
127
|
Claverie-Martín F, Flores C, Antón-Gamero M, González-Acosta H, García-Nieto V. The Alu insertion in the CLCN5 gene of a patient with Dent's disease leads to exon 11 skipping. J Hum Genet 2005; 50:370-374. [PMID: 16041495 DOI: 10.1007/s10038-005-0265-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Received: 04/07/2005] [Accepted: 06/01/2005] [Indexed: 11/29/2022]
Abstract
Alu sequences are short, interspersed elements that have generated more than one million copies in the human genome. They propagate by transcription followed by reverse transcription and integration, causing mutations, recombination, and changes in pre-mRNA splicing. We have recently identified a 345-bp long Alu Ya5 element inserted in codon 650 within exon 11 of the chloride channel ClC-5 gene (CLCN5) of a patient with Dent's disease. A microsatellite pedigree analysis indicated that the insertion occurred in the germline of the maternal grandfather. Dent's disease is an X-linked renal tubular disorder characterized by low-molecular-weight proteinuria, hypercalciuria, nephrolithiasis, and nephrocalcinosis. Here, we found, by RT-PCR amplification of RNA extracted from the patient's blood and subsequent DNA sequencing, that the Alu insertion led to an aberrant splicing of the CLCN5 pre-mRNA that skipped exon 11. Using the ESE finder and RESCUE-ESE Web interfaces, we identified two high-score exonic splicing enhancer (ESE) sequences in the site of insertion. The functional significance of these ESE motifs is suggested by our observation that these sequences are highly conserved among mammal CLCN5 genes. Therefore, we suggest that the Alu insertion causes exon skipping by interfering with splicing regulatory elements. The altered splicing would predict a truncated ClC-5 protein that lacks critical domains for sorting and chloride channel function.
Affiliation(s)
- Félix Claverie-Martín
- Unidad de Investigación, Asociada al Centro de Investigaciones Biológicas, CSIC, Hospital Universitario Nuestra Señora de Candelaria, 38010, Santa Cruz de Tenerife, Spain.
| | - Carlos Flores
- Unidad de Investigación, Asociada al Centro de Investigaciones Biológicas, CSIC, Hospital Universitario Nuestra Señora de Candelaria, 38010, Santa Cruz de Tenerife, Spain
| | | | - Hilaria González-Acosta
- Unidad de Investigación, Asociada al Centro de Investigaciones Biológicas, CSIC, Hospital Universitario Nuestra Señora de Candelaria, 38010, Santa Cruz de Tenerife, Spain
| | - Víctor García-Nieto
- Unidad de Nefrología Pediátrica, Hospital Universitario Nuestra Señora de Candelaria, 38010, Santa Cruz de Tenerife, Spain
|
128
|
Bashir A, Ye C, Price AL, Bafna V. Orthologous repeats and mammalian phylogenetic inference. Genome Res 2005; 15:998-1006. [PMID: 15998912 PMCID: PMC1172044 DOI: 10.1101/gr.3493405] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 11/21/2004] [Accepted: 05/03/2005] [Indexed: 11/25/2022]
Abstract
Determining phylogenetic relationships between species is a difficult problem, and many phylogenetic relationships remain unresolved, even among eutherian mammals. Repetitive elements provide excellent markers for phylogenetic analysis, because their mode of evolution is predominantly homoplasy-free and unidirectional. Historically, phylogenetic studies using repetitive elements have relied on biological methods such as PCR analysis, and computational inference is limited to a few isolated repeats. Here, we present a novel computational method for inferring phylogenetic relationships from partial sequence data using orthologous repeats. We apply our method to reconstructing the phylogeny of 28 mammals, using more than 1000 orthologous repeats obtained from sequence data available from the NISC Comparative Sequencing Program. The resulting phylogeny has robust bootstrap numbers, and broadly matches results from previous studies which were obtained using entirely different data and methods. In addition, we shed light on some of the debatable aspects of the phylogeny. With rapid expansion of available partial sequence data, computational analysis of repetitive elements holds great promise for the future of phylogenetic inference.
Affiliation(s)
- Ali Bashir
- Bioinformatics Program, University of California San Diego, La Jolla, California 92093-0114, USA.
|
129
|
Stenberg P, Pettersson F, Saura AO, Berglund A, Larsson J. Sequence signature analysis of chromosome identity in three Drosophila species. BMC Bioinformatics 2005; 6:158. [PMID: 15975141 PMCID: PMC1181806 DOI: 10.1186/1471-2105-6-158] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 12/13/2004] [Accepted: 06/23/2005] [Indexed: 11/30/2022] Open
Abstract
Background All eukaryotic organisms need to distinguish each of their chromosomes. A few protein complexes have been described that recognise entire, specific chromosomes, for instance dosage compensation complexes and the recently discovered autosome-specific Painting of Fourth (POF) protein in Drosophila. However, no sequences have been found that are chromosome-specific and distributed over the entire length of the respective chromosome. Here, we present a new, unbiased, exhaustive computational method that was used to probe three Drosophila genomes for chromosome-specific sequences. Results By combining genome annotations and cytological data with multivariate statistics related to three Drosophila genomes we found sequence signatures that distinguish Muller's F-elements (chromosome 4 in D. melanogaster) from all other chromosomes in Drosophila that are not attributable to differences in nucleotide composition, simple sequence repeats or repeated elements. Based on these signatures we identified complex motifs that are strongly overrepresented in the F-elements and found indications that the D. melanogaster motif may be involved in POF-binding to the F-element. In addition, the X-chromosomes of D. melanogaster and D. yakuba can be distinguished from the other chromosomes, albeit to a lesser extent. Surprisingly, the conservation of the F-element sequence signatures extends not only between species separated by approximately 55 Myr, but also linearly along the sequenced part of the F-elements. Conclusion Our results suggest that chromosome-distinguishing features are not exclusive to the sex chromosomes, but are also present on at least one autosome (the F-element) in Drosophila.
Affiliation(s)
| | - Fredrik Pettersson
- Research Group for Chemometrics, Department of Chemistry, Umeå University, Umeå, Sweden
| | - Anja O Saura
- Department of Genetics, University of Helsinki, Helsinki, Finland
| | - Anders Berglund
- Research Group for Chemometrics, Department of Chemistry, Umeå University, Umeå, Sweden
|
130
|
Galli UM, Sauter M, Lecher B, Maurer S, Herbst H, Roemer K, Mueller-Lantzsch N. Human endogenous retrovirus rec interferes with germ cell development in mice and may cause carcinoma in situ, the predecessor lesion of germ cell tumors. Oncogene 2005; 24:3223-8. [PMID: 15735668 DOI: 10.1038/sj.onc.1208543] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.5] [Indexed: 12/26/2022]
Abstract
Germ cell tumors (GCTs) are among the most common malignancies in young men. We have previously documented that patients with GCT frequently produce serum antibodies directed against proteins encoded by human endogenous retrovirus (HERV) type K sequences. Transcripts originating from the env gene of HERV-K, including the rec-relative of human immunodeficiency virus rev, are highly expressed in GCTs. We report here that mice that inducibly express HERV-K rec show a disturbed germ cell development and may exhibit, by 19 months of age, changes reminiscent of carcinoma in situ, the predecessor lesion of classic seminoma in humans. This provides the first direct evidence that the expression of a human endogenous retroviral gene previously established as a marker in human germ cell tumors may contribute to organ-specific tumorigenesis in a transgenic mouse model.
Affiliation(s)
- Uwe M Galli
- Department of Virology, Bldg. 47, University of Saarland Medical School, 66421 Homburg/Saar, Germany
|
131
|
Abstract
BACKGROUND Human primordial follicles (PFs), or the oocyte-pre-granulosa complex, constitute the earliest and most immature stage of human oogenesis. The factors, signalling networks and the precise role of the oocyte and the pre-granulosa cells in initiating growth and recruitment from this finite resting pool remain largely unknown at present. METHODS To obtain a gene resource of this oogenesis stage and thereby determine a molecular blueprint of the human PF, a cDNA library was constructed from 50 isolated human PFs using the phagemid vector pTriplEx2. RESULTS Sequence analysis showed that 46.67% of these clones corresponded to known genes while 29.48% were uncharacterized genes that included hypothetical proteins, human cDNA clones and novel genes. Bioinformatics analysis revealed a preponderance of mitochondrial genes and repeat elements followed by ribosomal proteins, transcription and translation genes. Transcripts for heat shock proteins, cell cycle, embryogenesis genes and apoptosis genes were identified. Members of the ubiquitin-proteasome pathway, MAPK, p38/JNK, GPCR, Wnt, NF-kappaB and notch signalling pathways were identified. A mitochondrial pathway and a transcription factor pathway in the human PF were generated. The gene networks in the transcription factor pathway provided a first glimpse of the balance between proliferation and cell death/apoptosis in this earliest stage of oogenesis. CONCLUSIONS The abundance and diversity of retroviral elements and transcriptional repressor genes in the human PF suggest these could contribute to the maintenance of this oogenesis stage. Elucidating the role of these genes in initial recruitment and in subsequent oogenesis stages will be greatly facilitated by printing a human PF cDNA array of the sequenced clones and using it for gene profiling.
Affiliation(s)
- Maria D Serafica
- MISCL (Monash Immunology and Stem Cell Laboratories), Monash University, Wellington Road, Clayton, Victoria, 3800 Australia.
|
132
|
Schneeberger K, Malde K, Coward E, Jonassen I. Masking repeats while clustering ESTs. Nucleic Acids Res 2005; 33:2176-80. [PMID: 15831790 PMCID: PMC1079970 DOI: 10.1093/nar/gki511] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 12/20/2004] [Revised: 03/10/2005] [Accepted: 03/28/2005] [Indexed: 11/15/2022]
Abstract
A problem in EST clustering is the presence of repeat sequences. To avoid false matches, repeats have to be masked. This can be a time-consuming process, and it depends on available repeat libraries. We present a fast and effective method that aims to eliminate the problems repeats cause in the process of clustering. Unlike traditional methods, repeats are inferred directly from the EST data; we do not rely on any external library of known repeats. This makes the method especially suitable for analysing the ESTs from organisms without good repeat libraries. We demonstrate that the result is very similar to performing standard repeat masking before clustering.
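The idea of inferring repeats from the data itself, rather than from an external library, can be illustrated with a minimal sketch. This is not the authors' algorithm: it simply assumes that a word (k-mer) seen in unusually many sequences is repeat-like and masks it. The function names, the word length `k` and the threshold `max_seqs` are all hypothetical choices made for illustration.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count, for every k-mer, the number of sequences it occurs in."""
    counts = Counter()
    for s in seqs:
        # Use a set so each k-mer is counted once per sequence;
        # a single long sequence then cannot inflate the count.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def mask_frequent_kmers(seq, counts, k, max_seqs):
    """Replace every k-mer found in more than max_seqs sequences with N."""
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        # Look up the k-mer in the original sequence, not the masked copy.
        if counts[seq[i:i + k]] > max_seqs:
            masked[i:i + k] = "N" * k
    return "".join(masked)


if __name__ == "__main__":
    # Toy data (invented): the word ACGT is shared by all three sequences.
    ests = ["AAAACGTA", "CCCACGTG", "TTTACGTC"]
    kmer_counts = count_kmers(ests, k=4)
    for est in ests:
        print(mask_frequent_kmers(est, kmer_counts, k=4, max_seqs=2))
```

In practice the threshold would have to be chosen relative to the expected cluster sizes, since genuinely overlapping ESTs from one gene also share words.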
Affiliation(s)
- Ketil Malde
- Department of Informatics, University of Bergen, Bergen, Norway
- Eivind Coward
- Department of Informatics, University of Bergen, Bergen, Norway
- Inge Jonassen
- Computational Biology Unit, University of Bergen, Bergen, Norway
- Department of Informatics, University of Bergen, Bergen, Norway
|
133
|
Robins DM. Multiple mechanisms of male-specific gene expression: lessons from the mouse sex-limited protein (Slp) gene. Prog Nucleic Acid Res Mol Biol 2005; 78:1-36. [PMID: 15210327 DOI: 10.1016/s0079-6603(04)78001-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 04/29/2023]
Affiliation(s)
- Diane M Robins
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109-0618, USA
|
134
|
Zhang P, Min W, Li WH. Different age distribution patterns of human, nematode, and Arabidopsis duplicate genes. Gene 2004; 342:263-8. [PMID: 15527985 DOI: 10.1016/j.gene.2004.08.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 05/27/2004] [Revised: 08/03/2004] [Accepted: 08/09/2004] [Indexed: 11/28/2022]
Abstract
We studied the age distribution of duplicate genes in each of four eukaryotic genomes: human, Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster. The four distributions differ greatly from each other, contrary to the previous proposal of a universal L-shaped distribution in all eukaryotic genomes studied. Indeed, only the distribution in humans is L-shaped. The distribution in Arabidopsis is consistent with the hypothesis of an ancient genome duplication with no recent burst of duplication events, while the distribution in C. elegans is nearly uniform. We also applied a nonparametric method to the human distribution to show that the rate of loss of duplicate genes decreases over time, contrary to the proposal of an exponential decay. One possible explanation of the decreasing rate of loss of duplicate genes over time could be rapid functional divergence between duplicate genes, providing an advantage for the retention of both duplicates.
Affiliation(s)
- Peng Zhang
- Department of Ecology and Evolution, University of Chicago, 1101 East 57th Street, Chicago, IL 60637, USA
|
135
|
Price AL, Eskin E, Pevzner PA. Whole-genome analysis of Alu repeat elements reveals complex evolutionary history. Genome Res 2004; 14:2245-52. [PMID: 15520288 PMCID: PMC525682 DOI: 10.1101/gr.2693004] [Citation(s) in RCA: 164] [Impact Index Per Article: 7.8] [Received: 04/18/2004] [Accepted: 08/14/2004] [Indexed: 11/24/2022]
Abstract
Alu repeats are the most abundant family of repeats in the human genome, with over 1 million copies comprising 10% of the genome. They have been implicated in human genetic disease and in the enrichment of gene-rich segmental duplications in the human genome, and they form a rich fossil record of primate and human history. Alu repeat elements are believed to have arisen from the replication of a small number of source elements, whose evolution over time gives rise to the 31 Alu subfamilies currently reported in Repbase Update. We apply a novel method to identify and statistically validate 213 Alu subfamilies. We build an evolutionary tree of these subfamilies and conclude that the history of Alu evolution is more complex than previous studies had indicated.
Affiliation(s)
- Alkes L Price
- Department of Computer Science and Engineering, University of California-San Diego, La Jolla, California 92093-0114, USA.
|
136
|
Schmitt-Wrede HP, Koewius H, Tschuschke S, Greven H, Wunderlich F. Genomic organization of the cadmium-inducible tandem repeat 25-kDa metallothionein of the oligochaete worm Enchytraeus buchholzi. Biochim Biophys Acta 2004; 1680:24-33. [PMID: 15451169 DOI: 10.1016/j.bbaexp.2004.08.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Received: 12/10/2003] [Revised: 07/27/2004] [Accepted: 08/26/2004] [Indexed: 11/25/2022]
Abstract
The terrestrial oligochaete worm Enchytraeus buchholzi survives in cadmium (Cd)-polluted environments with the aid of its Cd-inducible 25 kDa cysteine-rich protein (CRP). Here, we analyze the promoter and structure of the crp gene and compare its relationship to MT genes. The crp gene, approximately 12 kbp long, consists of 10 exons with exons 2 to 9 encoding eight almost identical repeats of predominantly 31 amino acids of the CRP. The introns of the crp gene contain various repetitive elements including retrotransposon-like sequences. The 683-bp promoter of the non-constitutive crp gene exhibits a much higher basal activity than the mouse MT-II promoter in HepG2 cells. Essential for crp promoter activity is the distal region (-683/-521) with a GC box and the proximal region (-308/-8) with the four MREa, b, c, d and AP-1, -2, -3 elements, whereas the central portion (-521/-309) with CAAT box, CRE and an XRE causes promoter repression. The TATA box-, MREc- and the AP-2, -3-containing region are required for high crp promoter activity. Our data support the view that the crp gene is a unique MT-gene and has evolved by exon duplications from an MT-like ancestral gene.
Affiliation(s)
- Hans-Peter Schmitt-Wrede
- Division of Molecular Parasitology and the Centre of Biological and Medical Research, Heinrich-Heine University, Universitätsstr. 1, 40225 Düsseldorf, Germany
|
137
|
Kim TM, Hong SJ, Rhyu MG. Periodic explosive expansion of human retroelements associated with the evolution of the hominoid primate. J Korean Med Sci 2004; 19:177-85. [PMID: 15082888 PMCID: PMC2822296 DOI: 10.3346/jkms.2004.19.2.177] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
Abstract
Five retroelement families, L1 and L2 (long interspersed nuclear element, LINE), Alu and MIR (short interspersed nuclear element, SINE), and LTR (long terminal repeat), comprise almost half of the human genome. This genome-wide analysis of the time-scaled expansion of retroelements sheds light on the chronologically synchronous amplification peaks of each retroelement family, which vary in height across human chromosomes. In particular, L1s and LTRs, which reach their highest densities on sex chromosomes Xq and Y, respectively, disclose peak activities that are obscured in autosomes. The periods of young L1, Alu, LTR, and old L1 peak activities, calibrated by sequence divergence, coincide with the three major hominoid divergences as well as early eutherian radiation, while the amplification peaks of old MIR and L2 account for the marsupial-placental split. Overall, the peaks of autonomous LINEs (young and old L1s and L2s) and non-autonomous SINEs (Alus and MIRs) have alternated repeatedly for 150 million years. In addition, a single burst of LTR parallels the Cretaceous-Tertiary (K-T) boundary, an exceptional global event. These findings suggest that the periodic explosive expansions of LINEs and SINEs and an exceptional burst of LTR comprise the genome dynamics underlying the macroevolution of the hominoid primate lineage.
Affiliation(s)
- Tae-Min Kim
- Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Seung-Jin Hong
- Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Mun-Gan Rhyu
- Department of Microbiology, College of Medicine, The Catholic University of Korea, Seoul, Korea
|
138
|
Huang HD, Horng JT, Sun YM, Tsou AP, Huang SL. Identifying transcriptional regulatory sites in the human genome using an integrated system. Nucleic Acids Res 2004; 32:1948-56. [PMID: 15051813 PMCID: PMC390354 DOI: 10.1093/nar/gkh345] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
This work develops an integrated system which, given a set of genes as input, is able to predict transcriptional regulatory sites and to detect the co-occurrence of these regulatory sites. The system integrates several site detection methods such as known site matching, over-represented oligonucleotide detection and DNA motif discovery programs. User profiles and history pages enable users to trace the sequence analyses of these transcriptional regulatory sites. Two groups of co-regulated genes were used to test the proposed system. The results predicted by the proposed system consist of known site homologs and putative regulatory sites. By comparing these sites with previously published results, the proposed system is able to help biologists identify possible candidates for the regulatory sites from groups of co-regulated genes. The integrated system is now available at http://rgsminer.csie.ncu.edu.tw/.
Affiliation(s)
- Hsien-Da Huang
- Department of Biological Science and Technology and Institute of Bioinformatics, National Chiao-Tung University, Hsin-Chu 300, Taiwan
|
139
|
Zhao W, Wang J, He X, Huang X, Jiao Y, Dai M, Wei S, Fu J, Chen Y, Ren X, Zhang Y, Ni P, Zhang J, Li S, Wang J, Wong GKS, Zhao H, Yu J, Yang H, Wang J. BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics. Nucleic Acids Res 2004; 32:D377-82. [PMID: 14681438 PMCID: PMC308819 DOI: 10.1093/nar/gkh085] [Citation(s) in RCA: 103] [Impact Index Per Article: 4.9] [Indexed: 11/12/2022]
Abstract
Rice is a major food staple for the world's population and serves as a model species in cereal genome research. The Beijing Genomics Institute (BGI) has long been devoting itself to sequencing, information analysis and biological research of the rice and other crop genomes. In order to facilitate the application of the rice genomic information and to provide a foundation for functional and evolutionary studies of other important cereal crops, we implemented our Rice Information System (BGI-RIS), the most up-to-date integrated information resource as well as a workbench for comparative genomic analysis. In addition to comprehensive data from Oryza sativa L. ssp. indica sequenced by BGI, BGI-RIS also hosts carefully curated genome information from Oryza sativa L. ssp. japonica and EST sequences available from other cereal crops. In this resource, sequence contigs of indica (93-11) have been further assembled into Mbp-sized scaffolds and anchored onto the rice chromosomes referenced to physical/genetic markers, cDNAs and BAC-end sequences. We have annotated the rice genomes for gene content, repetitive elements, gene duplications (tandem and segmental) and single nucleotide polymorphisms between rice subspecies. Designed as a basic platform, BGI-RIS presents the sequenced genomes and related information in systematic and graphical ways for the convenience of in-depth comparative studies (http://rise.genomics.org.cn/).
Affiliation(s)
- Wenming Zhao
- Beijing Genomics Institute (BGI), Chinese Academy of Sciences (CAS), Beijing Airport Industrial Zone-B6, Beijing 101300, China
|
140
|
Wickstead B, Ersfeld K, Gull K. Repetitive elements in genomes of parasitic protozoa. Microbiol Mol Biol Rev 2003; 67:360-75, table of contents. [PMID: 12966140 PMCID: PMC193867 DOI: 10.1128/mmbr.67.3.360-375.2003] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022]
Abstract
Repetitive DNA elements have been a part of the genomic fauna of eukaryotes perhaps since their very beginnings. Millions of years of coevolution have given repeats central roles in chromosome maintenance and genetic modulation. Here we review the genomes of parasitic protozoa in the context of the current understanding of repetitive elements. Particular reference is made to repeats in five medically important species with ongoing or completed genome sequencing projects: Plasmodium falciparum, Leishmania major, Trypanosoma brucei, Trypanosoma cruzi, and Giardia lamblia. These organisms are used to illustrate five thematic classes of repeats with different structures and genomic locations. We discuss how these repeat classes may interact with parasitic life-style and also how they can be used as experimental tools. The story which emerges is one of opportunism and upheaval which have been employed to add genetic diversity and genomic flexibility.
Affiliation(s)
- Bill Wickstead
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom
|
141
|
Lee WJ, Kwun HJ, Jang KL. Analysis of transcriptional regulatory sequences in the human endogenous retrovirus W long terminal repeat. J Gen Virol 2003; 84:2229-2235. [PMID: 12867655 DOI: 10.1099/vir.0.19076-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
The U3 region of the human endogenous retrovirus W long terminal repeat (HERV-W LTR) contains several putative regulatory sequences that might not only regulate transcription of viral genes but also influence the expression of neighbouring cellular genes. In this study, we analysed the U3 region in detail in order to understand the transcriptional regulatory mechanism of HERV-W. Two transcription factor (TF) binding sites for Oct-1 and C/EBP were important as a silencer and an enhancer, respectively, for transcriptional regulation. Furthermore, it was possible to divide the HERV-W LTR isolates into two groups depending on their promoter strength, which might be determined by the integrity of the two TF binding sites. However, neither the Oct-1 binding site nor the CAAT-box was required for the cell type-specific activity of the HERV-W LTR. Instead, the 3' terminus of U3 from 191 to 260, which includes a TATA box, was sufficient for specificity, suggesting that the efficiency of assembly of basic transcription machinery at the TATA box of HERV-W LTR might determine the cell type specificity.
Affiliation(s)
- Woo Jung Lee
- Department of Microbiology, College of Natural Sciences, Pusan National University, Pusan 609-735, South Korea
- Hyun Jin Kwun
- Department of Microbiology, College of Natural Sciences, Pusan National University, Pusan 609-735, South Korea
- Kyung Lib Jang
- Department of Microbiology, College of Natural Sciences, Pusan National University, Pusan 609-735, South Korea
|
142
|
Brosius J. Gene duplication and other evolutionary strategies: from the RNA world to the future. Journal of Structural and Functional Genomics 2003; 3:1-17. [PMID: 12836680 DOI: 10.1023/a:1022627311114] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
Beginning with a hypothetical RNA world, it is apparent that many evolutionary transitions led to the complexity of extant species. The duplication of genetic material is rooted in the RNA world. One of two major routes of gene amplification, retroposition, originated from mechanisms that facilitated the transition to DNA as hereditary material. Even in modern genomes the process of retroposition leads to genetic novelties including the duplication of protein and RNA coding genes, as well as regulatory elements and their juxtaposition. We examine whether and to what extent known evolutionary principles can be applied to an RNA-based world. We conclude that the major basic Neo-Darwinian principles that include amplification, variation and selection already governed evolution in the RNA and RNP worlds. In this hypothetical RNA world there were few restrictions on the exchange of genetic material, and principles that acted as borders at later stages, such as Weismann's Barrier, the Central Dogma of Molecular Biology, or the Darwinian Threshold, were absent or rudimentary. RNA was more than a gene: it had a dual role, harboring genotypic and phenotypic capabilities, often in the same molecule. Nuons, any discrete nucleic acid sequences, were selected on an individual basis as well as in groups. The performance and success of an individual nuon was markedly dependent on the type of other nuons in a given cell. In the RNA world the transition may already have begun towards the linkage of nuons to yield a composite linear RNA genome, an arrangement necessitating the origin of RNA processing. A concatenated genome may have curbed unlimited exchange of genetic material; concomitantly, selfish nuons were more difficult to purge. A linked genome may also have constituted the beginning of the phenotype/genotype separation. This division of tasks was expanded when templated protein biosynthesis led to the RNP world, and more so when DNA took over as genetic material. The aforementioned barriers and thresholds increased and the significance and extent of horizontal gene transfer fluctuated over major evolutionary transitions. At the dawn of the most recent transformation, a fast evolutionary transition that we will be witnessing in our life times, a form of Lamarckism is raising its head.
Affiliation(s)
- Jürgen Brosius
- Institute of Experimental Pathology, Center for Molecular Biology of Inflammation, University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany.
|
143
|
Bartolucci S, Rossi M, Cannio R. Characterization and functional complementation of a nonlethal deletion in the chromosome of a beta-glycosidase mutant of Sulfolobus solfataricus. J Bacteriol 2003; 185:3948-57. [PMID: 12813089 PMCID: PMC161586 DOI: 10.1128/jb.185.13.3948-3957.2003] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]
Abstract
LacS(-) mutants of Sulfolobus solfataricus defective in beta-glycosidase activity were isolated in order to explore genomic instability and exploit novel strategies for transformation and complementation. One of the mutants showed a stable phenotype with no reversion; analysis of its chromosome revealed the total absence of the beta-glycosidase gene (lacS). Fine mapping performed in comparison to the genomic sequence of S. solfataricus P2 indicated an extended deletion of approximately 13 kb. The sequence analysis also revealed that this chromosomal rearrangement was a nonconservative transposition event driven by the mobile insertion sequence element ISC1058. In order to complement the LacS(-) phenotype, an expression vector was constructed by inserting the lacS coding sequence with its 5' and 3' flanking regions into the pEXSs plasmid. Since no transformant could be recovered by selection on lactose as the sole nutrient, another plasmid construct containing a larger genomic fragment was tested for complementation; this region also comprised the lacTr (lactose transporter) gene encoding a putative membrane protein homologous to the major facilitator superfamily. Cells transformed with both genes were able to form colonies on lactose plates and to be stained with the beta-glycosidase chromogenic substrate X-Gal (5-bromo-4-chloro-3-indolyl-beta-D-galactopyranoside).
Affiliation(s)
- Simonetta Bartolucci
- Dipartimento di Chimica Biologica, Università degli Studi di Napoli Federico II, Naples, Italy
|
144
|
Horng JT, Huang HD, Jin MH, Wu LC, Huang SL. The repetitive sequence database and mining putative regulatory elements in gene promoter regions. J Comput Biol 2003; 9:621-40. [PMID: 12323097 DOI: 10.1089/106652702760277354] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
At least 43% of the human genome is occupied by repetitive elements, as is around 51% of the rice genome. The analysis of repetitive elements suggests that they may have been very important in genome evolution. The first part of this study describes a database of repetitive elements, RSDB. The RSDB database contains repetitive elements, which are classified into the following categories: exact, tandem, and similar. The interfaces needed to query and show the results and statistical data, such as the relationship between repetitive elements and genes, cross-references of repetitive elements among different organisms, and so on, are provided. The second part of this study attempts to mine putative binding sites by examining how combinations of the known regulatory sites and overrepresented repetitive elements in RSDB are distributed in the promoter regions of groups of functionally related genes. The overrepresented repetitive elements appearing in the associations are possible transcription factor binding sites. Our proposed approach is applied to Saccharomyces cerevisiae and the promoter regions of yeast ORFs. The complete contents of RSDB and partial putative binding sites are available to the public at www.rsdb.csie.ncu.edu.tw. Readers may download partial query results.
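As a rough illustration of what "overrepresented in promoter regions" can mean, the sketch below compares how often an element occurs in the promoters of a gene group with how often it occurs in a background set. This is an assumed, simplified measure, not the mining procedure used for RSDB; the function names, the toy sequences and the example element are hypothetical.

```python
def fraction_with_element(promoters, element):
    """Fraction of promoter sequences containing the element at least once."""
    if not promoters:
        return 0.0
    hits = sum(1 for p in promoters if element in p)
    return hits / len(promoters)


def enrichment(group, background, element):
    """Ratio of the element's frequency in a gene group to its background frequency.

    Returns None when the element never occurs in the background,
    because the ratio is undefined in that case.
    """
    bg = fraction_with_element(background, element)
    if bg == 0:
        return None
    return fraction_with_element(group, element) / bg


if __name__ == "__main__":
    # Invented toy promoters; the element string is an arbitrary example.
    group = ["AATGACTCAT", "GGTGACTCCC"]
    background = group + ["AAAAAAAAAA", "CCCCCCCCCC"]
    print(enrichment(group, background, "TGACTC"))
```

A ratio well above 1 would flag the element as a candidate for closer inspection; a real analysis would also need a significance test, which this sketch omits.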
Affiliation(s)
- Jorng-Tzong Horng
- Department of Computer Science and Information Engineering, National Central University, Taiwan.
|
145
|
Zearfoss NR, Chan AP, Kloc M, Allen LH, Etkin LD. Identification of new Xlsirt family members in the Xenopus laevis oocyte. Mech Dev 2003; 120:503-9. [PMID: 12676327 DOI: 10.1016/s0925-4773(02)00459-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Abstract
Xenopus laevis short interspersed repeat transcripts (Xlsirts) are a family of noncoding RNAs defined by the presence of a specific repeated sequence that acts as a vegetal localization element. Previous studies have demonstrated that Xlsirts function as localization elements to localize RNA and also in anchoring mRNA at the vegetal cortex. However, the identity of the Xlsirts containing family members present at the cortex was unknown. We identified 17 new Xlsirt cDNAs from an oocyte cDNA library. In addition to being associated with noncoding sequences, the repeats were also present in cDNAs with open reading frames. Xlsirt RNAs with repeats in the correct orientation were capable of localizing to the vegetal cortex. Our observations demonstrate that a heterogeneous population of Xlsirt RNAs is present at the cortex and that this population contains both noncoding RNAs and RNAs encoding proteins that are likely to play important roles in the subsequent development of the embryo.
Affiliation(s)
- N Ruth Zearfoss
- Department of Molecular Genetics, The University of Texas M.D. Anderson Cancer Center, 1515 Holcombe Blvd., Houston, TX 77030, USA
|
146
|
Sironi M, Pozzoli U, Cagliani R, Giorda R, Comi GP, Bardoni A, Menozzi G, Bresolin N. Relevance of sequence and structure elements for deletion events in the dystrophin gene major hot-spot. Hum Genet 2003; 112:272-88. [PMID: 12596052 DOI: 10.1007/s00439-002-0881-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.0] [Received: 08/09/2002] [Accepted: 11/04/2002] [Indexed: 11/24/2022]
Abstract
Large intragenic deletions within the DMD locus account for about 60% of Duchenne and Becker muscular dystrophy patients. Two deletion hot-spots have been described in the dystrophin gene, but the mechanisms that determine chromosome breaks in these regions are unknown, and the huge dimensions of the gene have hampered the description of a consistent number of breakpoint sequences. A long-distance polymerase chain reaction strategy was used to amplify 20 deletion junctions involving the major hot-spot and to describe breakpoint position at the sequence level. These junctions were analyzed together with previously reported breakpoint locations so as to increase the sample number and possibly provide a comprehensive study. Minisatellite core sequences, chi elements, translin-binding sites, Pur elements, and matrix attachment regions were sought over the whole gene. Sequence-dependent DNA curvature and duplex stability were also calculated throughout the gene, and their cumulative frequency distribution was evaluated. No association with either sequence or structure elements involved in known illegitimate recombination mechanisms was identified. This study highlights the importance of a whole gene approach to rule out the presumptive role of specific features that, when locally analyzed, might suggest involvement in gene rearrangements.
Affiliation(s)
- Manuela Sironi
- IRCCS E. Medea, Associazione La Nostra Famiglia, Via Don Luigi Monza 20, 23842, Bosisio Parini (LC), Italy.
|
147
|
Ziolkowski PA, Blanc G, Sadowski J. Structural divergence of chromosomal segments that arose from successive duplication events in the Arabidopsis genome. Nucleic Acids Res 2003; 31:1339-50. [PMID: 12582254 PMCID: PMC150220 DOI: 10.1093/nar/gkg201] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Received: 08/01/2002] [Revised: 11/04/2002] [Accepted: 12/12/2002] [Indexed: 11/14/2022]
Abstract
Using the extensive segmental duplications of the Arabidopsis thaliana genome, a comparative study of homoeologous segments occurring in chromosomes 1, 2, 4 and 5 was performed. The gene-by-gene BLASTP approach was applied to identify duplicated genes in homoeologues. The levels of synonymous substitutions between duplicated coding sequences suggest that these regions were formed by at least two rounds of duplications. Moreover, remnants of even more ancient duplication events were recognised by a whole-genome study. We describe a subchromosomal organisation of genes, including the tandemly repeated genes, and the distribution of transposable elements (TEs). In certain cases, evidence of the possible mechanisms of structural rearrangements within the segments could be found. We provide a probable scenario of the rearrangements that took place during the evolution of the homoeologous regions. Furthermore, on the basis of the comparative analysis of the chromosomal segments in the Columbia and Landsberg erecta accessions, an additional structural variation in the A. thaliana genome is described. Analysis of the segments, spanning 7 Mb or 5.6% of the genome, permitted us to propose a model of evolution at the subchromosomal level.
Affiliation(s)
- Piotr A Ziolkowski
- Institute of Plant Genetics, Polish Academy of Sciences, Strzeszynska 34, 60-479 Poznan, Poland
|
148
|
Zhang P, Gu Z, Li WH. Different evolutionary patterns between young duplicate genes in the human genome. Genome Biol 2003; 4:R56. [PMID: 12952535 PMCID: PMC193656 DOI: 10.1186/gb-2003-4-9-r56] [Citation(s) in RCA: 109] [Impact Index Per Article: 5.0] [Received: 05/15/2003] [Revised: 06/24/2003] [Accepted: 07/24/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND Following gene duplication, two duplicate genes may experience relaxed functional constraints or acquire different mutations, and may also diverge in function. Whether the two copies will evolve in different patterns remains unclear, however, because previous studies have reached conflicting conclusions. In order to resolve this issue, by providing a general picture, we studied 250 independent pairs of young duplicate genes from the whole human genome. RESULTS We showed that nearly 60% of the young duplicate gene pairs have evolved at the amino-acid level at significantly different rates from each other. More than 25% of these gene pairs also showed significantly different ratios of nonsynonymous to synonymous rates (Ka/Ks ratios). Moreover, duplicate pairs with different rates of amino-acid substitution also tend to differ in the Ka/Ks ratio, with the fast-evolving copy tending to have a slightly higher Ks than the slow-evolving one. Lastly, a substantial portion of fast-evolving copies have accumulated amino-acid substitutions evenly across the protein sequences, whereas most of the slow-evolving copies exhibit uneven substitution patterns. CONCLUSIONS Our results suggest that duplicate genes tend to evolve in different patterns following the duplication event. One copy evolves faster than the other and accumulates amino-acid substitutions evenly across the sequence, whereas the other copy evolves more slowly and accumulates amino-acid substitutions unevenly across the sequence. Such different evolutionary patterns may be largely due to different functional constraints on the two copies.
Affiliation(s)
- Peng Zhang
- Department of Ecology and Evolution, University of Chicago, East 57th Street, Chicago, IL 60637, USA
|
149
|
Brosius J. The contribution of RNAs and retroposition to evolutionary novelties. Contemporary Issues in Genetics and Evolution 2003. [DOI: 10.1007/978-94-010-0229-5_1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
|
150
|
Rouchka EC, Gish W, States DJ. Comparison of whole genome assemblies of the human genome. Nucleic Acids Res 2002; 30:5004-14. [PMID: 12434005 PMCID: PMC137179 DOI: 10.1093/nar/gkf633] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 12/26/2022] Open
Abstract
A fundamental problem in the human genome project is uncovering the correct assembly of the human genome. Many studies, including transcriptional analysis, SNP detection and characterization, gene finding and EST clustering, use genome assemblies as templates so it is important to determine the consistency among the various whole genome assemblies. A comparison of the order and orientation of the GenBank entries used to construct the NCBI and UCSC Goldenpath assemblies was made. In addition, a sequence level comparison was performed using MULTI, an efficient database search tool developed to make whole genome comparisons possible. The resulting comparisons show significant discrepancies in the sequence as well as in the order and orientation of GenBank entries used in constructing the NCBI and UCSC assemblies.
Affiliation(s)
- Eric C Rouchka
- Department of Computer Science, Washington University, Washington University School of Medicine, St Louis, MO 63110, USA
|