1
Paar V, Pavin N, Basar I, Rosandić M, Gluncić M, Paar N. Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats. BMC Bioinformatics 2008; 9:466. [PMID: 18980673 PMCID: PMC2661002 DOI: 10.1186/1471-2105-9-466] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/12/2008] [Accepted: 11/03/2008] [Indexed: 11/28/2022] Open
Abstract
Background: Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats.
Results: We used a method based on DFT, with mapping of the symbolic sequence into a numerical sequence, to identify and study alphoid higher order repeats (HORs). For HORs the power spectrum shows an equidistant frequency pattern, with a characteristic two-level hierarchical organization as the signature of a HOR. Our case study was the 16mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an nmer) and higher harmonics. In general, an nmer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern, used as a signature for HOR detection, is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/f^β noise and the periodicity-three pattern are missing from power spectra in alphoid regions, in accordance with expectations.
Conclusion: DFT provides a robust detection method for higher order periodicity. The easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to constituent monomers (primary periodicity). The number of lower frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the nmer HOR, i.e., the number n of monomers contained in the consensus HOR.
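To illustrate the two-level pattern this abstract describes, the following minimal Python sketch computes a power spectrum for a toy 3mer HOR. It is not the authors' implementation: the binary indicator mapping of each base, the naive DFT, the function name `power_spectrum`, and the toy monomers are all assumptions chosen for brevity.

```python
import cmath

def power_spectrum(seq, alphabet="ACGT"):
    """Sum of naive-DFT power spectra of the binary indicator sequence of each base.

    The indicator mapping is an assumed choice of symbolic-to-numerical mapping.
    """
    n = len(seq)
    power = [0.0] * n
    for base in alphabet:
        positions = [i for i, ch in enumerate(seq) if ch == base]
        for k in range(n):
            coeff = sum(cmath.exp(-2j * cmath.pi * k * i / n) for i in positions)
            power[k] += abs(coeff) ** 2
    return power

# Toy 3mer HOR (hypothetical data): three diverged 6-bp monomers, 4 tandem copies.
monomers = ["AAACCC", "AAACCG", "AAGCCC"]
copies = 4
seq = "".join(monomers) * copies
spec = power_spectrum(seq)

# Frequency indices (up to the Nyquist index) that carry non-negligible power.
peaks = [k for k in range(1, len(seq) // 2 + 1) if spec[k] > 1e-6]
```

For this perfectly periodic toy array, nonzero power can appear only at multiples of the fundamental HOR frequency index (`copies`), and the multiple that corresponds to the monomer length (`copies * len(monomers)`) is stronger than the fundamental. Real alphoid arrays are approximate repeats, so their peaks would be broadened rather than exact.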
Affiliation(s)
- Vladimir Paar
- Faculty of Science, University of Zagreb, Bijenicka 32, Zagreb, Croatia.
2
Zhang S, Xiao Y. Quasiperiodic property in Alu repeats. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 74:022901. [PMID: 17025492 DOI: 10.1103/physreve.74.022901] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/21/2005] [Revised: 04/06/2006] [Indexed: 05/12/2023]
Abstract
We investigate the possible quasiperiodic property in the sequences of Alu repeats, a typical class of noncoding DNA sequences. We calculated the quasiperiods of the right and left monomers of Alu repeats of different families with the quasiperiodic matrix algorithm. Interestingly, the right monomers of all families show a significant quasiperiod of 8 in their sequences, while the left monomers show quasiperiods of 8 or 5. Our results indicate that common quasiperiods exist in most Alu repeats. This may help to further explore possible functions of Alu repeats.
Affiliation(s)
- Shihua Zhang
- Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
3
Li W, Holste D. An unusual 500,000 bases long oscillation of guanine and cytosine content in human chromosome 21. Comput Biol Chem 2004; 28:393-9. [PMID: 15556480 DOI: 10.1016/j.compbiolchem.2004.09.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 09/17/2004] [Revised: 09/30/2004] [Accepted: 09/30/2004] [Indexed: 01/09/2023]
Abstract
An oscillation with a period of around 500 kb in guanine and cytosine content (GC%) is observed in the DNA sequence of human chromosome 21. This oscillation is localized in the rightmost one-eighth region of the chromosome, from 43.5 Mb to 46.5 Mb. Five cycles of oscillation are observed in this region, with six GC-rich peaks and five GC-poor valleys. The GC-poor valleys comprise regions with low density of CpG islands and, alternating between the two DNA strands, low gene density regions. Consequently, the long-range oscillation of GC% results in spacing patterns of CpG island density and, to a lesser extent, gene density.
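A GC% profile of the kind discussed in this abstract can be sketched as a windowed base count. The snippet below is only an illustration under stated assumptions: the function name `gc_percent_windows`, the non-overlapping windows, and the toy sequence are hypothetical, and the abstract does not say which window size the authors used.

```python
def gc_percent_windows(seq, window):
    """GC percentage in consecutive non-overlapping windows of the given size.

    A trailing partial window is dropped.
    """
    profile = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = sum(1 for ch in chunk if ch in "GC")
        profile.append(100.0 * gc / window)
    return profile

# Toy sequence (hypothetical) alternating GC-rich and GC-poor blocks.
toy = "GGCCAATT" * 2
profile = gc_percent_windows(toy, 4)
```

On a chromosome-scale sequence the window would need to be much smaller than the 500 kb period reported above for the oscillation to be resolved.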
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA.
4
Wickstead B, Ersfeld K, Gull K. Repetitive elements in genomes of parasitic protozoa. Microbiol Mol Biol Rev 2003; 67:360-75, table of contents. [PMID: 12966140 PMCID: PMC193867 DOI: 10.1128/mmbr.67.3.360-375.2003] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.8] [Indexed: 11/20/2022] Open
Abstract
Repetitive DNA elements have been a part of the genomic fauna of eukaryotes perhaps since their very beginnings. Millions of years of coevolution have given repeats central roles in chromosome maintenance and genetic modulation. Here we review the genomes of parasitic protozoa in the context of the current understanding of repetitive elements. Particular reference is made to repeats in five medically important species with ongoing or completed genome sequencing projects: Plasmodium falciparum, Leishmania major, Trypanosoma brucei, Trypanosoma cruzi, and Giardia lamblia. These organisms are used to illustrate five thematic classes of repeats with different structures and genomic locations. We discuss how these repeat classes may interact with parasitic life-style and also how they can be used as experimental tools. The story which emerges is one of opportunism and upheaval which have been employed to add genetic diversity and genomic flexibility.
Affiliation(s)
- Bill Wickstead
- Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom
5
Rosandić M, Paar V, Basar I. Key-string segmentation algorithm and higher-order repeat 16mer (54 copies) in human alpha satellite DNA in chromosome 7. J Theor Biol 2003; 221:29-37. [PMID: 12634041 DOI: 10.1006/jtbi.2003.3165] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
A new key-string segmentation algorithm for identification of alpha satellite DNAs and higher-order repeat (HOR) units was introduced and exemplified. Starting with an initial key string, we determine the dominant key string and HOR. Our key-string algorithm was used to scan the recent GenBank data for human alpha satellite DNA sequence AC017075.8 (193 277 bp) from the centromeric region of chromosome 7. The sequence was computationally segmented into one HOR domain (super-repeat domain) and two non-HOR domains. Dominant key-string GTTTCT provided segmentation in terms of alpha monomers. The HOR is tandemly repeated in 54 copies in the super-repeat (HOR) domain. Five insertions and three deletions in the HOR structure associated with a dominant key string were identified. A consensus HOR was constructed. Divergence of individual HOR copies from the consensus amounts to 0.7% on average, while divergence between the 16 monomer variants within each HOR is 20% on average. In the front and back domains, 199 monomer variants were identified that are not organized in HOR and diverge by 20-40%.
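The core idea of cutting a sequence at each occurrence of a key string can be sketched in a few lines of Python. This is a simplified illustration, not the published algorithm: the function name `key_string_segments` and the toy sequence are hypothetical, and the search for a dominant key string is not attempted. Only the key string GTTTCT is taken from the abstract.

```python
def key_string_segments(seq, key):
    """Cut seq immediately before every occurrence of key and return the pieces.

    Text before the first occurrence (if any) is returned as the first piece.
    """
    starts = []
    pos = seq.find(key)
    while pos != -1:
        starts.append(pos)
        pos = seq.find(key, pos + 1)
    # Segment boundaries: sequence start, every key-string start, sequence end.
    bounds = sorted({0, *starts, len(seq)})
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

# Toy sequence (hypothetical): a short leader followed by three key-string segments.
toy = "TT" + "GTTTCTAAAA" + "GTTTCTCCCCC" + "GTTTCTGG"
segments = key_string_segments(toy, "GTTTCT")
lengths = [len(s) for s in segments]
```

In a sketch like this, a run of segments with near-constant length would suggest monomer-level periodicity, while segments of roughly double length or half length would hint at deletions or insertions of the key string.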
Affiliation(s)
- M Rosandić
- Department of Internal Medicine, University Hospital Rebro, University of Zagreb, Kispatićeva 12, Zagreb, Croatia
6
Affiliation(s)
- E Pizzi
- Laboratory of Cell Biology, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161 Rome, Italy
8
Abstract
Full-sequence data available for Plasmodium falciparum chromosomes 2 and 3 are exploited to perform a statistical analysis of the long tracts of biased amino acid composition that characterize the vast majority of P. falciparum proteins and to make a comparison with similarly defined tracts from other simple eukaryotes. When the relatively minor subset of prevalently hydrophobic segments is discarded from the set of low-complexity segments identified by current segmentation methods in P. falciparum proteins, a good correspondence is found between prevalently hydrophilic low-complexity segments and the species-specific, rapidly diverging insertions detected by multiple-alignment procedures when sequences of bona fide homologs are available. Amino acid preferences are fairly uniform in the set of hydrophilic low-complexity segments identified in the two P. falciparum chromosomes sequenced, as well as in sequenced genes from Plasmodium berghei, but differ from those observed in Saccharomyces cerevisiae and Dictyostelium discoideum. In the two plasmodial species, amino acid frequencies do not correlate with properties such as hydrophilicity, small volume, or flexibility, which might be expected to characterize residues involved in nonglobular domains but do correlate with A-richness in codons. An effect of phenotypic selection versus neutral drift, however, is suggested by the predominance of asparagine over lysine.
Affiliation(s)
- E Pizzi
- Laboratorio di Biologia Cellulare, Istituto Superiore di Sanità, 00161 Rome, Italy
9
Abstract
Biological macromolecules such as DNA, RNA, and proteins can be regarded as finite sequences of symbols (or words) over a finite alphabet. In this paper, we refer to DNA (RNA) sequences, which are words on a four-letter alphabet. A comparison is made between some "genes", or fragments of them, and random sequences or randomly reshuffled sequences on the same alphabet and having the same length. Some combinatorial techniques for the analysis of finite words are developed. A crucial role in the comparison is played by the so-called special factors of a given word. In all the analysed DNA (RNA) fragments, the distribution by length of the number of right (left) special factors differs, in a very typical way, from the corresponding distribution in a string on the same alphabet and having the same length, generated by a random source or obtained by a random alteration (shuffling) of the original string. This kind of change is irrespective of the length in the range that we have considered (<2650 bp) and of the phylogenetic origin of the fragment.
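The counting of special factors mentioned in this abstract can be sketched as follows. The sketch assumes the standard combinatorics-on-words definition, which may differ in detail from the paper's: a factor u of a word w is right special if u is followed in w by at least two distinct letters. The function name `right_special_counts` and the toy word are hypothetical.

```python
from collections import defaultdict

def right_special_counts(word, max_len):
    """Number of right special factors of each length from 0 to max_len.

    A factor is counted when at least two distinct letters follow it in word.
    """
    counts = {}
    for length in range(max_len + 1):
        followers = defaultdict(set)
        # Record the letter that follows each occurrence of each factor.
        for i in range(len(word) - length):
            followers[word[i:i + length]].add(word[i + length])
        counts[length] = sum(1 for letters in followers.values() if len(letters) >= 2)
    return counts

# Toy word (hypothetical): "C" is followed by both "A" and "G", "AC" by "A" and "G".
counts = right_special_counts("ACACG", 3)
```

Left special factors can be counted with the same function applied to the reversed word. Comparing the resulting length distribution with that of a shuffled copy of the same word is the kind of comparison the abstract describes.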
Affiliation(s)
- A Colosimo
- Dipartimento di Scienze Biochimiche, Università di Roma "La Sapienza", Piazzale A. Moro 2, Roma, 00185, Italy
10
Rich SM, Hudson RR, Ayala FJ. Plasmodium falciparum antigenic diversity: evidence of clonal population structure. Proc Natl Acad Sci U S A 1997; 94:13040-5. [PMID: 9371796 PMCID: PMC24259 DOI: 10.1073/pnas.94.24.13040] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.1] [Indexed: 02/05/2023] Open
Abstract
Plasmodium falciparum, the agent of malignant malaria, is one of mankind's most severe scourges. Efforts to develop preventive vaccines or remedial drugs are handicapped by the parasite's rapid evolution of drug resistance and protective antigens. We examine 25 DNA sequences of the gene coding for the highly polymorphic antigenic circumsporozoite protein. We observe total absence of silent nucleotide variation in the two nonrepeated regions of the gene. We propose that this absence reflects a recent origin (within several thousand years) of the world populations of P. falciparum from a single individual; the amino acid polymorphisms observed in these nonrepeat regions would result from strong natural selection. Analysis of these polymorphisms indicates that: (i) the incidence of recombination events does not increase with nucleotide distance; (ii) the strength of linkage disequilibrium between nucleotides is also independent of distance; and (iii) haplotypes in the two nonrepeat regions are correlated with one another, but not with the central repeat region they span. We propose two hypotheses: (i) variation in the highly polymorphic central repeat region arises by mitotic intragenic recombination, and (ii) the population structure of P. falciparum is clonal--a state of affairs that persists in spite of the necessary stage of physiological sexuality that the parasite must sustain in the mosquito vector to complete its life cycle.
Affiliation(s)
- S M Rich
- Department of Ecology and Evolutionary Biology, University of California, Irvine 92697-2525, USA
11
Felger I, Marshal VM, Reeder JC, Hunt JA, Mgone CS, Beck HP. Sequence diversity and molecular evolution of the merozoite surface antigen 2 of Plasmodium falciparum. J Mol Evol 1997; 45:154-60. [PMID: 9236275 DOI: 10.1007/pl00006215] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.6] [Indexed: 02/04/2023]
Abstract
Eleven new alleles of the Plasmodium falciparum merozoite surface antigen 2 (MSA2) from Papua New Guinea were analyzed by direct sequencing of polymerase chain reaction (PCR) products. We have used the sequence information to trace the molecular evolution of MSA2. The repeats of ten alleles belonging to the 3D7 allelic family differed considerably in size, nucleotide sequence, and repeat copy number. In the repeat region of these new alleles, codon usage was extremely biased with an exclusive use of NNT codons. Another new allele sequenced belonged to the FC27 family and confirmed the family-specific conserved structure of 96 and 36 bp repeats. In order to assess sequence microheterogeneity within samples defined as the same genotype by restriction fragment length polymorphism (RFLP), we have analyzed single-strand conformation polymorphism (SSCP) of different samples of the most frequent allele (D10 of the FC27 family) in the study population. No sequence heterogeneity could be detected within the repeat region. Based on analysis of the repeat regions in both allelic families, we discuss the hypothesis of a different evolutionary strategy being represented by each of the allelic families. Key words: Merozoite surface antigen 2 - Nucleotide sequence comparisons - Molecular evolution
Affiliation(s)
- I Felger
- Institut für Zellbiologie, Universität Witten-Herdecke, Stockumer Str. 10, 58448 Witten, Germany
12
Abstract
Extensive genome plasticity in Plasmodium involves frequent loss of dispensable functions under non-selective conditions, polymorphisms in subtelomeric repetitive regions, as well as rapid and apparently concerted variation in the intra-genic repetitive arrays that are typical of plasmodial antigen genes. As an example of the latter type of variation, the region of the merozoite surface antigen gene MSA-1 of Plasmodium falciparum, which encodes a tri-peptide repeat, is analysed in detail. The example illustrates how evasion of the immune defenses of the vertebrate host can be achieved through repeat homogenization mechanisms, acting at the DNA level, and leading to rapid fixation of variant epitopes. The remarkable ability of Plasmodia to utilize mechanisms which operate on its own nuclear DNA in the course of mitotic multiplication is discussed against the need of life cycle closure as a haploid unicellular. The possibility is suggested that active genomic diversification in a (clonal) multicellular population evolved as an adaptive tool.
Affiliation(s)
- C Frontali
- Laboratorio di Biologia Cellulare, Istituto Superiore di Sanità, Rome, Italy
13
Scotti R, Pace T, Ponzi M. A 40-kilobase subtelomeric region is common to most Plasmodium falciparum 3D7 chromosomes. Mol Biochem Parasitol 1993; 58:1-6. [PMID: 8459822 DOI: 10.1016/0166-6851(93)90084-b] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 01/30/2023]
Abstract
Starting from previous evidence indicating that some features are shared by several Plasmodium falciparum chromosomal extremities, a subtelomeric region present on most P. falciparum 3D7 chromosomes has been mapped. It was shown to occupy about 40 kb, and to include the proximal portion of pPftel.1, the only telomeric clone described for P. falciparum [12], the complete 21-bp repetitive cluster and some conserved sites (PstI, EcoRI) proximally located with respect to this cluster.
Affiliation(s)
- R Scotti
- Laboratorio di Biologia Cellulare, Istituto Superiore di Sanità, Rome, Italy
14
Ponzi M, Pace T, Dore E, Picci L, Pizzi E, Frontali C. Extensive turnover of telomeric DNA at a Plasmodium berghei chromosomal extremity marked by a rare recombinational event. Nucleic Acids Res 1992; 20:4491-7. [PMID: 1408751 PMCID: PMC334176 DOI: 10.1093/nar/20.17.4491] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 12/26/2022] Open
Abstract
The dynamics of telomere turnover were studied in Plasmodium, whose telomeric structures consist of linear, recognisable sequences of two distinct repeats (TTTAGGG and TTCAGGG). Independent recombinant clones containing a well-defined chromosomal extremity of Plasmodium berghei, both before and after a rare insertion event took place, were obtained from clonal parasite populations and analysed. The insertion, which splits the original telomere and causes a significant reduction in the size of the telomeric structure, is shown to consist of an integer number of subtelomeric repeats typical of P. berghei, flanked on both sides by telomere-derived motifs. Analysis of the telomeric repeat sequence heterogeneity in the otherwise homogeneous populations examined is compatible with a model in which diversification of a given telomere is driven by the occurrence of breakpoints whose frequency rapidly increases along the telomeric tract when moving in the outward direction. The breakpoints might be due either to terminal deletions followed by random serial addition of the two repeat versions, or to recombination events. The shortening/elongation mechanism is favoured over the recombination hypothesis because of the absence of higher-order patterns in the sequence of telomeric repeats.
Affiliation(s)
- M Ponzi
- Laboratorio di Biologia Cellulare, Istituto Superiore di Sanità, Rome, Italy