1. Bonnici V, Manca V. Informational laws of genome structures. Sci Rep 2016; 6:28840. [PMID: 27354155 PMCID: PMC4937431 DOI: 10.1038/srep28840] [Received: 02/01/2016] [Accepted: 06/09/2016]
Abstract
In recent years, the analysis of genomes by means of the strings of length k occurring in them, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information-theoretic concepts that express intrinsic aspects of genomes. The value k = log2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.
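The basic ingredient behind k-mer entropy indexes can be sketched as follows. The sketch assumes plain overlapping k-mer counting on one strand and rounds log2(n) to the nearest integer. Both are assumptions, and the function names are hypothetical. It computes only the empirical Shannon entropy of the k-mer distribution. The paper's specific indexes and random-genome comparisons are not reproduced.

```python
import math
from collections import Counter


def choose_k(n: int) -> int:
    """k = log2(n) rounded to the nearest integer (the rounding is an assumption)."""
    return max(1, round(math.log2(n)))


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of all overlapping k-mers (length-k windows) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy, in bits, of the empirical k-mer frequency distribution."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    genome = "ACGT" * 4            # toy 16-nt "genome"
    k = choose_k(len(genome))      # log2(16) = 4
    print(k, kmer_entropy(genome, k))
```

A genome of length n has n - k + 1 overlapping k-mer positions. Choosing k near log2(n) keeps the number of possible words, 4^k, on the order of n^2. This is one intuition for why that scale is informative.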
Affiliation(s)
- Vincenzo Bonnici, Vincenzo Manca
- Department of Computer Science, University of Verona, Verona 37134, Italy; Center for BioMedical Computing, University of Verona, Verona 37134, Italy
2. Zhao Y, Duan S, Zhang W. (De)localization and the mobility edges in a disordered double chain with long-range intrachain correlation and short-range interchain correlation. J Phys Condens Matter 2012; 24:245502. [PMID: 22609638 DOI: 10.1088/0953-8984/24/24/245502]
Abstract
Correlation effects and phase transitions are central issues in current studies of disordered systems. In this paper, we study the electronic properties of a disordered double chain with long-range intrachain correlation and short-range interchain correlation. Based on detailed numerical calculations, finite-size scaling analysis and empirical analytical calculations, we obtain a phase diagram containing rich physics arising from the interplay among disorder and short- and long-range correlations. Besides the localization-delocalization transitions induced by long-range correlation, we find both first-order and second-order quantum phase transitions on changing the short-range correlation. Interestingly, localization may be suppressed by increasing the disorder strength in some parameter regimes, and 'anti-correlation' leads to the most delocalized state. Our studies shed some light on the mechanism of charge transport in DNA molecules, where both types of correlated disorder are present.
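A double chain of this kind can be written as a two-leg tight-binding ladder. The sketch builds that Hamiltonian and measures localization with the inverse participation ratio (IPR), which is about 1/N for extended states and of order 1 for localized ones. Assumptions: uncorrelated box disorder, open boundaries, and hopping values chosen for illustration. The correlated disorder studied in the paper is not generated here, and the function names are hypothetical.

```python
import numpy as np


def ladder_hamiltonian(eps_a, eps_b, t=1.0, t_perp=0.5):
    """Tight-binding Hamiltonian of a two-leg ladder (double chain).

    Site (i, leg) is stored at index 2*i + leg; leg 0 uses eps_a, leg 1 eps_b.
    Open boundaries; t is the intrachain hopping, t_perp the interchain hopping.
    """
    n = len(eps_a)
    h = np.zeros((2 * n, 2 * n))
    for i in range(n):
        a, b = 2 * i, 2 * i + 1
        h[a, a], h[b, b] = eps_a[i], eps_b[i]
        h[a, b] = h[b, a] = -t_perp            # interchain (rung) coupling
        if i + 1 < n:
            h[a, a + 2] = h[a + 2, a] = -t    # hopping along leg 0
            h[b, b + 2] = h[b + 2, b] = -t    # hopping along leg 1
    return h


def inverse_participation_ratios(h):
    """IPR = sum_j |psi_j|^4 for each normalized eigenvector (columns of eigh)."""
    _, vecs = np.linalg.eigh(h)
    return np.sum(np.abs(vecs) ** 4, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100
    ordered = inverse_participation_ratios(ladder_hamiltonian(np.zeros(n), np.zeros(n)))
    disordered = inverse_participation_ratios(
        ladder_hamiltonian(rng.uniform(-5, 5, n), rng.uniform(-5, 5, n)))
    print(ordered.mean(), disordered.mean())
```

Correlated on-site energies could be substituted for the uniform draws to explore the paper's setting. Full diagonalization limits this approach to moderate chain lengths.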
Affiliation(s)
- Yi Zhao
- Institute of Applied Physics and Computational Mathematics, Beijing, People's Republic of China
3. Wright CF, Morelli MJ, Thébaud G, Knowles NJ, Herzyk P, Paton DJ, Haydon DT, King DP. Beyond the consensus: dissecting within-host viral population diversity of foot-and-mouth disease virus by using next-generation genome sequencing. J Virol 2011; 85:2266-75. [PMID: 21159860 PMCID: PMC3067773 DOI: 10.1128/jvi.01396-10] [Received: 07/05/2010] [Accepted: 11/29/2010]
Abstract
The diverse sequences of viral populations within individual hosts are the starting material for selection and subsequent evolution of RNA viruses such as foot-and-mouth disease virus (FMDV). Using next-generation sequencing (NGS) performed on a Genome Analyzer platform (Illumina), this study compared the viral populations within two bovine epithelial samples (foot lesions) from a single animal with the inoculum used to initiate experimental infection. Genomic sequences were determined in duplicate sequencing runs, and the consensus sequence of the inoculum determined by NGS was identical to that previously determined using the Sanger method. However, NGS revealed the fine polymorphic substructure of the viral population, from nucleotide variants present at just below 50% frequency to those present at fractions of 1%. Some of the higher-frequency polymorphisms identified encoded changes within codons associated with heparan sulfate binding and were present in both foot lesions, revealing intermediate stages in the evolution of a tissue culture-adapted virus replicating within a mammalian host. We identified 2,622, 1,434, and 1,703 polymorphisms in the inoculum and in the two foot lesions, respectively; most of the substitutions occurred in only a small fraction of the population and represented the progeny from recent cellular replication prior to onset of any selective pressures. We estimated the upper limit for the genome-wide mutation rate of the virus within a cell to be 7.8 × 10^-4 per nucleotide. The greater depth of detection achieved by NGS demonstrates that this method is a powerful and valuable tool for the dissection of FMDV populations within hosts.
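Reporting variants "at fractions of 1%" starts from per-position base counts across reads. The sketch assumes a simple pileup given as one dictionary of base counts per genome position, with the consensus taken as the most frequent base. The depth and frequency thresholds are illustrative. Real NGS variant calling must also model sequencing error, which this hypothetical helper does not.

```python
from typing import Dict, List, Tuple


def call_polymorphisms(pileup: List[Dict[str, int]], min_freq: float = 0.005,
                       min_depth: int = 100) -> List[Tuple[int, str, str, float]]:
    """Return (position, consensus, variant, frequency) for minority bases.

    pileup[i] maps each base ('A', 'C', 'G', 'T') to its read count at position i.
    The consensus is the most frequent base; any other base whose frequency
    reaches min_freq is reported. Thresholds are illustrative, not the paper's.
    """
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too shallow to estimate sub-1% frequencies
        consensus = max(counts, key=counts.get)
        for base, n in counts.items():
            freq = n / depth
            if base != consensus and freq >= min_freq:
                calls.append((pos, consensus, base, freq))
    return calls


if __name__ == "__main__":
    pileup = [{"A": 990, "G": 10}, {"C": 500, "T": 499, "A": 1}, {"G": 50}]
    for call in call_polymorphisms(pileup):
        print(call)
```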
Affiliation(s)
- Caroline F. Wright, Marco J. Morelli, Gaël Thébaud, Nick J. Knowles, Pawel Herzyk, David J. Paton, Daniel T. Haydon, Donald P. King
- Institute for Animal Health, Ash Road, Pirbright, Woking, Surrey GU24 0NF, United Kingdom, MRC, University of Glasgow Centre for Virus Research, Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, United Kingdom, Institut National de la Recherche Agronomique (INRA), UMR BGPI, Cirad TA A-54/K, Campus de Baillarguet, 34938 Montpellier Cedex 5, France, The Sir Henry Wellcome Functional Genomics Facility, Faculty of Biomedical and Life Sciences, University of Glasgow, Glasgow G12 8QQ, United Kingdom
4.
Abstract
Aperiodic order plays a very significant role in biology, as it determines most of the informative content of genomes. Among the various physical, chemical or biological phenomena that might be inferred from sequence correlations, charge transfer properties deserve particular attention. Indeed, the nature of DNA-mediated charge migration has been related to the understanding of damage-recognition processes and protein binding, as well as to the task of engineering biological processes (e.g. designing nanoscale sensors of genomic mutations), opening new challenges for emerging nanobiotechnologies. Nevertheless, solving Schrödinger's equation with a potential given by a one-dimensional array of double-stranded DNA remains a main open problem in the solid-state physics of biological macromolecules. In this contribution, I briefly review several approaches introduced during the last few years to describe charge transfer in DNA in terms of tight-binding effective Hamiltonians.
5. Marqués MI. Monte Carlo study of the competition between long-range and short-range correlated disorder in a second-order phase transition. Phys Rev E Stat Nonlin Soft Matter Phys 2009; 79:052103. [PMID: 19518500 DOI: 10.1103/physreve.79.052103] [Received: 02/24/2009]
Abstract
The influence of coexisting correlated and noncorrelated impurities on the critical behavior of the three-dimensional Ising model is studied using Monte Carlo simulations and finite-size scaling. The concentrations of correlated and noncorrelated vacancies are varied and controlled during the simulations. Long-range correlated (LRC) critical behavior is always found, for any concentration of correlated vacancies. The smaller the concentration of correlated vacancies, the larger the system size needed to detect the LRC universality class. This result explains why critical values measured in xerogel liquid-vapor experiments, where the concentration of correlated vacancies is marginal, seem to correspond to short-range correlated disorder.
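The simulation type named here can be illustrated with a Metropolis sweep over a site-diluted three-dimensional Ising lattice. This is a minimal sketch with stated assumptions: J = 1, k_B = 1, periodic boundaries, and vacancies placed independently at random, so it covers only the noncorrelated case. The long-range correlated vacancies of the paper would need a separate generator, and finite-size scaling is not attempted. The function name is hypothetical.

```python
import math
import random


def _neighbours(i, L):
    """Indices of the six nearest neighbours of site i on a periodic L^3 lattice."""
    x, y, z = i % L, (i // L) % L, i // (L * L)
    out = []
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        out.append((x + dx) % L + ((y + dy) % L) * L + ((z + dz) % L) * L * L)
    return out


def diluted_ising_magnetization(L=6, vacancy_frac=0.1, T=2.0, sweeps=300, seed=1):
    """Metropolis simulation of a 3D Ising model with uncorrelated vacancies.

    Returns |m| per occupied site, averaged over the second half of the run.
    """
    rng = random.Random(seed)
    n = L ** 3
    occupied = [rng.random() >= vacancy_frac for _ in range(n)]
    spin = [1 if occ else 0 for occ in occupied]   # ordered start; vacancy = 0
    sites = [i for i in range(n) if occupied[i]]
    nbrs = [_neighbours(i, L) for i in range(n)]
    samples = []
    for sweep in range(sweeps):
        for _ in range(len(sites)):
            i = rng.choice(sites)
            field = sum(spin[j] for j in nbrs[i])  # vacant neighbours add 0
            d_e = 2 * spin[i] * field              # energy change of flipping spin i
            if d_e <= 0 or rng.random() < math.exp(-d_e / T):
                spin[i] = -spin[i]
        if sweep >= sweeps // 2:
            samples.append(abs(sum(spin)) / len(sites))
    return sum(samples) / len(samples)


if __name__ == "__main__":
    for temperature in (2.0, 4.0, 6.0):
        print(temperature, diluted_ising_magnetization(T=temperature))
```

Lattices this small only show the crossover qualitatively. Distinguishing universality classes, as the abstract notes, requires much larger systems.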
Affiliation(s)
- Manuel I Marqués
- Departamento de Física de Materiales C-IV, Universidad Autónoma de Madrid, 28049 Madrid, Spain.
6. Guo AM, Xiong SJ. Violation of the single-parameter scaling hypothesis in human chromosome 22 with charge transfer models. Phys Rev E Stat Nonlin Soft Matter Phys 2009; 79:041924. [PMID: 19518273 DOI: 10.1103/physreve.79.041924] [Received: 04/01/2008] [Revised: 12/15/2008]
Abstract
We investigate the transport properties of DNA sequences in human chromosome 22 and compare the results with those of a random artificial DNA sequence, using single- and double-stranded charge transfer models. Statistical quantities, including the Hurst exponent, the distribution of the Lyapunov exponent (LE), the central moments, and the scaling parameter, are calculated numerically with the transfer-matrix approach. We find that the satellite DNA segments in human chromosome 22 could result in deviations from the usual Gaussian distribution of the LE. Our results suggest that the presence of satellite DNA segments, together with long-range correlations and base-pairing correlations, could lead to a violation of the single-parameter scaling hypothesis, which holds for the random artificial DNA sequence even though the averaged LEs of the two sequences behave similarly. This provides a viewpoint from which to analyze differences between genomic DNA sequences and nonliving random ones on the basis of the localization properties of wave functions in the sequences.
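The transfer-matrix estimate of the Lyapunov exponent (LE) can be sketched for the simplest single-stranded model. The model is psi_{n+1} + psi_{n-1} = ((E - eps_n)/t) psi_n. The vector (psi_n, psi_{n-1}) is renormalized at every step and the logarithmic growth is accumulated. The on-site energies (about 7.7 to 9.1 eV) and t = 1 eV are illustrative assumptions, not the paper's parameters, and the double-stranded model is not covered. For an ordered chain with E inside the band, the LE should be near zero. For a random sequence it should be positive.

```python
import math
import random

# Illustrative on-site energies in eV (assumed values for this sketch).
ONSITE = {"A": 8.24, "C": 8.87, "G": 7.75, "T": 9.14}


def lyapunov_exponent(seq, energy, t=1.0):
    """LE of a 1D tight-binding chain, psi_{n+1} = ((E - eps_n)/t) psi_n - psi_{n-1}.

    The two-component vector is renormalized each step; the sum of the log norms
    divided by the chain length estimates the exponent (inverse localization length).
    """
    psi, psi_prev = 1.0, 0.0
    log_growth = 0.0
    for base in seq:
        psi, psi_prev = ((energy - ONSITE[base]) / t) * psi - psi_prev, psi
        norm = math.hypot(psi, psi_prev)
        log_growth += math.log(norm)
        psi, psi_prev = psi / norm, psi_prev / norm
    return log_growth / len(seq)


if __name__ == "__main__":
    rng = random.Random(0)
    random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
    print(lyapunov_exponent("A" * 20000, ONSITE["A"] + 0.5))   # ordered chain
    print(lyapunov_exponent(random_seq, 9.0))                  # random sequence
```

Feeding in a real chromosome segment instead of `random_seq` gives the kind of comparison the abstract describes. Repeating this over many segments would give a distribution of LEs.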
Affiliation(s)
- Ai-Min Guo
- Department of Physics and National Laboratory of Solid State Microstructures, Nanjing University, Nanjing 210093, China
7. Paar V, Pavin N, Basar I, Rosandić M, Gluncić M, Paar N. Hierarchical structure of cascade of primary and secondary periodicities in Fourier power spectrum of alphoid higher order repeats. BMC Bioinformatics 2008; 9:466. [PMID: 18980673 PMCID: PMC2661002 DOI: 10.1186/1471-2105-9-466] [Received: 03/12/2008] [Accepted: 11/03/2008]
Abstract
Background Identification of approximate tandem repeats is an important task of broad significance and remains a challenging problem in computational genomics. Often there is no single best approach to periodicity detection, and a combination of different methods may improve the prediction accuracy. The discrete Fourier transform (DFT) has been used extensively to study primary periodicities in DNA sequences. Here we investigate the application of the DFT method to identify and study alphoid higher order repeats. Results We used a DFT-based method, with a mapping of the symbolic sequence into a numerical sequence, to identify and study alphoid higher order repeats (HORs). For HORs the power spectrum shows an equidistant frequency pattern, with a characteristic two-level hierarchical organization as the signature of the HOR. Our case study was the 16-mer HOR tandem in AC017075.8 from human chromosome 7. A very long array of equidistant peaks at multiple frequencies (more than a thousand higher harmonics) is based on the fundamental frequency of the 16-mer HOR. A pronounced subset of equidistant peaks is based on multiples of the fundamental HOR frequency (multiplication factor n for an n-mer) and higher harmonics. In general, an n-mer HOR pattern contains equidistant secondary periodicity peaks, with a pronounced subset of equidistant primary periodicity peaks. This hierarchical pattern, as a signature for HOR detection, is robust with respect to monomer insertions and deletions, random sequence insertions, etc. For a monomeric alphoid sequence only primary periodicity peaks are present. The 1/f^β noise and the period-3 pattern are missing from the power spectra of alphoid regions, in accordance with expectations. Conclusion The DFT provides a robust detection method for higher order periodicity.
The easily recognizable HOR power spectrum is characterized by a hierarchical two-level equidistant pattern: higher harmonics of the fundamental HOR frequency (secondary periodicity) and a subset of pronounced peaks corresponding to constituent monomers (primary periodicity). The number of lower-frequency peaks (secondary periodicity) below the frequency of the first primary periodicity peak reveals the size of the n-mer HOR, i.e., the number n of monomers contained in the consensus HOR.
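A DFT power spectrum of a DNA sequence needs a numerical mapping first. The sketch uses the Voss indicator mapping: one binary sequence per base, with the squared DFT magnitudes summed. This is one common choice and an assumption here, since the paper's own mapping may differ. For a sequence of length N made of exact copies of a period-p unit, power appears only at indices k that are multiples of N/p. A HOR of n monomers of length L would therefore show secondary peaks at multiples of N/(nL), and pronounced primary peaks at multiples of N/L.

```python
import numpy as np


def voss_power_spectrum(seq):
    """Total DFT power spectrum of the four binary base-indicator sequences.

    u_X[i] = 1 if seq[i] == X else 0, for X in A, C, G, T (Voss mapping);
    S[k] = sum over X of |DFT(u_X)[k]|^2, for k = 0..N-1.
    """
    seq = seq.upper()
    spectrum = np.zeros(len(seq))
    for base in "ACGT":
        indicator = np.array([1.0 if c == base else 0.0 for c in seq])
        spectrum += np.abs(np.fft.fft(indicator)) ** 2
    return spectrum


if __name__ == "__main__":
    # Toy "HOR": two slightly different 5-nt monomers, unit repeated 20 times
    # (N = 200), so peaks are expected at multiples of 200 / 10 = 20.
    hor = ("AACGT" + "AACGA") * 20
    spec = voss_power_spectrum(hor)
    print([k for k in range(1, len(hor) // 2 + 1) if spec[k] > 1e-6])
```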
Affiliation(s)
- Vladimir Paar
- Faculty of Science, University of Zagreb, Bijenicka 32, Zagreb, Croatia.
8. Liu Z, Venkatesh SS, Maley CC. Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples. BMC Genomics 2008; 9:509. [PMID: 18973670 PMCID: PMC2628393 DOI: 10.1186/1471-2164-9-509] [Received: 02/04/2008] [Accepted: 10/30/2008]
Abstract
Background Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results We have analyzed aspects of the information content of the Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes, while < 2% of possible 19 bp oligomers are present. In other species, the range from > 98% to < 2% of possible oligomers spans different lengths: D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 million 15-mers that differ by more than 1 nucleotide from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than that of non-coding regions and chromosomes. However, chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite, with a low frequency of coding regions but relatively high entropy.
Conclusion Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues.
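The "sequence space coverage" measure is the fraction of the 4^k possible k-mers that occur at least once. The sketch computes it for one strand only. Words containing non-ACGT symbols such as N are skipped. Counting reverse complements as well, as a genome-wide analysis might, is left out. Both choices are assumptions, and the function names are hypothetical.

```python
def kmer_coverage(seq, k):
    """Fraction of the 4**k possible DNA k-mers present at least once in seq.

    Single strand only; windows containing symbols other than A, C, G, T are skipped.
    """
    valid = set("ACGT")
    seq = seq.upper()
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= valid}
    return len(seen) / 4 ** k


def coverage_profile(seq, ks):
    """Coverage for several word lengths, e.g. to locate the >98% and <2% lengths."""
    return {k: kmer_coverage(seq, k) for k in ks}


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    toy = "".join(rng.choice("ACGT") for _ in range(10000))
    print(coverage_profile(toy, range(4, 10)))
```

Comparing the profile of a real genome with that of a random sequence of the same length and GC content is one way to see the "compactness in sequence space" the abstract mentions.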
Affiliation(s)
- Zhandong Liu
- Genomics and Computational Biology Graduate Group, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
9. Perera A, Vallverdu M, Claria F, Soria JM, Caminal P. DNA binding site characterization by means of Rényi entropy measures on nucleotide transitions. IEEE Trans Nanobioscience 2008; 7:133-41. [PMID: 18556261 DOI: 10.1109/tnb.2008.2000744]
Abstract
In this work, parametric information-theory measures for the characterization of binding sites in DNA are extended with the use of transition probabilities on the sequence. We propose the use of parametric uncertainty measures such as Rényi entropies obtained from the transition probabilities for the study of binding sites, in addition to nucleotide frequency-based Rényi measures. Results are reported comparing transition frequencies (i.e., dinucleotides) and base frequencies for Shannon and parametric Rényi entropies for a number of binding sites found in E. coli, lambda and T7 organisms. We observe that the information provided by the two approaches is not redundant. Furthermore, in the presence of noise in the binding site matrix, we observe overall improved robustness of the nucleotide transition-based algorithms compared with the nucleotide frequency-based method.
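Rényi entropy of order α is H_α = log2(Σ p^α) / (1 - α), and it reduces to Shannon entropy as α approaches 1. The sketch applies it to base frequencies and to transition (dinucleotide) probabilities of a single sequence. For transitions it averages the entropy of P(next | current), weighted by P(current). That weighting is one possible convention and an assumption here. The paper applies its measures to aligned binding-site matrices, which this sketch does not model.

```python
import math
from collections import Counter


def renyi_entropy(probs, alpha):
    """Rényi entropy of order alpha in bits; alpha == 1 falls back to Shannon."""
    probs = [p for p in probs if p > 0]
    if math.isclose(alpha, 1.0):
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def base_renyi(seq, alpha):
    """Rényi entropy of the single-nucleotide frequency distribution."""
    counts = Counter(seq)
    return renyi_entropy([c / len(seq) for c in counts.values()], alpha)


def transition_renyi(seq, alpha):
    """Average Rényi entropy of P(next | current), weighted by P(current)."""
    pairs = Counter(zip(seq, seq[1:]))
    from_counts = Counter(seq[:-1])
    total = len(seq) - 1
    h = 0.0
    for a, n_a in from_counts.items():
        row = [n / n_a for (x, _), n in pairs.items() if x == a]
        h += (n_a / total) * renyi_entropy(row, alpha)
    return h


if __name__ == "__main__":
    site = "TTGACAATTAATCATCGAACTAGTTAACTAGTACGCAAG"  # arbitrary toy sequence
    for alpha in (0.5, 1.0, 2.0, 4.0):
        print(alpha, base_renyi(site, alpha), transition_renyi(site, alpha))
```

Scanning α, as in the loop, shows how the two measures can move apart with increasing order. This is the kind of non-redundancy the authors report.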
Affiliation(s)
- A Perera
- Centre for Biomedical Engineering Research, Technical University of Catalonia, Barcelona, Spain.
10. Perera A, Vallverdu M, Claria F, Soria JM, Caminal P. DNA binding sites characterization by means of Rényi entropy measures on nucleotide transitions. Conf Proc IEEE Eng Med Biol Soc 2008; 2006:5783-6. [PMID: 17946719 DOI: 10.1109/iembs.2006.260482]
Abstract
In this work, parametric information-theory measures for the characterization of binding sites in DNA are extended with the use of transition probabilities on the sequence. We propose the use of parametric uncertainty measures such as Rényi entropies obtained from the transition probabilities for the study of binding sites, in addition to nucleotide frequency-based Rényi measures. Results are reported in this manuscript comparing transition frequencies (i.e. dinucleotides) and base frequencies for Shannon and parametric Rényi entropies for a number of binding sites found in E. coli, lambda and T7 organisms. We observe that, for the evaluated datasets, the information provided by both approaches is not redundant, as they evolve differently under increasing Rényi orders.
Affiliation(s)
- Alexandre Perera
- Centre for Biomed. Eng. Res., Tech. Univ. of Catalonia, Barcelona, Spain.
11. Vinga S, Almeida JS. Local Renyi entropic profiles of DNA sequences. BMC Bioinformatics 2007; 8:393. [PMID: 17939871 PMCID: PMC2238722 DOI: 10.1186/1471-2105-8-393] [Received: 05/10/2007] [Accepted: 10/16/2007]
Abstract
Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition explored there was based on the Rényi entropy of probability density function (pdf) estimation using the Parzen window method, applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand, it shows that the entropic profiles are directly related to the statistical significance of motifs, allowing the study of under- and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables, and additional material and examples are made publicly available at . Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur at multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and the inference of motif structures.
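The two ingredients named here can be sketched together. First, a Chaos Game Representation maps a sequence to points in the unit square. Second, a quadratic (order-2) Rényi entropy is computed from a Gaussian Parzen-window estimate of those points. For Gaussian kernels the integral of the squared density has a closed form, so no numerical integration is needed. Assumptions: the corner assignment (A, C, G, T at (0,0), (0,1), (1,1), (1,0)), an isotropic Gaussian kernel, and no correction for the square's boundaries. The fractal kernel and local entropic profiles of the paper are not implemented.

```python
import math

# Corner convention assumed for this sketch; other papers permute the corners.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """CGR: start at the centre and move halfway towards each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def quadratic_renyi_entropy(points, sigma):
    """H2 = -ln of the integral of f^2 for a 2D Gaussian Parzen estimate f.

    With kernels of variance sigma^2 the integral equals
    (1/N^2) * sum_ij exp(-|xi - xj|^2 / (4 sigma^2)) / (4 pi sigma^2).
    """
    n = len(points)
    acc = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            acc += math.exp(-d2 / (4.0 * sigma ** 2))
    integral = acc / (n * n * 4.0 * math.pi * sigma ** 2)
    return -math.log(integral)


if __name__ == "__main__":
    pts = cgr_points("ACGTTGCAACGTACGGTACA")
    for s in (0.01, 0.05, 0.2):
        print(s, quadratic_renyi_entropy(pts, s))
```

Varying `sigma` plays the role of "spanning the parameters of the kernel function". Small bandwidths resolve long shared suffixes, and large ones only coarse composition. The double loop is O(N²), so it suits short sequences only.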
Affiliation(s)
- Susana Vinga
- Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R. Alves Redol 9, 1000-029 Lisboa, Portugal.
12. Lillo F, Spanò M. Inverted and mirror repeats in model nucleotide sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 76:041914. [PMID: 17995033 DOI: 10.1103/physreve.76.041914] [Received: 05/16/2007]
Abstract
We analytically and numerically study the probabilistic properties of inverted and mirror repeats in model sequences of nucleic acids. We consider both perfect and nonperfect repeats, i.e., repeats with mismatches and gaps. The sequence models considered are independent and identically distributed (i.i.d.) sequences, Markov processes and long-range correlated sequences. We show that the number of repeats in correlated sequences is significantly larger than in i.i.d. sequences and that this discrepancy increases exponentially with the repeat length for long-range correlated sequences.
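Perfect repeats of a given length can be counted by indexing every word's start positions. The sketch then counts pairs of positions i < j where the word at j is the reverse complement of the word at i (inverted repeat) or its plain reverse (mirror repeat). Assumptions: only perfect repeats are counted, overlapping occurrences are allowed, and each unordered pair is counted once. Mismatches and gaps, which the paper also treats, are not handled.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_repeats(seq, length, kind="inverted"):
    """Count pairs i < j whose words form a perfect inverted or mirror repeat.

    kind="inverted": word at j is the reverse complement of the word at i.
    kind="mirror":   word at j is the plain reverse of the word at i.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - length + 1):
        positions[seq[i:i + length]].append(i)
    total = 0
    for word, starts in positions.items():
        target = word[::-1]
        if kind == "inverted":
            target = target.translate(COMPLEMENT)
        partners = positions.get(target, [])
        for i in starts:
            total += sum(1 for j in partners if j > i)  # each pair counted once
    return total


if __name__ == "__main__":
    import random
    rng = random.Random(0)
    iid = "".join(rng.choice("ACGT") for _ in range(2000))
    print(count_repeats(iid, 6, "inverted"), count_repeats(iid, 6, "mirror"))
```

Running the same count on a Markov or long-range correlated sequence of equal length would give the kind of comparison studied in the paper.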
Affiliation(s)
- Fabrizio Lillo
- Dipartimento di Fisica e Tecnologie Relative, Università di Palermo, Viale delle Scienze, I-90128, Palermo, Italy
13. Guo AM. Long-range correlation and charge transfer efficiency in substitutional sequences of DNA molecules. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 75:061915. [PMID: 17677308 DOI: 10.1103/physreve.75.061915] [Received: 01/07/2007] [Revised: 04/15/2007]
Abstract
We address the relation between long-range correlations and coherent charge transfer in substitutional DNA sequences using the transfer-matrix approach. The substitutional sequences exhibit long-range correlations and show good transmittivity compared with uncorrelated random sequences. We find that the charge transfer efficiency varies among substitutional sequences and that many of them exhibit electronic delocalization. Further, the resistivity of substitutional sequences may decrease with length, be length-independent, or increase with length. The conduction mechanisms underlying these behaviors are analyzed.
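Substitutional sequences are generated by repeatedly applying a letter-rewriting rule to a seed word. The sketch shows three textbook rules: Fibonacci, Thue-Morse and period-doubling. It then maps the two abstract symbols onto bases. The specific rules and the a→G, b→C assignment are illustrative assumptions, not necessarily the sequences used in the paper.

```python
FIBONACCI = {"a": "ab", "b": "a"}
THUE_MORSE = {"a": "ab", "b": "ba"}
PERIOD_DOUBLING = {"a": "ab", "b": "aa"}


def substitution_sequence(rules, seed, iterations):
    """Apply a substitution rule `iterations` times to the seed word."""
    word = seed
    for _ in range(iterations):
        word = "".join(rules[symbol] for symbol in word)
    return word


def to_dna(word, mapping=None):
    """Map abstract symbols to bases (default a -> G, b -> C, an assumption)."""
    mapping = mapping or {"a": "G", "b": "C"}
    return "".join(mapping[s] for s in word)


if __name__ == "__main__":
    for name, rules in (("Fibonacci", FIBONACCI), ("Thue-Morse", THUE_MORSE),
                        ("period-doubling", PERIOD_DOUBLING)):
        print(name, to_dna(substitution_sequence(rules, "a", 5)))
```

Lengths grow as Fibonacci numbers for the first rule and double for the other two. The resulting base strings can be fed to a transfer-matrix routine to compare transmission with random sequences of equal length.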
Affiliation(s)
- Ai-Min Guo
- Department of Physics Science and Technology, Central South University, Changsha 410083, China
14. On tests of independence based on minimum φ-divergence estimator with constraints: An application to modeling DNA. Comput Stat Data Anal 2006. [DOI: 10.1016/j.csda.2005.11.006]
15. Caetano RA, Schulz PA. Sequencing-independent delocalization in a DNA-like double chain with base pairing. Phys Rev Lett 2005; 95:126601. [PMID: 16197093 DOI: 10.1103/physrevlett.95.126601] [Received: 09/09/2004]
Abstract
The question of whether or not DNA is intrinsically conducting is still a challenge. The ongoing debate on DNA molecules as an electronic material has so far underestimated a key distinction of the system: the role of base pairing as opposed to correlations along each chain. We show that a disordered base-paired double chain presents truly or, at least, effectively delocalized states. This effect is independent of the sequence along each chain.
Affiliation(s)
- R A Caetano
- Instituto de Física Gleb Wataghin, UNICAMP, Cx.P. 6165, 13083-970, Campinas, SP, Brazil
16. Li W, Holste D. Universal 1/f noise, crossovers of scaling exponents, and chromosome-specific patterns of guanine-cytosine content in DNA sequences of the human genome. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 71:041910. [PMID: 15903704 DOI: 10.1103/physreve.71.041910] [Received: 01/28/2004] [Revised: 10/28/2004]
Abstract
Spatial fluctuations of guanine and cytosine base content (GC%) are studied by spectral analysis for the complete set of human genomic DNA sequences. We find that (i) 1/f^α decay is universally observed in the power spectra of all 24 chromosomes, and (ii) the exponent α ≈ 1 extends to about 10^7 bases, one order of magnitude longer than has previously been observed. We further find that (iii) almost all human chromosomes exhibit a crossover from α₁ ≈ 1 (1/f^α₁) at lower frequency to α₂ < 1 (1/f^α₂) at higher frequency, typically occurring at around 30,000-100,000 bases, while (iv) the crossover in this frequency range is virtually absent in human chromosome 22. In addition to the universal 1/f^α noise in power spectra, we find (v) several lines of evidence for chromosome-specific correlation structures, including a 500,000-base-long oscillation in human chromosome 21. The universal 1/f^α spectrum in the human genome is further substantiated by a resistance to reduction in variance of guanine and cytosine content when the window size is increased.
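The spectral exponent α can be estimated in two steps. First, compute a periodogram of the GC indicator sequence: 1 for G or C, 0 for A or T, mean removed. Then fit the slope of log P against log f over a chosen frequency band. A crossover shows up as different slopes fitted on separate bands. Assumptions: a raw periodogram without windowing or averaging, and an ordinary least-squares fit. Real chromosome-scale analyses typically smooth the spectrum first. The band limits are left to the caller.

```python
import numpy as np


def gc_power_spectrum(seq):
    """Periodogram of the GC indicator (1 for G/C, 0 otherwise), mean removed.

    Returns positive frequencies (cycles per base) and the power at each one.
    """
    x = np.array([1.0 if b in "GCgc" else 0.0 for b in seq])
    x -= x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x))
    return freqs[1:], power[1:]  # drop the zero-frequency term


def spectral_exponent(freqs, power, f_min=None, f_max=None):
    """Least-squares alpha in log10 P = -alpha * log10 f + c over [f_min, f_max].

    Bins with zero power must be excluded by the caller (log of 0 is undefined).
    """
    mask = np.ones_like(freqs, dtype=bool)
    if f_min is not None:
        mask &= freqs >= f_min
    if f_max is not None:
        mask &= freqs <= f_max
    slope, _ = np.polyfit(np.log10(freqs[mask]), np.log10(power[mask]), 1)
    return -slope


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    iid = "".join(rng.choice(list("ACGT"), size=2 ** 14))
    f, p = gc_power_spectrum(iid)
    print(spectral_exponent(f, p))  # uncorrelated sequence: alpha close to 0
```

Fitting the low- and high-frequency bands separately, for example below and above 1/30,000 cycles per base for chromosome-length input, is one way to look for the α₁/α₂ crossover.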
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, New York 11030, USA.
17. Borstnik B, Pumpernik D. Evidence on DNA slippage step-length distribution. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 71:031913. [PMID: 15903465 DOI: 10.1103/physreve.71.031913] [Received: 06/18/2004] [Revised: 11/30/2004]
Abstract
A simple model based on a master equation is constructed to reveal the details of the mutational events that modify simple sequence repeats in the human genome. A database of simple repeats together with their flanking sequences, comprising approximately $10^{5}$ entries from all 24 human chromosomes, was constructed. By aligning pairs of sequence fragments containing the repeat elements, matrices counting the number of slippage events were evaluated. These counts were then used as a target to be reproduced by our theoretical model, in which repeats are elongated and shortened through steps whose lengths follow a decaying inverse-power-law distribution, rather than through single-nucleotide extensions or deletions, the assumption most frequently made in previous studies.
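The key ingredient of this model, slippage steps whose lengths follow an inverse power law instead of always being one unit, can be sketched as follows. This is only an illustration of that ingredient, not the authors' master-equation model: the exponent `gamma`, the cutoff `s_max`, the equal probability of elongation and shortening, and the floor of one repeat unit are all hypothetical choices.

```python
import random


def step_length_pmf(gamma: float, s_max: int) -> list[float]:
    """Truncated inverse power law: P(s) proportional to s**(-gamma), s = 1..s_max."""
    weights = [s ** (-gamma) for s in range(1, s_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def slip(repeat_len: int, pmf: list[float], rng: random.Random) -> int:
    """One toy slippage event: lengthen or shorten the repeat by s units, with s drawn from pmf."""
    step = rng.choices(range(1, len(pmf) + 1), weights=pmf, k=1)[0]
    direction = rng.choice((-1, 1))  # elongation and shortening assumed equally likely
    return max(1, repeat_len + direction * step)  # keep at least one repeat unit


if __name__ == "__main__":
    rng = random.Random(42)
    pmf = step_length_pmf(gamma=2.0, s_max=20)
    length = 15
    for _ in range(10):
        length = slip(length, pmf, rng)
    print(length)
```

Setting `s_max=1` recovers the single-step assumption of earlier studies, which makes the two hypotheses easy to compare in a simulation.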
18
Burroughs NJ, de Boer RJ, Keşmir C. Discriminating self from nonself with short peptides from large proteomes. Immunogenetics 2004; 56:311-20. [PMID: 15322777 DOI: 10.1007/s00251-004-0691-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Received: 02/24/2004] [Revised: 05/26/2004] [Indexed: 10/26/2022]
Abstract
We studied whether the peptides of nine amino acids (9-mers) that are typically used in MHC class I presentation are sufficiently unique for self:nonself discrimination. The human proteome contains 28,783 proteins, comprising $10^{7}$ distinct 9-mers. Enumerating distinct 9-mers for a variety of microorganisms, we found that the average overlap, i.e., the probability that a foreign peptide also occurs in the human self, is about 0.2%. This self:nonself overlap increased when shorter peptides were used, reaching 3% for 7-mers and 30% for 6-mers. Predicting all 9-mers that are expected to be cleaved by the immunoproteasome and translocated by TAP, we find that about 25% of the self and the nonself 9-mers are processed successfully. For the HLA-A*0201 and HLA-A*0204 alleles, we predicted which of the processed 9-mers from each proteome are expected to be presented on the MHC. Both alleles prefer to present processed 9-mers over nonprocessed 9-mers, and both show a small preference for presenting foreign peptides. Because several amino acids of each 9-mer bind the MHC and are therefore not exposed to the TCR, antigen presentation would seem to involve a significant loss of information. Our results show that this is not the case, because the HLA molecules are fairly specific. Removing the two anchor residues from each presented peptide, we find that the self:nonself overlap of these exposed 7-mers resembles that of 9-mers. In summary, the 9-mers used in MHC class I presentation tend to carry sufficient information to detect nonself peptides among self peptides.
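The self:nonself overlap in this abstract is, at its core, a set computation over distinct k-mers. The sketch below assumes the overlap is measured as the fraction of distinct foreign k-mers that also appear anywhere in the self proteome; it ignores proteasomal cleavage, TAP transport, and MHC binding, and the function names are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def self_nonself_overlap(self_proteins: list[str], foreign_proteins: list[str], k: int = 9) -> float:
    """Fraction of distinct foreign k-mers that also occur somewhere in the self proteome."""
    self_set: set[str] = set()
    for protein in self_proteins:
        self_set |= kmers(protein, k)
    foreign_set: set[str] = set()
    for protein in foreign_proteins:
        foreign_set |= kmers(protein, k)
    if not foreign_set:
        raise ValueError("no foreign k-mers of this length")
    return len(foreign_set & self_set) / len(foreign_set)


if __name__ == "__main__":
    human = ["ACDEFGHIKLMNPQRST"]  # toy sequences, not real proteomes
    microbe = ["ACDEFGHIKWWYYRST"]
    for k in (6, 7, 9):
        print(k, self_nonself_overlap(human, microbe, k))
```

Shorter k gives fewer possible distinct peptides, so on real proteomes the overlap grows as k shrinks, consistent with the trend reported above.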
19
Holste D, Grosse I, Beirer S, Schieg P, Herzel H. Repeats and correlations in human DNA sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2003; 67:061913. [PMID: 16241267 DOI: 10.1103/physreve.67.061913] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Received: 01/22/2003] [Indexed: 05/04/2023]
Abstract
We study the nucleotide-nucleotide mutual information function I(k) of the DNA sequences of the three completely sequenced human chromosomes 20, 21, and 22. In each human chromosome we find (i) the absence of the k=3 base pair (bp) sequence periodicity characteristic of protein-coding regions, (ii) the absence of the k=10-11 bp sequence periodicity characteristic of both protein secondary structure and DNA bendability, and (iii) the presence of significant statistical dependencies at about k=135 bp and at about k=165 bp. We investigate to what degree the density and composition of interspersed repeats might explain these statistical patterns in all three human chromosomes. Using simple stochastic models to substitute known interspersed repeats, we find by numerical studies that (iv) the presence of interspersed repeats dominates the short-range correlations measured by I(k) on the scale of several hundred base pairs in human chromosomes 20, 21, and 22. On the other hand, we find that (v) interspersed repeats contribute only weakly to long-range correlations due to the clustering of highly abundant Alu repeats.
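The mutual information function is commonly defined as $I(k) = \sum_{a,b} P_{ab}(k) \log_2 \frac{P_{ab}(k)}{P_a P_b}$, where $P_{ab}(k)$ is the probability of finding nucleotide $a$ and nucleotide $b$ at distance $k$. A minimal estimator is sketched below. It is an assumption that the marginals are taken from the left and right members of the pairs (which keeps the estimate non-negative); the paper may instead use whole-sequence base frequencies, and no finite-size bias correction is applied.

```python
from collections import Counter
from math import log2


def mutual_information(seq: str, k: int) -> float:
    """Plug-in estimate of I(k), in bits, between symbols k positions apart."""
    if not 1 <= k < len(seq):
        raise ValueError("need 1 <= k < len(seq)")
    left, right = seq[:-k], seq[k:]
    n = len(left)  # number of (i, i + k) pairs
    joint = Counter(zip(left, right))
    p_left = Counter(left)
    p_right = Counter(right)
    info = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        info += p_ab * log2(p_ab / ((p_left[a] / n) * (p_right[b] / n)))
    return info


if __name__ == "__main__":
    periodic = "ACGT" * 250
    print(mutual_information(periodic, 4))  # period-4 sequence: 2 bits at k = 4
```

Scanning k from 1 to a few hundred on a real chromosome and plotting I(k) is how peaks such as those near 135 and 165 bp would show up.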
Affiliation(s)
- Dirk Holste
- Department of Biology, Massachusetts Institute of Technology, Cambridge 02139, USA.
20
Som A, Sahoo S, Mukhopadhyay I, Chakrabarti J, Chaudhury R. Scaling violations in coding DNA. Europhys Lett 2003; 62:271-277. [DOI: 10.1209/epl/i2003-00341-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 07/19/2023]
21
Abstract
Here we study statistical correlations among different positions in DNA sequences, and their implications, by directly using the autocorrelation function. Such an analysis is now possible because large sequences, or even complete genomes, are available for many organisms. After describing how the autocorrelation function can be applied to DNA sequence analysis, we show that long-range correlations, implying scale independence, appear in several bacterial genomes as well as in long human chromosome contigs. The source of such correlations in bacteria, which may extend up to 60 kb in Bacillus subtilis, may be related to massive lateral transfer of compositionally biased genes from other genomes. In the human genome, correlations extend for more than five decades and may be related to the evolution of the 'neogenome', a modern evolutionary acquisition composed of GC-rich isochores displaying long-range correlations and scale invariance.
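Applying an autocorrelation function to DNA first requires mapping the sequence to numbers. The sketch below assumes a GC indicator mapping ($x_i = 1$ for G or C, else 0) and the unnormalized estimator $C(\ell) = \langle x_i x_{i+\ell} \rangle - \langle x_i \rangle \langle x_{i+\ell} \rangle$ averaged over the $N - \ell$ available pairs. Both choices, and the function names, are illustrative assumptions; the paper's exact mapping and normalization may differ.

```python
def gc_indicator(seq: str) -> list[int]:
    """Map a DNA sequence to 1 for G/C and 0 for A/T (and anything else)."""
    return [1 if base in "GCgc" else 0 for base in seq]


def autocorrelation(x: list[float], lag: int) -> float:
    """C(lag) = <x_i x_{i+lag}> - <x_i><x_{i+lag}>, averaged over the len(x) - lag pairs."""
    if not 1 <= lag < len(x):
        raise ValueError("need 1 <= lag < len(x)")
    head, tail = x[:-lag], x[lag:]
    m = len(head)
    mean_product = sum(a * b for a, b in zip(head, tail)) / m
    return mean_product - (sum(head) / m) * (sum(tail) / m)


if __name__ == "__main__":
    x = gc_indicator("GCAT" * 100)
    for lag in (1, 2, 3, 4):
        print(lag, autocorrelation(x, lag))
```

Long-range correlation in the sense of the abstract would appear as C(lag) decaying like a power of the lag on a log-log plot, rather than exponentially.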
Affiliation(s)
- P Bernaola-Galván
- Departamento de Física Aplicada II, E.T.S.I. de Telecomunicación, Universidad de Málaga, Málaga, Spain.
22
Pöschel T, Freund JA. Finite-sample frequency distributions originating from an equiprobability distribution. Phys Rev E Stat Nonlin Soft Matter Phys 2002; 66:026103. [PMID: 12241233 DOI: 10.1103/physreve.66.026103] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Received: 11/27/2001] [Indexed: 11/07/2022]
Abstract
Given an equidistribution with probabilities $p_i = 1/N$, $i = 1, \ldots, N$, what is the expected corresponding rank-ordered frequency distribution $f_i$, $i = 1, \ldots, N$, if an ensemble of $M$ events is drawn?
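The question posed here can at least be explored numerically. The sketch below is a Monte Carlo estimate, not the analytical treatment of the paper: it draws $M$ events uniformly from $N$ outcomes, sorts the counts in decreasing order, and averages the resulting rank-ordered relative frequencies over many trials. The function name, the number of trials, and the seed are hypothetical parameters.

```python
import random
from collections import Counter


def expected_rank_ordered(n_outcomes: int, n_events: int, trials: int = 1000, seed: int = 0) -> list[float]:
    """Monte Carlo estimate of the expected rank-ordered relative frequencies f_1 >= ... >= f_N."""
    rng = random.Random(seed)
    totals = [0] * n_outcomes
    for _ in range(trials):
        counts = Counter(rng.randrange(n_outcomes) for _ in range(n_events))
        # Sort every outcome's count (zeros included) from most to least frequent.
        ranked = sorted((counts.get(i, 0) for i in range(n_outcomes)), reverse=True)
        for rank, count in enumerate(ranked):
            totals[rank] += count
    return [t / (trials * n_events) for t in totals]


if __name__ == "__main__":
    for rank, f in enumerate(expected_rank_ordered(10, 50), start=1):
        print(rank, round(f, 4))
```

Even though every $p_i$ equals $1/N$, the estimated $f_i$ fall steadily with rank for finite $M$, which is the finite-sample effect the abstract asks about.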
Affiliation(s)
- Thorsten Pöschel
- Humboldt-Universität zu Berlin, Charité, Institut für Biochemie, Monbijoustrasse 2, Berlin D-10117, Germany.
23
Carpena P, Bernaola-Galván P, Ivanov PC, Stanley HE. Metal-insulator transition in chains with correlated disorder. Nature 2002; 418:955-9. [PMID: 12198542 DOI: 10.1038/nature00948] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
Abstract
According to Bloch's theorem, electronic wavefunctions in perfectly ordered crystals are extended, which implies that the probability of finding an electron is the same over the entire crystal. Such extended states can lead to metallic behaviour. But when disorder is introduced in the crystal, electron states can become localized, and the system can undergo a metal-insulator transition (also known as an Anderson transition). Here we theoretically investigate how introducing long-range correlations into the disorder of one-dimensional binary solids affects the physical properties of the electron wavefunctions, and find a correlation-induced metal-insulator transition. Using numerical simulations of a one-dimensional tight-binding model, we find a threshold value for the exponent characterizing the long-range correlations of the system. Above this threshold, and in the thermodynamic limit, the system behaves as a conductor within a broad energy band; below it, the system behaves as an insulator. We discuss the possible relevance of this result for electronic transport in DNA, which displays long-range correlations and has recently been reported to be a one-dimensional disordered conductor.
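Localization in a one-dimensional tight-binding chain is often quantified with the transfer-matrix recursion $\psi_{n+1} = (E - \varepsilon_n)\psi_n - \psi_{n-1}$ (hopping set to 1), whose Lyapunov exponent is the inverse localization length. The sketch below computes that exponent for an ordered chain and for uncorrelated binary disorder. It is an assumption-laden illustration only: it does not generate the long-range correlated disorder studied in the paper (which would require, for example, a correlated binary sequence with a tunable exponent), so it shows the Anderson-localization baseline rather than the correlation-induced transition.

```python
import math
import random


def lyapunov_exponent(site_energies: list[float], energy: float) -> float:
    """Inverse localization length from psi[n+1] = (E - eps[n]) * psi[n] - psi[n-1], hopping = 1."""
    psi_prev, psi = 0.0, 1.0
    log_growth = 0.0
    for eps in site_energies:
        psi_prev, psi = psi, (energy - eps) * psi - psi_prev
        norm = math.hypot(psi, psi_prev)
        psi, psi_prev = psi / norm, psi_prev / norm  # renormalize to avoid overflow
        log_growth += math.log(norm)
    return log_growth / len(site_energies)


if __name__ == "__main__":
    n = 100_000
    ordered = [0.0] * n
    rng = random.Random(1)
    disordered = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # uncorrelated binary disorder
    print("ordered:", lyapunov_exponent(ordered, 0.5))  # close to 0: extended state
    print("disordered:", lyapunov_exponent(disordered, 0.5))  # clearly positive: localized state
```

Replacing `disordered` with a long-range correlated binary sequence and scanning its correlation exponent is how one could probe the threshold described in the abstract.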
Affiliation(s)
- Pedro Carpena
- Departamento de Física Aplicada II, ETSI de Telecomunicación, Universidad de Málaga, 29071 Málaga, Spain.
24
Azad RK, Bernaola-Galván P, Ramaswamy R, Rao JS. Segmentation of genomic DNA through entropic divergence: power laws and scaling. Phys Rev E Stat Nonlin Soft Matter Phys 2002; 65:051909. [PMID: 12059595 DOI: 10.1103/physreve.65.051909] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Received: 10/08/2001] [Revised: 01/22/2002] [Indexed: 11/07/2022]
Abstract
Genomic DNA is fragmented into segments using the Jensen-Shannon divergence. This criterion yields fragments that are entropically homogeneous to within a predefined level of statistical significance. The procedure is applied to complete genomes of organisms from archaebacteria, eubacteria, and eukaryotes. The distribution of fragment lengths in bacterial and primitive eukaryotic DNAs shows two distinct regimes of power-law scaling. The characteristic length separating these two regimes appears to be an intrinsic property of the sequence rather than a finite-size artifact, and is independent of the significance level used in segmenting a given genome. Fragment length distributions obtained by segmenting the genomes of more highly evolved eukaryotes do not show such distinct regimes of power-law behavior.
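One step of Jensen-Shannon segmentation chooses the cut that maximizes $D_{JS} = H(S) - \frac{n_1}{n} H(S_1) - \frac{n_2}{n} H(S_2)$, where $H$ is the Shannon entropy of the base composition and $S_1$, $S_2$ are the two sides of the cut. The sketch below finds that single best cut. It deliberately omits the statistical significance test and the recursive re-segmentation of each half, which the full procedure requires; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(counts: Counter, total: int) -> float:
    """Entropy in bits of a composition given as symbol counts."""
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def best_js_split(seq: str) -> tuple[int, float]:
    """Return (cut position, D_JS) maximizing the Jensen-Shannon divergence of the two parts."""
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must have at least two symbols")
    total = Counter(seq)
    h_total = shannon_entropy(total, n)
    left: Counter = Counter()
    best_pos, best_d = 1, -1.0
    for i in range(1, n):
        left[seq[i - 1]] += 1  # left part is seq[:i]
        right = total - left  # composition of seq[i:]
        d = h_total - (i / n) * shannon_entropy(left, i) - ((n - i) / n) * shannon_entropy(right, n - i)
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


if __name__ == "__main__":
    print(best_js_split("AT" * 40 + "GC" * 40))
```

Applying the cut only when $D_{JS}$ exceeds a significance threshold, then recursing on both halves, turns this step into a full segmentation whose fragment lengths could be histogrammed.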
Affiliation(s)
- Rajeev K Azad
- School of Environmental Sciences, Jawaharlal Nehru University, New Delhi 110 067, India.