1
Liu X, Teng L, Luo Y, Xu Y. Prediction of prokaryotic and eukaryotic promoters based on information-theoretic features. Biosystems 2023; 231:104979. [PMID: 37423595 DOI: 10.1016/j.biosystems.2023.104979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 07/06/2023] [Accepted: 07/07/2023] [Indexed: 07/11/2023]
Abstract
Promoters are DNA regulatory elements located near the transcription start site and are responsible for regulating the transcription of genes. DNA fragments arranged in a certain order form specific functional regions with different information contents. Information theory is the science that studies the extraction, measurement and transmission of information. The genetic information contained in DNA follows the general laws of information storage. Therefore, methods from information theory can be used to analyze promoters, which carry genetic information. In this study, we introduced concepts from information theory to promoter prediction. We built a classifier from 107 features extracted with information-theoretic methods and a backpropagation neural network. Then, the trained classifier was applied to predict the promoters of 6 organisms. The average AUCs of the 6 organisms obtained by using hold-out validation and ten-fold cross-validation were 0.885 and 0.886, respectively. The results verified the effectiveness of information-theoretic features in promoter prediction. Considering the possible redundancy in the feature set, we performed feature selection and obtained key feature subsets related to promoter characteristics. The results indicate the potential utility of information-theoretic features in promoter prediction.
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China.
- Li Teng
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Yachuan Luo
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Yuqiao Xu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
2
ALUminating the Path of Atherosclerosis Progression: Chaos Theory Suggests a Role for Alu Repeats in the Development of Atherosclerotic Vascular Disease. Int J Mol Sci 2018; 19:ijms19061734. [PMID: 29895733 PMCID: PMC6032270 DOI: 10.3390/ijms19061734] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/20/2018] [Revised: 06/04/2018] [Accepted: 06/09/2018] [Indexed: 12/12/2022]
Abstract
Atherosclerosis (ATH) and coronary artery disease (CAD) are chronic inflammatory diseases with an important genetic background; they derive from the cumulative effect of multiple common risk alleles, most of which are located in genomic noncoding regions. These complex diseases behave as nonlinear dynamical systems that show a high dependence on their initial conditions; thus, long-term predictions of disease progression are unreliable. One likely possibility is that the nonlinear nature of ATH could be dependent on nonlinear correlations in the structure of the human genome. In this review, we show how chaos theory analysis has highlighted genomic regions that share specific structural constraints, which could have a role in ATH progression. These regions were shown to be enriched with repetitive sequences of the Alu family, genomic parasites that have colonized the human genome, which show a particular secondary structure and are involved in the regulation of gene expression. Here, we show the impact of Alu elements on the mechanisms that regulate gene expression, especially highlighting the molecular mechanisms via which the Alu elements alter the inflammatory response. We devote special attention to their relationship with the long noncoding RNA (lncRNA) antisense noncoding RNA in the INK4 locus (ANRIL), a risk factor for ATH; their role as microRNA (miRNA) sponges; and their ability to interfere with the regulatory circuitry of the nuclear factor kappa B (NF-κB) response. We aim to characterize ATH as a nonlinear dynamic system, in which small initial alterations in the expression of a number of repetitive elements are somehow amplified to reach phenotypic significance.
3
Korotkov EV, Suvorova YM, Skryabin KG. Search of tandem repeats with insertion and deletions in the A. thaliana genome. Dokl Biochem Biophys 2018; 477:398-400. [PMID: 29297128 DOI: 10.1134/s160767291706014x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2017] [Indexed: 11/22/2022]
Abstract
A new mathematical method was used for the first time to search for tandem repeats with insertions and deletions in the full-length sequence of the A. thaliana genome. The method is based on a new algorithm for multiple alignment of sequences of certain periods without using paired comparisons of sequences. We identified 13997 periodic sites 2 to 50 characters long, only approximately 30% of which were known earlier. The possible origin and use of the identified sites with tandem repeats are discussed.
Affiliation(s)
- E V Korotkov
- Federal Research Center "Fundamentals of Biotechnology", Russian Academy of Sciences, Moscow, Russia.
- Yu M Suvorova
- Federal Research Center "Fundamentals of Biotechnology", Russian Academy of Sciences, Moscow, Russia
- K G Skryabin
- Federal Research Center "Fundamentals of Biotechnology", Russian Academy of Sciences, Moscow, Russia
4
Pugacheva V, Korotkov A, Korotkov E. Search of latent periodicity in amino acid sequences by means of genetic algorithm and dynamic programming. Stat Appl Genet Mol Biol 2017; 15:381-400. [PMID: 27337743 DOI: 10.1515/sagmb-2015-0079] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/15/2022]
Abstract
The aim of this study was to show that amino acid sequences have a latent periodicity with insertions and deletions of amino acids at unknown positions in the analyzed sequence. A genetic algorithm, dynamic programming and random weight matrices were used to develop a new mathematical algorithm for latent periodicity search. A multiple alignment of periods was calculated by direct optimization of the position-weight matrix, without using pairwise alignments. The developed algorithm was applied to analyze the amino acid sequences of a small number of proteins. This study showed the presence of latent periodicity with insertions and deletions in the amino acid sequences of proteins for which latent periodicity was not previously known. The origin of latent periodicity with insertions and deletions is discussed.
5
Database of Periodic DNA Regions in Major Genomes. Biomed Res Int 2017; 2017:7949287. [PMID: 28182099 PMCID: PMC5274682 DOI: 10.1155/2017/7949287] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 08/09/2016] [Revised: 12/07/2016] [Accepted: 12/21/2016] [Indexed: 12/11/2022]
Abstract
Summary. We searched several prokaryotic and eukaryotic genomes for periodic sequences using a new mathematical method based on random position weight matrices and dynamic programming. Insertions and deletions were allowed inside periodicities, which adds novelty to the results obtained. The period length, one of the key features of a periodicity, varied from 2 to 50 nt. In total, over 60,000 periodic sequences were found in 15 genomes, including some chromosomes of the H. sapiens (partial), C. elegans, D. melanogaster, and A. thaliana genomes.
6
Mallona I, Jordà M, Peinado MA. A knowledgebase of the human Alu repetitive elements. J Biomed Inform 2016; 60:77-83. [PMID: 26827622 DOI: 10.1016/j.jbi.2016.01.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/10/2015] [Revised: 01/20/2016] [Accepted: 01/22/2016] [Indexed: 01/13/2023]
Abstract
Alu elements are the most abundant retrotransposons in the human genome, with more than one million copies. Alu repeats have been reported to participate in multiple processes related to genome regulation and compartmentalization. Moreover, they have been implicated in facilitating pathological mutations in many diseases, including cancer. The contribution of Alus and other repeats to genomic regulation is often overlooked because their study poses technical and analytical challenges that are hard to meet with conventional strategies. Here we propose the integration of ontology-based semantic methods to query a knowledgebase for the human Alus. The knowledgebase leverages the Sequence Ontology (SO) and Gene Ontology (GO) and is devoted to addressing functional and genetic information in the genomic context of the Alus. For each Alu element, the closest gene and transcript are stored, as well as their functional annotation according to GO, the state of the chromatin and the transcription factor binding sites inside the Alu. The model uses Web Ontology Language (OWL) and Semantic Web Rule Language (SWRL). As a use case, and to illustrate the utility of the tool, we have evaluated the epigenetic states of Alu repeats associated with gene promoters according to their transcriptional activity. The ontology is easily extendable, offering a scaffold for the inclusion of new experimental data. The RDF/XML formalization is freely available at http://aluontology.sourceforge.net/.
Affiliation(s)
- Izaskun Mallona
- Institute of Predictive and Personalized Medicine of Cancer (IMPPC) and Health Research Institute Germans Trias i Pujol (IGTP), Can Ruti Campus. Ctra. de Can Ruti, camí de les escoles, s/n, 08916 Badalona, Spain.
- Mireia Jordà
- Institute of Predictive and Personalized Medicine of Cancer (IMPPC) and Health Research Institute Germans Trias i Pujol (IGTP), Can Ruti Campus. Ctra. de Can Ruti, camí de les escoles, s/n, 08916 Badalona, Spain
- Miguel A Peinado
- Institute of Predictive and Personalized Medicine of Cancer (IMPPC) and Health Research Institute Germans Trias i Pujol (IGTP), Can Ruti Campus. Ctra. de Can Ruti, camí de les escoles, s/n, 08916 Badalona, Spain
7
Wu ZB. Analysis of correlation structures in the Synechocystis PCC6803 genome. Comput Biol Chem 2014; 53 Pt A:49-58. [PMID: 25199594 DOI: 10.1016/j.compbiolchem.2014.08.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Accepted: 07/11/2014] [Indexed: 11/26/2022]
Abstract
Transfer of nucleotide strings in the Synechocystis sp. PCC6803 genome is investigated to exhibit periodic and non-periodic correlation structures by using the recurrence plot method and the phase space reconstruction technique. The periodic correlation structures are generated by periodic transfer of several substrings in long periodic or non-periodic nucleotide strings embedded in the coding regions of genes. The non-periodic correlation structures are generated by non-periodic transfer of several substrings covering or overlapping with the coding regions of genes. In the periodic and non-periodic transfer, some gaps divide the long nucleotide strings into the substrings and prevent their global transfer. Most of the gaps are either the replacement of one base or the insertion/reduction of one base. In the reconstructed phase space, the points generated from two or three steps for the continuous iterative transfer via the second maximal distance can be fitted by two lines. It partly reveals an intrinsic dynamics in the transfer of nucleotide strings. Due to the comparison of the relative positions and lengths, the substrings concerned with the non-periodic correlation structures are almost identical to the mobile elements annotated in the genome. The mobile elements are thus endowed with the basic results on the correlation structures.
Affiliation(s)
- Zuo-Bing Wu
- State Key Laboratory of Nonlinear Mechanics, Institute of Mechanics, Chinese Academy of Sciences, Beijing 100190, China.
8
Periodic distribution of a putative nucleosome positioning motif in human, nonhuman primates, and archaea: mutual information analysis. Int J Genomics 2013; 2013:963956. [PMID: 23841049 PMCID: PMC3691935 DOI: 10.1155/2013/963956] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/13/2013] [Accepted: 04/29/2013] [Indexed: 12/12/2022]
Abstract
Recently, Trifonov's group proposed a 10-mer DNA motif YYYYYRRRRR as a solution to the long-standing problem of sequence-based nucleosome positioning. To test whether this generic decamer represents a biologically meaningful signal, we compare the distribution of this motif in primates and Archaea, which are known to contain nucleosomes, and in Eubacteria, which do not possess nucleosomes. The distribution of the motif is analyzed by the mutual information function (MIF) with a shifted version of itself (MIF profile). We found common features in the patterns of this generic decamer on MIF profiles among primate species, and interestingly we found conspicuous but dissimilar MIF profiles for each of the Archaea tested. The overall MIF profiles for each chromosome in each primate species also follow a similar pattern. Trifonov's generic decamer may be a highly conserved motif for nucleosome positioning, but we argue that it is not the only motif. The distribution of this generic decamer exhibits previously unidentified periodicities, which are associated with highly repetitive sequences in the genome. Alu repetitive elements contribute to the most fundamental structure of nucleosome positioning in higher Eukaryotes. In some regions of primate chromosomes, the distribution of the decamer shows symmetrical patterns including inverted repeats.
9
Pokorný J, Hašek J, Jelínek F. Electromagnetic field of microtubules: effects on transfer of mass particles and electrons. J Biol Phys 2013; 31:501-14. [PMID: 23345914 DOI: 10.1007/s10867-005-1286-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 10/25/2022]
Abstract
Biological polar molecules and polymer structures with an energy supply (such as microtubules in the cytoskeleton) can become excited and generate an endogenous electromagnetic field with a strong electrical component in their vicinity. The endogenous electrical fields, through action on charges, on dipoles and multipoles, and through polarization (causing a dielectrophoretic effect), exert forces and can drive charges and particles in the cell. The transport of mass particles and electrons is analyzed as a Wiener-Lévy process with inclusion of a deterministic force (validity of the Bloch theorem is assumed for transport of electrons in molecular chains too). We compare transport driven by deterministic forces (together with an inseparable thermal component) with that driven thermally and evaluate the probability of reaching the target. Deterministic forces can transport particles and electrons with higher probability than forces of thermal origin only. The effect of deterministic forces on directed transport is dominant.
Affiliation(s)
- Jiří Pokorný
- Institute of Radio Engineering and Electronics, Academy of Sciences of the Czech Republic, Chaberská 57, 182 51 Prague 8, Czech Republic
10
Rushdi A, Tuqan J, Strohmer T. Map-invariant spectral analysis for the identification of DNA periodicities. EURASIP J Bioinform Syst Biol 2012; 2012:16. [PMID: 23067324 PMCID: PMC3751961 DOI: 10.1186/1687-4153-2012-16] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/31/2012] [Accepted: 09/06/2012] [Indexed: 11/10/2022]
Abstract
Many signal-processing-based methods for finding hidden periodicities in DNA sequences have primarily focused on assigning numerical values to the symbolic DNA sequence and then applying spectral analysis tools such as the short-time discrete Fourier transform (ST-DFT) to locate these repeats. The key results pertaining to this approach are, however, obtained using a very specific symbolic-to-numerical map, namely the so-called Voss representation. An important research problem is therefore to quantify the sensitivity of these results to the choice of the symbolic-to-numerical map. In this article, a novel algebraic approach to the periodicity detection problem is presented that provides a natural framework for studying the role of the symbolic-to-numerical map in finding these repeats. More specifically, we derive a new matrix-based expression of the DNA spectrum that comprises most of the widely used mappings in the literature as special cases, show that the DNA spectrum is in fact invariant under all these mappings, and derive a necessary and sufficient condition for the invariance of the DNA spectrum to the symbolic-to-numerical map. Furthermore, the new algebraic framework decomposes the periodicity detection problem into several fundamental building blocks that are totally independent of each other. Sophisticated digital filters and/or alternate fast data transforms such as the discrete cosine and sine transforms can therefore always be incorporated in the periodicity detection scheme regardless of the choice of the symbolic-to-numerical map. Although the newly proposed framework is matrix based, identification of these periodicities can be achieved at a low computational cost.
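For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of the Voss representation combined with a plain DFT power spectrum. It is not the authors' matrix-based framework, and it uses the full-sequence DFT rather than the short-time variant. The function names (`voss_indicators`, `dft_power`, `dna_spectrum`, `dominant_period`) are hypothetical, chosen only for this illustration.

```python
import cmath


def voss_indicators(seq):
    """Voss map: one binary indicator sequence per nucleotide (A, C, G, T)."""
    seq = seq.upper()
    return {base: [1 if ch == base else 0 for ch in seq] for base in "ACGT"}


def dft_power(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a real-valued sequence."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


def dna_spectrum(seq):
    """Power summed over the four indicator sequences, for k = 1 .. N // 2."""
    indicators = voss_indicators(seq)
    n = len(seq)
    return [sum(dft_power(sig, k) for sig in indicators.values())
            for k in range(1, n // 2 + 1)]


def dominant_period(seq):
    """Period (in bases) of the strongest spectral peak, i.e. N / k."""
    spectrum = dna_spectrum(seq)
    k = max(range(len(spectrum)), key=spectrum.__getitem__) + 1
    return len(seq) / k
```

On a perfectly periodic toy sequence such as `"ATG" * 20`, `dominant_period` reports a period of 3 bases, the same kind of peak that period-3 codon structure produces in coding DNA. This direct summation costs O(N²) and is meant only for short illustrative inputs.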
Affiliation(s)
- Ahmad Rushdi
- Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA; now with Cisco Systems, Inc., San Jose, CA 95134, USA.
11
CAGO: a software tool for dynamic visual comparison and correlation measurement of genome organization. PLoS One 2011; 6:e27080. [PMID: 22114666 PMCID: PMC3219657 DOI: 10.1371/journal.pone.0027080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/16/2011] [Accepted: 10/10/2011] [Indexed: 11/26/2022]
Abstract
CAGO (Comparative Analysis of Genome Organization) was developed to address two critical shortcomings of conventional genome atlas plotters: lack of dynamic exploratory functions and absence of signal analysis for genomic properties. With dynamic exploratory functions, users can directly manipulate chromosome tracks of a genome atlas and intuitively identify distinct genomic signals by visual comparison. Signal analysis of genomic properties can further detect inconspicuous patterns in noisy genomic properties and calculate correlations between genomic properties across various genomes. To implement dynamic exploratory functions, CAGO presents each genome atlas in Scalable Vector Graphics (SVG) format and allows users to interact with it using an SVG viewer through JavaScript. Signal analysis functions are implemented using the R statistical software and the discrete wavelet transformation package waveslim. CAGO is not only a plotter for generating complex genome atlases, but also a platform for exploring genome atlases with dynamic exploratory functions for visual comparison and with signal analysis for comparing genomic properties across multiple organisms. The web-based application of CAGO, its source code, user guides, video demos, and live examples are publicly available and can be accessed at http://cbs.ym.edu.tw/cago.
12
Gao Y, Luo L. Genome-based phylogeny of dsDNA viruses by a novel alignment-free method. Gene 2011; 492:309-14. [PMID: 22100880 DOI: 10.1016/j.gene.2011.11.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 06/04/2011] [Revised: 09/19/2011] [Accepted: 11/01/2011] [Indexed: 12/25/2022]
Abstract
Sequence alignment is not directly applicable to whole-genome phylogeny, since events such as rearrangements make full-length alignments impossible. Here, a novel alignment-free method derived from the standpoint of information theory is proposed and used to construct the whole-genome phylogeny for a population of viruses from 13 viral families comprising 218 dsDNA viruses. The method is based on information correlation (IC) and partial information correlation (PIC). We observe that (i) the IC-PIC tree segregates the population into clades, the membership of each of which is remarkably consistent with biologists' systematics, with only a few exceptions; (ii) the IC-PIC tree reveals potential evolutionary relationships among some viral families; and (iii) the IC-PIC tree predicts the taxonomic positions of certain "unclassified" viruses. Our approach provides a new way of recovering the phylogeny of viruses, and has practical applications in developing alignment-free methods for sequence classification.
Affiliation(s)
- Yang Gao
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
13
Xu Y, Ma QD, Schmitt DT, Bernaola-Galván P, Ivanov PC. Effects of coarse-graining on the scaling behavior of long-range correlated and anti-correlated signals. Physica A 2011; 390:4057-4072. [PMID: 25392599 PMCID: PMC4226277 DOI: 10.1016/j.physa.2011.05.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 05/31/2023]
Abstract
We investigate how various coarse-graining (signal quantization) methods affect the scaling properties of long-range power-law correlated and anti-correlated signals, quantified by detrended fluctuation analysis. Specifically, for coarse-graining in the magnitude of a signal, we consider (i) the Floor, (ii) the Symmetry and (iii) the Centro-Symmetry coarse-graining methods. We find that for anti-correlated signals, coarse-graining in the magnitude leads to a crossover to random behavior at large scales, and that as the width Δ of the coarse-graining partition interval increases, this crossover moves to intermediate and small scales. In contrast, the scaling of positively correlated signals is less affected by the coarse-graining, with no observable changes when Δ < 1, while for Δ > 1 a crossover appears at small scales and moves to intermediate and large scales with increasing Δ. For very rough coarse-graining (Δ > 3) based on the Floor and Symmetry methods, the position of the crossover stabilizes, in contrast to the Centro-Symmetry method, where the crossover continuously moves across scales and leads to random behavior at all scales, indicating a much stronger effect of the Centro-Symmetry method compared with the Floor and Symmetry methods. For coarse-graining in time, where data points are averaged in non-overlapping time windows, we find that the scaling for both anti-correlated and positively correlated signals is practically preserved. The results of our simulations are useful for the correct interpretation of the correlation and scaling properties of symbolic sequences.
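The two kinds of coarse-graining contrasted in this abstract can be sketched as follows. Coarse-graining in time follows the abstract's own description (averaging in non-overlapping windows). The magnitude version is only an assumption: the abstract names a "Floor" method without defining it, so the form floor(x / Δ) · Δ used here is a guess at the simplest reading, and both function names are hypothetical.

```python
import math


def coarse_grain_time(signal, window):
    """Average the signal over consecutive non-overlapping windows.

    A trailing partial window, if any, is dropped.
    """
    n_windows = len(signal) // window
    return [sum(signal[i * window:(i + 1) * window]) / window
            for i in range(n_windows)]


def coarse_grain_floor(signal, delta):
    """ASSUMED form of 'Floor' magnitude quantization: floor(x / delta) * delta.

    The abstract does not define the method; this is an illustrative guess.
    """
    return [math.floor(x / delta) * delta for x in signal]
```

For example, `coarse_grain_time([1, 2, 3, 4, 5, 6, 7], 2)` averages the pairs (1, 2), (3, 4), (5, 6) and discards the final sample, while `coarse_grain_floor` maps every value to the lower edge of its Δ-wide bin.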
Affiliation(s)
- Yinlin Xu
- Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215, USA
- College of Physics Science and Technology, Nanjing Normal University, Nanjing 210097, China
- Qianli D.Y. Ma
- Harvard Medical School and Division of Sleep Medicine, Brigham & Women’s Hospital, Boston, MA 02215, USA
- College of Geography and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
- Daniel T. Schmitt
- Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215, USA
- Plamen Ch. Ivanov
- Center for Polymer Studies and Department of Physics, Boston University, Boston, MA 02215, USA
- Harvard Medical School and Division of Sleep Medicine, Brigham & Women’s Hospital, Boston, MA 02215, USA
- Departamento de Física Aplicada II, Universidad de Málaga, 29071 Málaga, Spain
14
Rybalko S, Larionov S, Poptsova M, Loskutov A. Intermittency as a universal characteristic of the complete chromosome DNA sequences of eukaryotes: from protozoa to human genomes. Phys Rev E Stat Nonlin Soft Matter Phys 2011; 84:042902. [PMID: 22181210 DOI: 10.1103/physreve.84.042902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2011] [Revised: 04/14/2011] [Indexed: 05/31/2023]
Abstract
Large-scale dynamical properties of complete chromosome DNA sequences of eukaryotes are considered. Using the proposed deterministic models with intermittency and symbolic dynamics we describe a wide spectrum of large-scale patterns inherent in these sequences, such as segmental duplications, tandem repeats, and other complex sequence structures. It is shown that the recently discovered gene number balance on the strands is not of a random nature, and certain subsystems of a complete chromosome DNA sequence exhibit the properties of deterministic chaos.
Affiliation(s)
- S Rybalko
- Universite de Franche-Comte UMR CNRS 6174, route de Gray, F-25030 Besancon, France.
15
Computation of mutual information from Hidden Markov Models. Comput Biol Chem 2010; 34:328-33. [DOI: 10.1016/j.compbiolchem.2010.08.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/29/2010] [Revised: 08/30/2010] [Accepted: 08/30/2010] [Indexed: 11/22/2022]
16
Data Compression Concepts and Algorithms and their Applications to Bioinformatics. Entropy 2009; 12:34. [PMID: 20157640 DOI: 10.3390/e12010034] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 01/24/2023]
Abstract
Data compression at its base is concerned with how information is organized in data. Understanding this organization can lead to efficient ways of representing the information and hence data compression. In this paper we review the ways in which ideas and approaches fundamental to the theory and practice of data compression have been used in the area of bioinformatics. We look at how basic theoretical ideas from data compression, such as the notions of entropy, mutual information, and complexity have been used for analyzing biological sequences in order to discover hidden patterns, infer phylogenetic relationships between organisms and study viral populations. Finally, we look at how inferred grammars for biological sequences have been used to uncover structure in biological sequences.
17
Pokorný J, Hašek J, Jelínek F. Endogenous Electric Field and Organization of Living Matter. Electromagn Biol Med 2009. [DOI: 10.1080/15368370500379566] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 10/25/2022]
18
Blinowska KJ, Trzaskowski B, Kaminski M, Kus R. Multivariate autoregressive model for a study of phylogenetic diversity. Gene 2009; 435:104-18. [PMID: 19393180 DOI: 10.1016/j.gene.2009.01.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/24/2008] [Revised: 12/17/2008] [Accepted: 01/05/2009] [Indexed: 12/01/2022]
Abstract
We present a computationally efficient model that parameterizes DNA sequences in a way that comprehensively describes their auto- and cross-correlation structure. The approach is based on a four-channel Multivariate Autoregressive Model (MVAR). The model was applied to a study of genes from the globin family for 6 vertebrate species. First, the sequences were coded as four signals (corresponding to the nucleotides), which were fitted to a four-channel MVAR. From the correlation matrices, the vectors of model coefficients were calculated as functions of the nucleotide distance. The between-chromosome and inter-species differences were best distinguished in the cross-coefficients binding different nucleotide sequences. For clustering purposes, different metrics were tested and then two clustering procedures (Nearest Neighbor and UPGMA) were applied. The clustering trees and consensus trees were constructed for exons, introns and whole genes. The results were in agreement with the known dependencies between the chromosomes of the globin family. The orthologous genes of different species were grouped together. Inside these groups, phylogenetically close organisms were located close to each other.
Affiliation(s)
- K J Blinowska
- Department of Biomedical Physics, Warsaw University, Poland.
19
Gonzalez DL, Giannerini S, Rosa R. Strong short-range correlations and dichotomic codon classes in coding DNA sequences. Phys Rev E Stat Nonlin Soft Matter Phys 2008; 78:051918. [PMID: 19113166 DOI: 10.1103/physreve.78.051918] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 02/09/2008] [Revised: 10/07/2008] [Indexed: 05/27/2023]
Abstract
The study of correlation structures in DNA sequences is of great interest because it allows us to obtain structural and functional information about underlying genetic mechanisms. In this paper we present a study of the correlation structure of protein-coding DNA sequences based on a recently developed mathematical representation of the genetic code. A fundamental consequence of this representation is that codons can be assigned a parity class (odd-even). This parity can be obtained by means of a nonlinear algorithm acting on the chemical character of the codon bases. In the same setting, Rumer's class can be naturally described and a new dichotomic class, the hidden class, can be defined. Moreover, we show that the set of DNA base transformations associated with the three dichotomic classes can be put in a compact group-theoretic framework. We use the dichotomic classes as a coding scheme for DNA sequences and study the mutual dependence between such classes. The same analysis is also carried out on the chemical dichotomies of DNA bases. In both cases, the statistical analysis is performed by using an entropy-based dependence metric possessing many desirable properties. We obtain meaningful tests for mutual dependence by using suitable resampling techniques. We find strong short-range correlations between certain combinations of dichotomic codon classes. These results support our previous hypothesis that codon classes might play an active role in the organization of genetic information.
|
20
|
The average mutual information profile as a genomic signature. BMC Bioinformatics 2008; 9:48. [PMID: 18218139 PMCID: PMC2335307 DOI: 10.1186/1471-2105-9-48] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 07/04/2007] [Accepted: 01/25/2008] [Indexed: 12/19/2022] Open
Abstract
Background: Occult organizational structures in DNA sequences may hold the key to understanding functional and evolutionary aspects of the DNA molecule. Such structures can also provide the means for identifying and discriminating organisms using genomic data. Species-specific genomic signatures are useful in a variety of contexts such as evolutionary analysis, assembly and classification of genomic sequences from large uncultivated microbial communities, and rapid identification systems in health hazard situations. Results: We have analyzed genomic sequences of eukaryotic and prokaryotic chromosomes as well as various subtypes of viruses using an information-theoretic framework. We confirm the existence of a species-specific average mutual information (AMI) profile. We use these profiles to define a very simple, computationally efficient, alignment-free distance measure that reflects the evolutionary relationships between genomic sequences. We use this distance measure to classify chromosomes according to species of origin, to separate and cluster subtypes of the HIV-1 virus, and to classify DNA fragments to species of origin. Conclusion: AMI profiles of DNA sequences prove to be species specific and easy to compute. The structure of AMI profiles is conserved, even in short subsequences of a species' genome, rendering a pervasive signature. This signature can be used to classify relatively short DNA fragments to species of origin.
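As a rough illustration of the AMI profile described in this abstract, the sketch below estimates the mutual information between symbols k positions apart from empirical pair frequencies and collects the values for k = 1 to max_k. The function names and the sum-of-absolute-differences distance between profiles are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2


def mutual_information(seq, k):
    """Mutual information (in bits) between symbols k positions apart."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        raise ValueError("sequence must be longer than k")
    joint = Counter(pairs)                  # counts of (x_i, x_{i+k})
    left = Counter(a for a, _ in pairs)     # marginal counts of x_i
    right = Counter(b for _, b in pairs)    # marginal counts of x_{i+k}
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written in terms of raw counts
        mi += p_ab * log2(count * n / (left[a] * right[b]))
    return mi


def ami_profile(seq, max_k):
    """Mutual information values for distances k = 1 .. max_k."""
    return [mutual_information(seq, k) for k in range(1, max_k + 1)]


def profile_distance(p, q):
    """Assumed distance: sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    toy = "ACGT" * 50  # hypothetical toy sequence with period 4
    print(ami_profile(toy, 4))
```

For the toy sequence, the symbol at distance 4 is fully determined by the current symbol, so the profile reaches the maximum for a uniform four-letter alphabet at k = 4.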
|
21
|
Abstract
Fast-sequencing throughput methods have increased the number of completely sequenced bacterial genomes to about 400 by December 2006, with the number increasing rapidly. These include several strains. In silico methods of comparative genomics are of use in categorizing and phylogenetically sorting these bacteria. Various word-based tools have been used for quantifying the similarities and differences between entire genomes. The simple di-nucleotide frequency comparison, codon specificity and k-mer repeat detection are among some of the well-known methods. In this paper, we show that the Mutual Information function, which is a measure of correlations and a concept from Information Theory, is very effective in determining the similarities and differences among genome sequences of various strains of bacteria such as the plant pathogen Xylella fastidiosa, marine Cyanobacteria Prochlorococcus marinus or animal and human pathogens such as species of Ehrlichia and Legionella. The short-range three-base periodicity, small sequence repeats and long-range correlations taken together constitute a genome signature that can be used as a technique for identifying new bacterial strains with the help of strains already catalogued in the database. There have been several applications of using the Mutual Information function as a measure of correlations in genomics but this is the first whole genome analysis done to detect strain similarities and differences.
Affiliation(s)
- D Swati
- Department of Physics, MMV, Banaras Hindu University, Varanasi 221005, India.
|
22
|
Vinga S, Almeida JS. Local Renyi entropic profiles of DNA sequences. BMC Bioinformatics 2007; 8:393. [PMID: 17939871 PMCID: PMC2238722 DOI: 10.1186/1471-2105-8-393] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 05/10/2007] [Accepted: 10/16/2007] [Indexed: 11/18/2022] Open
Abstract
Background In a recent report the authors presented a new measure of continuous entropy for DNA sequences, which allows the estimation of their randomness level. The definition therein explored was based on the Rényi entropy of probability density estimation (pdf) using the Parzen's window method and applied to Chaos Game Representation/Universal Sequence Maps (CGR/USM). Subsequent work proposed a fractal pdf kernel as a more exact solution for the iterated map representation. This report extends the concepts of continuous entropy by defining DNA sequence entropic profiles using the new pdf estimations to refine the density estimation of motifs. Results The new methodology enables two results. On the one hand it shows that the entropic profiles are directly related with the statistical significance of motifs, allowing the study of under and over-representation of segments. On the other hand, by spanning the parameters of the kernel function it is possible to extract important information about the scale of each conserved DNA region. The computational applications, developed in Matlab m-code, the corresponding binary executables and additional material and examples are made publicly available at . Conclusion The ability to detect local conservation from a scale-independent representation of symbolic sequences is particularly relevant for biological applications where conserved motifs occur in multiple, overlapping scales, with significant future applications in the recognition of foreign genomic material and inference of motif structures.
Affiliation(s)
- Susana Vinga
- Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R, Alves Redol 9, 1000-029 Lisboa, Portugal.
|
23
|
Carpena P, Bernaola-Galván P, Coronado AV, Hackenberg M, Oliver JL. Identifying characteristic scales in the human genome. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 75:032903. [PMID: 17500745 DOI: 10.1103/physreve.75.032903] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 06/01/2006] [Indexed: 05/15/2023]
Abstract
The scale-free, long-range correlations detected in DNA sequences contrast with characteristic lengths of genomic elements, being particularly incompatible with the isochores (long, homogeneous DNA segments). By computing the local behavior of the scaling exponent alpha of detrended fluctuation analysis (DFA), we discriminate between sequences with and without true scaling, and we find that no single scaling exists in the human genome. Instead, human chromosomes show a common compositional structure with two characteristic scales, the large one corresponding to the isochores and the other to small and medium scale genomic elements.
Affiliation(s)
- P Carpena
- Departamento de Física Aplicada II, Universidad de Málaga, 29071 Málaga, Spain
|
24
|
On tests of independence based on minimum φ-divergence estimator with constraints: An application to modeling DNA. Comput Stat Data Anal 2006. [DOI: 10.1016/j.csda.2005.11.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
|
25
|
Dehnert M, Helm WE, Hütt MT. Informational structure of two closely related eukaryotic genomes. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 74:021913. [PMID: 17025478 DOI: 10.1103/physreve.74.021913] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 01/23/2006] [Indexed: 05/12/2023]
Abstract
Attempts to identify a species on the basis of its DNA sequence on purely statistical grounds have been formulated for more than a decade. The most prominent of such genome signatures relies on neighborhood correlations (i.e., dinucleotide frequencies) and, consequently, attributes species identification to mechanisms operating on the dinucleotide level (e.g., neighbor-dependent mutations). For the examples of Mus musculus and Rattus norvegicus we analyze short- and intermediate-range statistical correlations in DNA sequences. These correlation profiles are computed for all chromosomes of the two species. We find that with increasing range of correlations the capacity to distinguish between the species on the basis of this correlation profile is getting better and requires ever shorter sequence segments for obtaining a full species separation. This finding suggests that distinctive traits within the sequence are situated beyond the level of few nucleotides. The large-scale statistical patterning of DNA sequences on which such genome signatures are based is thus substantially determined by mobile elements (e.g., transposons and retrotransposons). The study and interspecies comparison of such correlation profiles can, therefore, reveal features of retrotransposition, segmental duplications, and other processes of genome evolution.
Affiliation(s)
- Manuel Dehnert
- Computational Systems Biology, School of Engineering and Science, International University Bremen, Campus Ring 1, D-28759 Bremen, Germany
|
26
|
Affiliation(s)
- Diego Luis Gonzalez
- Laboratorio di acustica musicale e architettonica, CNR-Fondazione Scuola di S. Giorgio, Venezia, Italy.
|
27
|
Dehnert M, Plaumann R, Helm WE, Hütt MT. Genome phylogeny based on short-range correlations in DNA sequences. J Comput Biol 2005; 12:545-53. [PMID: 15952877 DOI: 10.1089/cmb.2005.12.545] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
The surprising fact that global statistical properties computed on a genome-wide scale may reveal species information was first observed in studies of dinucleotide frequencies. Here we look at the same phenomenon with a totally different statistical approach. We show that patterns in the short-range statistical correlations in DNA sequences serve as evolutionary fingerprints of eukaryotes. All chromosomes of a species display the same characteristic pattern, markedly different from those of other species. The chromosomes of a species are sorted onto the same branch of a phylogenetic tree due to this correlation pattern. The average correlation between nucleotides at a distance k is quantified in two independent ways: (i) by estimating it from a higher-order Markov process and (ii) by computing the mutual information function at a distance k. We show how the quality of phylogenetic reconstruction depends on the range of correlation strengths and on the length of the underlying sequence segment. This concept of the correlation pattern as a phylogenetic signature of eukaryote species combines two rather distant domains of research, namely phylogenetic analysis based on molecular observation and the study of the correlation structure of DNA sequences.
Affiliation(s)
- Manuel Dehnert
- Bioinformatics Group, Department of Biology, Darmstadt University of Technology, D-64287 Darmstadt, Germany
|
28
|
Messer PW, Arndt PF, Lässig M. Solvable sequence evolution models and genomic correlations. Phys Rev Lett 2005; 94:138103. [PMID: 15904043 DOI: 10.1103/physrevlett.94.138103] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 09/24/2004] [Indexed: 05/02/2023]
Abstract
We study a minimal model for genome evolution whose elementary processes are single site mutation, duplication and deletion of sequence regions, and insertion of random segments. These processes are found to generate long-range correlations in the composition of letters as long as the sequence length is growing; i.e., the combined rates of duplications and insertions are higher than the deletion rate. For constant sequence length, on the other hand, all initial correlations decay exponentially. These results are obtained analytically and by simulations. They are compared with the long-range correlations observed in genomic DNA, and the implications for genome evolution are discussed.
Affiliation(s)
- Philipp W Messer
- Institute for Theoretical Physics, University of Cologne, Köln, Germany
|
29
|
Li W, Holste D. Universal 1/f noise, crossovers of scaling exponents, and chromosome-specific patterns of guanine-cytosine content in DNA sequences of the human genome. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 71:041910. [PMID: 15903704 DOI: 10.1103/physreve.71.041910] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 01/28/2004] [Revised: 10/28/2004] [Indexed: 05/02/2023]
Abstract
Spatial fluctuations of guanine and cytosine base content (GC%) are studied by spectral analysis for the complete set of human genomic DNA sequences. We find that (i) $1/f^{\alpha}$ decay is universally observed in the power spectra of all 24 chromosomes, and (ii) the exponent $\alpha \approx 1$ extends to about $10^7$ bases, one order of magnitude longer than has previously been observed. We further find that (iii) almost all human chromosomes exhibit a crossover from $\alpha_1 \approx 1$ ($1/f^{\alpha_1}$) at lower frequency to $\alpha_2 < 1$ ($1/f^{\alpha_2}$) at higher frequency, typically occurring at around 30,000-100,000 bases, while (iv) the crossover in this frequency range is virtually absent in human chromosome 22. In addition to the universal $1/f^{\alpha}$ noise in power spectra, we find (v) several lines of evidence for chromosome-specific correlation structures, including a 500,000 base long oscillation in human chromosome 21. The universal $1/f^{\alpha}$ spectrum in the human genome is further substantiated by a resistance to reduction in variance of guanine and cytosine content when the window size is increased.
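The spectral analysis of GC% mentioned in this abstract can be sketched, under simplifying assumptions, as a windowed GC fraction followed by a discrete Fourier power spectrum. The non-overlapping windows, the direct DFT, the normalization by series length, and the function names are all assumptions made for illustration; the abstract does not specify the authors' estimator.

```python
import cmath
from math import pi


def gc_windows(seq, window):
    """GC fraction in consecutive non-overlapping windows (uppercase input assumed)."""
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        values.append(sum(base in "GC" for base in chunk) / window)
    return values


def power_spectrum(values):
    """Power of the mean-subtracted series at frequencies f/n, f = 1 .. n//2."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    spectrum = []
    for f in range(1, n // 2 + 1):
        # Direct O(n^2) DFT coefficient at frequency index f
        coeff = sum(x * cmath.exp(-2j * pi * f * t / n)
                    for t, x in enumerate(centered))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


if __name__ == "__main__":
    # Hypothetical toy sequence: GC-rich and GC-poor blocks alternate every 10 bases
    toy = ("G" * 10 + "A" * 10) * 8
    gc = gc_windows(toy, 10)
    spec = power_spectrum(gc)
    print(gc)
    print(spec.index(max(spec)) + 1)  # frequency index with the most power
```

On real chromosomes, a $1/f^{\alpha}$ spectrum would appear as an approximately straight line of slope $-\alpha$ when the power is plotted against frequency on log-log axes.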
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA.
|
30
|
Dehnert M, Helm WE, Hütt MT. Information theory reveals large-scale synchronisation of statistical correlations in eukaryote genomes. Gene 2005; 345:81-90. [PMID: 15716116 DOI: 10.1016/j.gene.2004.11.026] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 09/15/2004] [Revised: 10/18/2004] [Accepted: 11/09/2004] [Indexed: 11/20/2022]
Abstract
We study short-range correlations in DNA sequences with methods from information theory and statistics. We find a persisting degree of identity between the correlation patterns of different chromosomes of a species. Except for the case of human and chimpanzee, inter-species differences in this correlation pattern allow robust species distinction: in a clustering tree based upon the correlation curves at the level of individual chromosomes, distinct clusters for the individual species are found. This capacity to distinguish species persists even when the length of the underlying sequences is drastically reduced. In comparison to the standard tool for studying symbol correlations in DNA sequences, namely the mutual information function, we find that an autoregressive model for higher-order Markov processes significantly improves species distinction due to an implicit subtraction of random background.
Affiliation(s)
- Manuel Dehnert
- Bioinformatics Group, Department of Biology, Darmstadt University of Technology, D-64287 Darmstadt, Germany
|
31
|
Li W, Holste D. An unusual 500,000 bases long oscillation of guanine and cytosine content in human chromosome 21. Comput Biol Chem 2004; 28:393-9. [PMID: 15556480 DOI: 10.1016/j.compbiolchem.2004.09.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 09/17/2004] [Revised: 09/30/2004] [Accepted: 09/30/2004] [Indexed: 01/09/2023]
Abstract
An oscillation with a period of around 500 kb in guanine and cytosine content (GC%) is observed in the DNA sequence of human chromosome 21. This oscillation is localized in the rightmost one-eighth region of the chromosome, from 43.5 Mb to 46.5 Mb. Five cycles of oscillation are observed in this region with six GC-rich peaks and five GC-poor valleys. The GC-poor valleys comprise regions with low density of CpG islands and, alternating between the two DNA strands, low gene density regions. Consequently, the long-range oscillation of GC% results in spacing patterns of CpG island density and, to a lesser extent, gene density.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, North Shore LIJ Institute for Medical Research, 350 Community Drive, Manhasset, NY 11030, USA.
|
32
|
Mrowka R, Patzak A, Herzel H, Holste D. Sequence-related human proteins cluster by degree of evolutionary conservation. Phys Rev E Stat Nonlin Soft Matter Phys 2004; 70:051908. [PMID: 15600657 DOI: 10.1103/physreve.70.051908] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/15/2004] [Indexed: 05/24/2023]
Abstract
Gene duplication followed by adaptive evolution is thought to be a central mechanism for the emergence of novel genes. To illuminate the contribution of duplicated protein-coding sequences to the complexity of the human genome, we study the connectivity of pairwise sequence-related human proteins and construct a network (N) of linked protein sequences with shared similarities. We find that (i) the connectivity distribution $P(k)$ for $k$ sequence-related proteins decays as a power law $P(k) \sim k^{-\gamma}$ with $\gamma \approx 1.2$, (ii) the top rank of N consists of a single large cluster of proteins (approximately 70%), while bottom ranks consist of multiple isolated clusters, and (iii) structural characteristics of N show both a high degree of clustering and an intermediate connectivity ("small-world" features). We gain further insight into structural properties of N by studying the relationship between the connectivity distribution and the phylogenetic conservation of proteins in bacteria, plants, invertebrates, and vertebrates. We find that (iv) the proportion of sequence-related proteins increases with increasing extent of evolutionary conservation. Our results support that small-world network properties constitute a footprint of an evolutionary mechanism and extend the traditional interpretation of protein families.
Affiliation(s)
- Ralf Mrowka
- Systems Biology Group, Department of Physiology, Charité Universitätsmedizin Berlin, Tucholskystrasse 2, 10117 Berlin, Germany.
|
33
|
Schieg P, Herzel H. Periodicities of 10–11bp as Indicators of the Supercoiled State of Genomic DNA. J Mol Biol 2004; 343:891-901. [PMID: 15476808 DOI: 10.1016/j.jmb.2004.08.068] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Received: 12/05/2003] [Revised: 03/30/2004] [Accepted: 08/10/2004] [Indexed: 11/21/2022]
Abstract
DNA sequences contain information about the bendability and native conformation of DNA. For example, a repetition of certain dinucleotides at distances of 10-11bp supports wrapping around nucleosomes and supercoiled structures of bacterial DNA. We analyzed 86 eubacterial genomes, 16 archaea, and six genomes of higher eukaryotes. First, we discuss whether or not the observed periodicities indeed represent bendability signals. This claim is confirmed since: (1) dinucleotide signals are of comparable size to mononucleotide signals, (2) the signals are present in non-coding DNA as well, and (3) repeat masking has only a minor effect on 10-11bp periodicities. Moreover, the periodicities persist up to 150bp, comparable to the nucleosome size. We show that doublet peaks in Caenorhabditis elegans and some prokaryotes can be traced back to long-ranging modulations. In mammalian genomes, we consistently find spectral peaks as observed earlier in human chromosomes 20, 21 and 22. It has been shown in previous studies that archaea have periods of 10bp, whereas eubacteria exhibit 11bp periodicities. These differences reflect different supercoiled states of microbial DNA. Is the period of 10bp an archaeal or a thermophilic feature? This question is addressed by relating periodicities to optimal growth temperatures. It turns out that the archaea Methanopyrus kandleri ($t_{\mathrm{opt}}$ = 98 °C) and a Halobacterium strain ($t_{\mathrm{opt}}$ = 42 °C) both have longer periods of about 11bp. Eubacterial genomes consistently have periods around 11bp, indicative of negative supercoiling.
Affiliation(s)
- Patrick Schieg
- Institute for Theoretical Biology, Humboldt University, Invalidenstr. 43, 10115 Berlin, Germany
|
34
|
Roche S, Bicout D, Maciá E, Kats E. Long range correlations in DNA: scaling properties and charge transfer efficiency. Phys Rev Lett 2003; 91:228101. [PMID: 14683275 DOI: 10.1103/physrevlett.91.228101] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Received: 06/12/2003] [Indexed: 05/24/2023]
Abstract
We address the relation between long-range correlations and charge transfer efficiency in aperiodic artificial or genomic DNA sequences. Coherent charge transfer through the highest occupied molecular orbital states of the guanine nucleotide is studied using the transmission approach, and the focus is on how the sequence-dependent backscattering profile can be inferred from correlations between base pairs.
Affiliation(s)
- Stephan Roche
- Commissariat à l'Energie Atomique, DSM/DRFMC/SPSMS, 17 avenue des Martyrs, 38054 Grenoble, France
|