1
|
Long-Range Dependence and Multifractality of Ship Flow Sequences in Container Ports: A Comparison of Shanghai, Singapore, and Rotterdam. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app112110378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
The prediction of ship traffic flow is fundamental to the layout and design of ports and to the management of ship navigation. However, the temporal characteristics and accurate prediction of ship flow sequences in ports have rarely been studied. In this study, we therefore investigated the presence of long-range dependence in container ship flow sequences using Multifractal Detrended Fluctuation Analysis (MF-DFA). We considered three representative container ports (Shanghai, Singapore, and Rotterdam) as the study sample, covering 1 January 2013 to 31 December 2017. Empirical results suggested that the ship flow sequences deviate from a normal distribution, and that sequences at different time scales exhibit varying degrees of long-range dependence. Furthermore, the ship flow sequences are multifractal: the larger the time scale of the ship flow time series, the stronger the multifractal characteristics. The weekly ship flow sequence in the port of Singapore showed the highest degree of multifractality. The multifractality of the ship flow sequences is due to both their correlation properties and their probability density function. The study highlights the importance of accounting for these features in the accurate modeling and prediction of maritime ship flow series.
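The abstract names MF-DFA without showing the procedure. The following is a minimal sketch of the standard technique, not the authors' implementation: the function name `mfdfa`, its parameters, and the choice of forward-only, non-overlapping windows are assumptions made for illustration. It integrates the series into a profile, detrends each window with a polynomial, and fits the slope of the q-th order fluctuation function against scale to estimate the generalized Hurst exponent h(q).

```python
import numpy as np

def mfdfa(x, scales, qs, order=1):
    """Estimate generalized Hurst exponents h(q) (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())           # integrated, mean-removed series
    n = len(profile)
    log_f = np.empty((len(qs), len(scales)))
    for j, s in enumerate(scales):
        n_seg = n // s
        # Assumption: forward-only, non-overlapping windows of length s
        segs = profile[: n_seg * s].reshape(n_seg, s).T   # shape (s, n_seg)
        t = np.arange(s)
        coeffs = np.polyfit(t, segs, order)               # one fit per window
        trend = np.vander(t, order + 1) @ coeffs          # shape (s, n_seg)
        var = np.mean((segs - trend) ** 2, axis=0)        # F^2 per window
        for i, q in enumerate(qs):
            if q == 0:
                # q = 0 uses the logarithmic average
                log_f[i, j] = 0.5 * np.mean(np.log(var))
            else:
                log_f[i, j] = np.log(np.mean(var ** (q / 2.0))) / q
    log_s = np.log(scales)
    # h(q) is the slope of log F_q(s) versus log s
    return np.array([np.polyfit(log_s, row, 1)[0] for row in log_f])
```

An h(q) that varies with q indicates multifractality; h(2) near 0.5 indicates no long-range dependence, while values above 0.5 indicate persistence.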
|
2
|
Somatic and Germline Mutation Periodicity Follow the Orientation of the DNA Minor Groove around Nucleosomes. Cell 2019; 175:1074-1087.e18. [PMID: 30388444 DOI: 10.1016/j.cell.2018.10.004] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 06/12/2018] [Revised: 08/27/2018] [Accepted: 10/01/2018] [Indexed: 12/11/2022]
Abstract
Mutation rates along the genome are highly variable and influenced by several chromatin features. Here, we addressed how nucleosomes, the most pervasive chromatin structure in eukaryotes, affect the generation of mutations. We discovered that within nucleosomes, the somatic mutation rate across several tumor cohorts exhibits a strong 10 base pair (bp) periodicity. This periodic pattern tracks the alternation of the DNA minor groove facing toward and away from the histones. The strength and phase of the mutation rate periodicity are determined by the mutational processes active in tumors. We uncovered similar periodic patterns in the genetic variation among human and Arabidopsis populations, also detectable in their divergence from close species, indicating that the same principles underlie germline and somatic mutation rates. We propose that differential DNA damage and repair processes dependent on the minor groove orientation in nucleosome-bound DNA contribute to the 10-bp periodicity in AT/CG content in eukaryotic genomes.
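One common way to check a positional profile for a periodicity such as the 10-bp pattern described above is to look for the dominant peak in its power spectrum. The sketch below assumes a hypothetical per-position mutation-rate array and a helper named `dominant_period`; it illustrates the general technique and is not the analysis used in the paper.

```python
import numpy as np

def dominant_period(profile):
    """Return the period (in positions) of the strongest spectral peak."""
    y = np.asarray(profile, dtype=float)
    y = y - y.mean()                         # remove the constant offset
    power = np.abs(np.fft.rfft(y)) ** 2      # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(y), d=1.0)   # cycles per position
    k = 1 + np.argmax(power[1:])             # skip the zero-frequency bin
    return 1.0 / freqs[k]
```

For a profile sampled at single-base resolution, a returned value close to 10 would be consistent with one cycle per helical turn of nucleosomal DNA.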
|
3
|
Battail G. Error-correcting codes and information in biology. Biosystems 2019; 184:103987. [PMID: 31295534 DOI: 10.1016/j.biosystems.2019.103987] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/16/2019] [Revised: 06/20/2019] [Accepted: 06/27/2019] [Indexed: 11/19/2022]
Abstract
Shannon's channel coding theorem (1948), a major result of information theory, paradoxically states that errorless communication is possible using an unreliable channel. Since then, engineers developed many error-correcting codes and decoding algorithms. A performance close to the predicted one was eventually achieved no earlier than the beginning of the nineties. Many communication facilities would not exist without error-correcting codes, e.g., mobile telephony and terrestrial digital television. This article explains first how they work without mathematical formalism. An error-correcting code is a minority subset among some set of messages. Within this subset, the messages are sufficiently different from each other to be exactly identified even if a number of their symbols, up to a certain limit, are changed. Beyond this limit, another message can be erroneously identified. An error-correcting code is interpreted as a set of messages subjected to constraints which make their symbols mutually dependent. Although mathematical constraints are conveniently used in engineering, constraints of any other kind, possibly of natural origin, can generate error-correcting codes. Biologists implicitly assume that genomes were conserved during the geological ages, without realizing that this is impossible without error-correcting means. Symbol errors occur during replication of a genome; chemical reactions and radiations are other sources of errors. Their number increases with time in the absence of correction. A genomic code will exactly regenerate the genome provided its decoding is attempted after a short enough time interval. If the number of errors is too large, however, the decoded genome will differ from the initial one and a mutation will occur. Periodically attempted decodings thus will conserve a genome except for very infrequent mutations if decoding attempts are frequent enough. 
The better conservation of very ancient parts of genomes, like the HOX genes, cannot be explained without assuming that a genomic error-correcting code resulting from a stepwise encoding exists: a first encoding was followed later by a second one in which new information and the result of the first encoding were jointly encoded, and this process was repeated several times, eventually resulting in an overall code made of nested components in which the older a piece of information is, the better it is protected. Organic codes in Barbieri's meaning result from the same process and have the same structure. Any new organic code induces new genomic constraints, hence new components in a nested system of codes. Organic codes may thus be identified with the system of nested error-correcting codes needed to conserve the genetic information. A majority of biologists deny that information theory can be useful to them. It is shown on the contrary that the living world cannot be understood if the scientific concept of information is ignored. Heredity makes the present communicate with the past and, as a communication process, is relevant to information theory, which is thus a necessary basis of biology besides physics and chemistry. The nested genomic error-correcting codes which are needed for conserving the genetic information account for the hierarchical taxonomy which structures the living world. Moreover, the main features of biological evolution, including its trend towards increasing complexity, find an explanation within this framework. Incorporating the scientific concept of information and the science based on it in the foundations of biology can widely renew the discipline but meets epistemological difficulties which must be overcome.
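The idea that a code is a small subset of messages kept far enough apart to be recovered after a limited number of symbol errors can be illustrated with a toy example. The sketch below uses a hypothetical four-word binary code of length 5 (chosen for illustration, not taken from the article) whose words differ pairwise in at least 3 positions, so nearest-codeword decoding corrects any single error, while two errors can lead to a wrong word.

```python
# Hypothetical toy code: 4 codewords out of 32 possible 5-bit messages,
# with minimum pairwise Hamming distance 3.
CODE = ("00000", "01011", "10101", "11110")

def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(word):
    """Return the codeword closest to the received word."""
    return min(CODE, key=lambda c: hamming(c, word))

print(decode("01111"))  # "01011" with one flipped symbol
print(decode("11111"))  # "01011" with two flipped symbols
```

The first call recovers the original word; the second returns a different codeword, which is the analogue of the mutation described in the abstract when too many errors accumulate between decoding attempts.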
Affiliation(s)
- Gérard Battail
- École nationale supérieure des Télécommunications de Paris, France.
|
4
|
Chuang HM, Reifenberger JG, Cao H, Dorfman KD. Sequence-Dependent Persistence Length of Long DNA. PHYSICAL REVIEW LETTERS 2017; 119:227802. [PMID: 29286779 PMCID: PMC5839665 DOI: 10.1103/physrevlett.119.227802] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 07/25/2017] [Indexed: 05/04/2023]
Abstract
Using a high-throughput genome-mapping approach, we obtained circa 50 million measurements of the extension of internal human DNA segments in a 41 nm×41 nm nanochannel. The underlying DNA sequences, obtained by mapping to the reference human genome, are 2.5-393 kilobase pairs long and contain percent GC contents between 32.5% and 60%. Using Odijk's theory for a channel-confined wormlike chain, these data reveal that the DNA persistence length increases by almost 20% as the percent GC content increases. The increased persistence length is rationalized by a model, containing no adjustable parameters, that treats the DNA as a statistical terpolymer with a sequence-dependent intrinsic persistence length and a sequence-independent electrostatic persistence length.
Affiliation(s)
- Hui-Min Chuang
- Department of Chemical Engineering and Materials Science, University of Minnesota-Twin Cities, 421 Washington Avenue SE, Minneapolis, Minnesota 55455, USA
- Han Cao
- BioNano Genomics, 9640 Towne Centre Drive, Suite 100, San Diego, California 92121, USA
- Kevin D Dorfman
- Department of Chemical Engineering and Materials Science, University of Minnesota-Twin Cities, 421 Washington Avenue SE, Minneapolis, Minnesota 55455, USA
|
5
|
Bogachev MI, Markelov OA, Kayumov AR, Bunde A. Superstatistical model of bacterial DNA architecture. Sci Rep 2017; 7:43034. [PMID: 28225058 PMCID: PMC5320525 DOI: 10.1038/srep43034] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/08/2016] [Accepted: 01/18/2017] [Indexed: 12/15/2022] Open
Abstract
Understanding the physical principles that govern the complex DNA structural organization as well as its mechanical and thermodynamical properties is essential for the advancement in both life sciences and genetic engineering. Recently we have discovered that the complex DNA organization is explicitly reflected in the arrangement of nucleotides depicted by the universal power law tailed internucleotide interval distribution that is valid for complete genomes of various prokaryotic and eukaryotic organisms. Here we suggest a superstatistical model that represents a long DNA molecule by a series of consecutive ~150 bp DNA segments with the alternation of the local nucleotide composition between segments exhibiting long-range correlations. We show that the superstatistical model and the corresponding DNA generation algorithm explicitly reproduce the laws governing the empirical nucleotide arrangement properties of the DNA sequences for various global GC contents and optimal living temperatures. Finally, we discuss the relevance of our model in terms of the DNA mechanical properties. As an outlook, we focus on finding the DNA sequences that encode a given protein while simultaneously reproducing the nucleotide arrangement laws observed from empirical genomes, that may be of interest in the optimization of genetic engineering of long DNA molecules.
Affiliation(s)
- Mikhail I. Bogachev
- Biomedical Engineering Research Centre, St. Petersburg Electrotechnical University, St. Petersburg, 197376, Russia
- Molecular Genetics of Microorganisms Lab, Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, Kazan, Tatarstan, 420008, Russia
- Oleg A. Markelov
- Biomedical Engineering Research Centre, St. Petersburg Electrotechnical University, St. Petersburg, 197376, Russia
- Airat R. Kayumov
- Molecular Genetics of Microorganisms Lab, Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, Kazan, Tatarstan, 420008, Russia
- Armin Bunde
- Institut für Theoretische Physik, Justus-Liebig-Universität Giessen, 35392 Giessen, Germany
|
6
|
Statistical prediction of protein structural, localization and functional properties by the analysis of its fragment mass distributions after proteolytic cleavage. Sci Rep 2016; 6:22286. [PMID: 26924271 PMCID: PMC4770285 DOI: 10.1038/srep22286] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/16/2015] [Accepted: 02/11/2016] [Indexed: 12/03/2022] Open
Abstract
Structural, localization and functional properties of unknown proteins are often predicted from their primary polypeptide chains using sequence alignment with already characterized proteins and subsequent molecular modeling. Here we suggest an approach to predict various structural and structure-associated properties of proteins directly from the mass distributions of their proteolytic cleavage fragments. For amino-acid-specific cleavages, the distributions of fragment masses are determined by the distributions of inter-amino-acid intervals in the protein, which in turn apparently reflect its structural and structure-related features. Large-scale computer simulations revealed that for transmembrane proteins, either α-helical or β-barrel secondary structure could be predicted with about 90% accuracy after thermolysin cleavage. Moreover, 3/4 of intrinsically disordered proteins could be correctly distinguished from proteins with fixed three-dimensional structure belonging to all four SCOP structural classes by combining 3–4 different cleavages. Additionally, in some cases the protein cellular localization (cytosolic or membrane-associated) and its host organism (Firmicute or Proteobacteria) could be predicted with around 80% accuracy. In contrast to cytosolic proteins, for membrane-associated proteins exhibiting specific structural conformations, their monotopic or transmembrane localization and functional group (ATP-binding, transporters, sensors and so on) could also be predicted with high accuracy and particular robustness against missing cleavages.
|
7
|
Zhang X, Shen Z, Zhang G, Shen Y, Chen M, Zhao J, Wu R. Short Exon Detection via Wavelet Transform Modulus Maxima. PLoS One 2016; 11:e0163088. [PMID: 27635656 PMCID: PMC5026382 DOI: 10.1371/journal.pone.0163088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/03/2016] [Accepted: 09/04/2016] [Indexed: 02/05/2023] Open
Abstract
The detection of short exons is a challenging open problem in bioinformatics. Because existing model-independent methods cannot reliably detect small exons, a model-independent method based on singularity detection with wavelet transform modulus maxima has been developed for detecting short coding sequences (exons) in eukaryotic DNA sequences. In our method, the local maxima capture and characterize the singularities of short exons, which helps to yield significant patterns that are rarely observed with traditional methods. To obtain information about how the singularities of the exon signal differ from those of the background noise, the noise level is estimated by filtering the genomic sequence through a notch filter. Meanwhile, a fast method based on a piecewise cubic Hermite interpolating polynomial is applied to reconstruct the wavelet coefficients and improve computational efficiency. In addition, the output measure of a paired-numerical representation calculated in both forward and reverse directions is used to incorporate a useful DNA structural property. The performance of our approach and of other techniques is evaluated on two benchmark data sets. Experimental results demonstrate that the proposed method outperforms all assessed model-independent methods for detecting short exons in terms of evaluation metrics.
Affiliation(s)
- Xiaolei Zhang
- Shantou University Medical College, Shantou, P.R. China
- Zhiwei Shen
- Department of Radiology, Second Affiliated Hospital of Shantou University Medical College, Shantou, P.R. China
- Guishan Zhang
- College of Engineering, Shantou University, Shantou, P.R. China
- Yuanyu Shen
- Department of Radiology, Second Affiliated Hospital of Shantou University Medical College, Shantou, P.R. China
- Miaomiao Chen
- Department of Radiology, Second Affiliated Hospital of Shantou University Medical College, Shantou, P.R. China
- Jiaxiang Zhao
- College of Electronic Information and Optical Engineering, Nankai University, Tianjin, P.R. China
- * E-mail: (JXZ); (RHW)
- Renhua Wu
- Department of Radiology, Second Affiliated Hospital of Shantou University Medical College, Shantou, P.R. China
- * E-mail: (JXZ); (RHW)
|
8
|
Colliva A, Pellegrini R, Testori A, Caselle M. Ising-model description of long-range correlations in DNA sequences. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 91:052703. [PMID: 26066195 DOI: 10.1103/physreve.91.052703] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/29/2014] [Indexed: 06/04/2023]
Abstract
We model long-range correlations of nucleotides in the human DNA sequence using the long-range one-dimensional (1D) Ising model. We show that, for distances between $10^3$ and $10^6$ bp, the correlations show a universal behavior and may be described by the non-mean-field limit of the long-range 1D Ising model. This allows us to make some testable hypotheses on the nature of the interaction between distant portions of the DNA chain which led to the DNA structure that we observe today in higher eukaryotes.
Affiliation(s)
- A Colliva
- Dipartimento di Fisica dell'Università di Torino and I.N.F.N. sez. di Torino, Via Pietro Giuria 1, I-10125 Torino, Italy
- R Pellegrini
- Physics Department, Swansea University, Singleton Park, Swansea SA2 8PP, UK
- A Testori
- Dipartimento di Fisica dell'Università di Torino and I.N.F.N. sez. di Torino, Via Pietro Giuria 1, I-10125 Torino, Italy
- M Caselle
- Dipartimento di Fisica dell'Università di Torino and I.N.F.N. sez. di Torino, Via Pietro Giuria 1, I-10125 Torino, Italy
|
9
|
Drillon G, Audit B, Argoul F, Arneodo A. Ubiquitous human 'master' origins of replication are encoded in the DNA sequence via a local enrichment in nucleosome excluding energy barriers. JOURNAL OF PHYSICS. CONDENSED MATTER : AN INSTITUTE OF PHYSICS JOURNAL 2015; 27:064102. [PMID: 25563930 DOI: 10.1088/0953-8984/27/6/064102] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 06/04/2023]
Abstract
As the elementary building block of eukaryotic chromatin, the nucleosome is at the heart of the compromise between the necessity of compacting DNA in the cell nucleus and the required accessibility to regulatory proteins. The recent availability of genome-wide experimental maps of nucleosome positions for many different organisms and cell types has provided an unprecedented opportunity to elucidate to what extent the DNA sequence conditions the primary structure of chromatin and in turn participates in the chromatin-mediated regulation of nuclear functions, such as gene expression and DNA replication. In this study, we use in vivo and in vitro genome-wide nucleosome occupancy data together with the set of nucleosome-free regions (NFRs) predicted by a physical model of nucleosome formation based on sequence-dependent bending properties of the DNA double-helix, to investigate the role of intrinsic nucleosome occupancy in the regulation of the replication spatio-temporal programme in human. We focus our analysis on the so-called replication U/N-domains that were shown to cover about half of the human genome in the germline (skew-N domains) as well as in embryonic stem cells, somatic and HeLa cells (mean replication timing U-domains). The 'master' origins of replication (MaOris) that border these megabase-sized U/N-domains were found to be specified by a few hundred kb wide regions that are hyper-sensitive to DNase I cleavage, hypomethylated, and enriched in epigenetic marks involved in transcription regulation, the hallmarks of localized open chromatin structures. Here we show that replication U/N-domain borders that are conserved in all considered cell lines have an environment highly enriched in nucleosome-excluding-energy barriers, suggesting that these ubiquitous MaOris have been selected during evolution. 
In contrast, MaOris that are cell-type-specific are mainly regulated epigenetically and are no longer favoured by a local abundance of intrinsic NFRs encoded in the DNA sequence. At the smaller few hundred bp scale of gene promoters, CpG-rich promoters of housekeeping genes found nearby ubiquitous MaOris as well as CpG-poor promoters of tissue-specific genes found nearby cell-type-specific MaOris, both correspond to in vivo NFRs that are not coded as nucleosome-excluding-energy barriers. Whereas the former promoters are likely to correspond to high occupancy transcription factor binding regions, the latter are an illustration that gene regulation in human is typically cell-type-specific.
Affiliation(s)
- Guénola Drillon
- Université de Lyon, F-69000 Lyon, France. Laboratoire de Physique, CNRS UMR 5672, École Normale Supérieure de Lyon, F-69007 Lyon, France
|
10
|
Bogachev MI, Kayumov AR, Bunde A. Universal internucleotide statistics in full genomes: a footprint of the DNA structure and packaging? PLoS One 2014; 9:e112534. [PMID: 25438044 PMCID: PMC4249851 DOI: 10.1371/journal.pone.0112534] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 07/19/2014] [Accepted: 10/07/2014] [Indexed: 11/18/2022] Open
Abstract
Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. sapiens, aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleotide interval distributions exhibit the same q-exponential form. While in prokaryotes a single q-exponential function makes the best fit, in eukaryotes the PDF contains additionally a second q-exponential, which in the human genome makes a perfect approximation over nearly 10 decades. We suggest that this functional form is a footprint of the heterogeneous DNA structure, where the first q-exponential reflects the universal helical pitch that appears both in pro- and eukaryotic DNA, while the second q-exponential is a specific marker of the large-scale eukaryotic DNA organization.
Affiliation(s)
- Mikhail I. Bogachev
- Radio Systems Department & Biomedical Engineering Research Center, Saint Petersburg Electrotechnical University, Saint Petersburg, Russia
- Airat R. Kayumov
- Department of Genetics & Institute of Fundamental Medicine and Biology, Kazan (Volga Region) Federal University, Kazan, Tatarstan, Russia
- Armin Bunde
- Institut für Theoretische Physik, Justus-Liebig-Universität Giessen, Giessen, Hessen, Germany
|
11
|
Audit B, Baker A, Chen CL, Rappailles A, Guilbaud G, Julienne H, Goldar A, d'Aubenton-Carafa Y, Hyrien O, Thermes C, Arneodo A. Multiscale analysis of genome-wide replication timing profiles using a wavelet-based signal-processing algorithm. Nat Protoc 2012; 8:98-110. [PMID: 23237832 DOI: 10.1038/nprot.2012.145] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 12/31/2022]
Abstract
In this protocol, we describe the use of the LastWave open-source signal-processing command language (http://perso.ens-lyon.fr/benjamin.audit/LastWave/) for analyzing cellular DNA replication timing profiles. LastWave makes use of a multiscale, wavelet-based signal-processing algorithm that is based on a rigorous theoretical analysis linking timing profiles to fundamental features of the cell's DNA replication program, such as the average replication fork polarity and the difference between replication origin density and termination site density. We describe the flow of signal-processing operations to obtain interactive visual analyses of DNA replication timing profiles. We focus on procedures for exploring the space-scale map of apparent replication speeds to detect peaks in the replication timing profiles that represent preferential replication initiation zones, and for delimiting U-shaped domains in the replication timing profile. In comparison with the generally adopted approach that involves genome segmentation into regions of constant timing separated by timing transition regions, the present protocol enables the recognition of more complex patterns of the spatio-temporal replication program and has a broader range of applications. Completing the full procedure should not take more than 1 h, although learning the basics of the program can take a few hours and achieving full proficiency in the use of the software may take days.
|
12
|
Spiriti J, van der Vaart A. DNA Bending through Roll Angles Is Independent of Adjacent Base Pairs. J Phys Chem Lett 2012; 3:3029-3033. [PMID: 26292244 DOI: 10.1021/jz301227y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 06/04/2023]
Abstract
We have studied DNA bending for a wide range of DNA sequences by two-dimensional adaptive umbrella sampling simulations on adjacent roll angles. Calculated free energy surfaces are largely additive and can be well approximated by the sum of the one-dimensional free energy surfaces. Cooperativity between adjacent roll angles was found to be negligible: less than 1.0 kcal/mol and a small fraction of the overall bending energy. Our calculations validate the assumptions underlying many popular coarse-grained models for DNA bending, and demonstrate their theoretical validity for investigating DNA bending.
Affiliation(s)
- Justin Spiriti
- Department of Chemistry, University of South Florida, 4202 East Fowler Avenue CHE 205, Tampa, Florida 33620, United States
- Arjan van der Vaart
- Department of Chemistry, University of South Florida, 4202 East Fowler Avenue CHE 205, Tampa, Florida 33620, United States
|
13
|
Chua GH, Krishnan A, Li KB, Tomita M. MULTIRESOLUTION ANALYSIS UNCOVERS HIDDEN CONSERVATION OF PROPERTIES IN STRUCTURALLY AND FUNCTIONALLY SIMILAR PROTEINS. J Bioinform Comput Biol 2011; 4:1245-67. [PMID: 17245813 DOI: 10.1142/s0219720006002442] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/29/2006] [Revised: 09/13/2006] [Accepted: 09/13/2006] [Indexed: 11/18/2022]
Abstract
Physicochemical properties of amino acids are important factors in determining protein structure and function. Most approaches make use of averaged properties over entire domains or even proteins to analyze their structure or function. This level of coarseness tends to hide the richness of the variability in the different properties across functional domains. This paper studies the conservation of physicochemical properties in a functionally similar family of proteins using a novel wavelet-based technique known as multiresolution analysis. Such an analysis can help uncover characteristics that can otherwise remain hidden. We have studied the protein kinase family of sequences and our findings are as follows: (a) a number of different properties are conserved over the functional catalytic domain irrespective of the sequence identities; (b) conservation of properties can be observed at different frequency levels and they agree well with the known structural/functional properties of the subdomains for the protein kinase family; (c) structural differences between the different kinase family members are reflected in the waveforms; and (d) functionally important mutations show distortions in the waveforms of conserved properties. The potential usefulness of the above findings in identifying functionally similar sequences in the twilight and midnight zones is demonstrated through a simple prediction model for the protein kinase family which achieved a recall of 93.7% and a precision of 96.75% in cross-validation tests.
Affiliation(s)
- Gek-Huey Chua
- Bioinformatics Institute, 30, Biopolis Street, #07-01, Matrix, Singapore
|
14
|
CAGO: a software tool for dynamic visual comparison and correlation measurement of genome organization. PLoS One 2011; 6:e27080. [PMID: 22114666 PMCID: PMC3219657 DOI: 10.1371/journal.pone.0027080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/16/2011] [Accepted: 10/10/2011] [Indexed: 11/26/2022] Open
Abstract
CAGO (Comparative Analysis of Genome Organization) is developed to address two critical shortcomings of conventional genome atlas plotters: lack of dynamic exploratory functions and absence of signal analysis for genomic properties. With dynamic exploratory functions, users can directly manipulate chromosome tracks of a genome atlas and intuitively identify distinct genomic signals by visual comparison. Signal analysis of genomic properties can further detect inconspicuous patterns from noisy genomic properties and calculate correlations between genomic properties across various genomes. To implement dynamic exploratory functions, CAGO presents each genome atlas in Scalable Vector Graphics (SVG) format and allows users to interact with it using a SVG viewer through JavaScript. Signal analysis functions are implemented using R statistical software and a discrete wavelet transformation package waveslim. CAGO is not only a plotter for generating complex genome atlases, but also a platform for exploring genome atlases with dynamic exploratory functions for visual comparison and with signal analysis for comparing genomic properties across multiple organisms. The web-based application of CAGO, its source code, user guides, video demos, and live examples are publicly available and can be accessed at http://cbs.ym.edu.tw/cago.
|
15
|
Moukhtar J, Vaillant C, Audit B, Arneodo A. Revisiting polymer statistical physics to account for the presence of long-range-correlated structural disorder in 2D DNA chains. THE EUROPEAN PHYSICAL JOURNAL. E, SOFT MATTER 2011; 34:119. [PMID: 22083495 DOI: 10.1140/epje/i2011-11119-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/06/2011] [Accepted: 10/11/2011] [Indexed: 05/31/2023]
Abstract
We elaborate on a generalization of the 2D wormlike chain (WLC) model that accounts for the presence of long-range correlations (LRC) in the intrinsic curvature distribution of eukaryotic DNA. This model predicts some decrease of the DNA persistence length resulting from some large-scale intrinsic curvature induced by sequence-dependent persistent random distribution of local bending sites. When assisting exact analytical calculations by numerical DNA simulations, we show that the conjugated contributions of i) the thermal curvature fluctuations characterized by the "dynamic" persistence length $\ell_p^d = 2A$, where $A$ is the elastic bending modulus, and ii) the intrinsic LRC curvature disorder of amplitude $\sigma_o$ and Hurst exponent $H > 1/2$, characterized by a "static" persistence length $\ell_p^H = A^{1/2H} \sigma_o^{-1/H} \Gamma(1/2H + 1)$, can be described by a continuum of generalized WLC (GWLC) models parametrized by the LRC exponent $H$. We use perturbation analysis to investigate the two limiting cases of weak static disorder ($w(H) \ll 1$) and weak dynamical fluctuations ($1/w(H) \ll 1$), where $w(H) = \ell_p^d/\ell_p^H$ is a dimensionless parameter. From a quantitative point of view, our study demonstrates that even for a small value of the LRC ($H \approx 0.6$-$0.8$) static disorder amplitude $\sigma_o \sim 10^{-2}$, as previously reported for genomic DNA, the decrease of the persistence length from the WLC prediction $\ell_p^d$ can be very significant, up to twofold. The implications of these results on the first steps of compaction of DNA in eukaryotic cells are discussed.
|
16
|
Chevereau G, Arneodo A, Vaillant C. Influence of the genomic sequence on the primary structure of chromatin. FRONTIERS IN LIFE SCIENCE 2011. [DOI: 10.1080/21553769.2012.708882] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
|
17
|
Athanasopoulou L, Athanasopoulos S, Karamanos K, Almirantis Y. Scaling properties and fractality in the distribution of coding segments in eukaryotic genomes revealed through a block entropy approach. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2010; 82:051917. [PMID: 21230510 DOI: 10.1103/physreve.82.051917] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/28/2010] [Revised: 09/19/2010] [Indexed: 05/30/2023]
Abstract
Statistical methods, including block entropy based approaches, have already been used in the study of long-range features of genomic sequences seen as symbol series, either considering the full alphabet of the four nucleotides or the binary purine or pyrimidine character set. Here we explore the alternation of short protein-coding segments with long noncoding spacers in entire chromosomes, focusing on the scaling properties of block entropy. In previous studies, it has been shown that the sizes of noncoding spacers follow power-law-like distributions in most chromosomes of eukaryotic organisms from distant taxa. We have developed a simple evolutionary model based on well-known molecular events (segmental duplications followed by elimination of most of the duplicated genes) which reproduces the observed linearity in log-log plots. The scaling properties of block entropy H(n) have been studied in several works. Their findings suggest that linearity in semilogarithmic scale characterizes symbol sequences which exhibit fractal properties and long-range order, while this linearity has been shown in the case of the logistic map at the Feigenbaum accumulation point. The present work starts with the observation that the block entropy of the Cantor-like binary symbol series scales in a similar way. Then, we perform the same analysis for the full set of human chromosomes and for several chromosomes of other eukaryotes. A similar but less extended linearity in semilogarithmic scale, indicating fractality, is observed, while randomly formed surrogate sequences clearly lack this type of scaling. Genomic sequences always present entropy values much lower than their random surrogates. 
Symbol sequences produced by the aforementioned evolutionary model follow the scaling found in genomic sequences, thus corroborating the conjecture that "segmental duplication-gene elimination" dynamics may have contributed to the observed long rangeness in the coding or noncoding alternation in genomes.
|
18
|
Crato N, Linhares RR, Lopes SR. Statistical properties of detrended fluctuation analysis. J STAT COMPUT SIM 2010. [DOI: 10.1080/00949650902755152] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
|
19
|
Moukhtar J, Faivre-Moskalenko C, Milani P, Audit B, Vaillant C, Fontaine E, Mongelard F, Lavorel G, St-Jean P, Bouvet P, Argoul F, Arneodo A. Effect of Genomic Long-Range Correlations on DNA Persistence Length: From Theory to Single Molecule Experiments. J Phys Chem B 2010; 114:5125-43. [DOI: 10.1021/jp911031y] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022]
Affiliation(s)
- Julien Moukhtar, Cendrine Faivre-Moskalenko, Pascale Milani, Benjamin Audit, Cedric Vaillant, Emeline Fontaine, Fabien Mongelard, Guillaume Lavorel, Philippe St-Jean, Philippe Bouvet, Françoise Argoul, Alain Arneodo
- Université de Lyon, F-69000 Lyon, France, Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS/Ecole Normale Supérieure de Lyon, 46 allée d’Italie, F-69007 Lyon, France, and Laboratoire Joliot-Curie and Laboratoire de Biologie Moléculaire de la Cellule, CNRS/Ecole Normale Supérieure de Lyon, 46 allée d’Italie, F-69007 Lyon, France
|
20
|
Abstract
Recent genome-wide nucleosome mappings along with bioinformatics studies have confirmed that the DNA sequence plays a more important role in the collective organization of nucleosomes in vivo than previously thought. Yet in living cells, this organization also results from the action of various external factors like DNA-binding proteins and chromatin remodelers. To decipher the code for intrinsic chromatin organization, there is thus a need for in vitro experiments to bridge the gap between computational models of nucleosome sequence preferences and in vivo nucleosome occupancy data. Here we combine atomic force microscopy in liquid and theoretical modeling to demonstrate that a major sequence signaling in vivo are high-energy barriers that locally inhibit nucleosome formation rather than favorable positioning motifs. We show that these genomic excluding-energy barriers condition the collective assembly of neighboring nucleosomes consistently with equilibrium statistical ordering principles. The analysis of two gene promoter regions in Saccharomyces cerevisiae and the human genome indicates that these genomic barriers direct the intrinsic nucleosome occupancy of regulatory sites, thereby contributing to gene expression regulation.
|
21
|
Zhou Z, Joós B. Disordered, stretched, and semiflexible biopolymers in two dimensions. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2009; 80:061911. [PMID: 20365194 DOI: 10.1103/physreve.80.061911] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/08/2009] [Revised: 09/18/2009] [Indexed: 05/29/2023]
Abstract
We study the effects of intrinsic sequence-dependent curvature for a two-dimensional semiflexible biopolymer with short-range correlation in intrinsic curvatures. We show exactly that when not subjected to any external force, such a system is equivalent to a system with a well-defined intrinsic curvature and a proper renormalized persistence length. We find the exact expression for the distribution function of the equivalent system. However, we show that such an equivalent system does not always exist for the polymer subjected to an external force. We find that under an external force, the effect of sequence disorder depends upon the averaging order, the degree of disorder, and the experimental conditions, such as the boundary conditions. Furthermore, a short to moderate length biopolymer may be much softer or have a smaller apparent persistence length than what would be expected from the "equivalent system." Moreover, under a strong stretching force and for a long biopolymer, the sequence disorder is immaterial for elasticity. Finally, the effect of sequence disorder may depend upon the quantity considered.
Affiliation(s)
- Zicong Zhou
- Department of Physics, Tamkang University, 151 Ying-chuan, Tamsui 25137, Taiwan, Republic of China.
|
22
|
Guo AM, Xiong SJ. Violation of the single-parameter scaling hypothesis in human chromosome 22 with charge transfer models. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2009; 79:041924. [PMID: 19518273 DOI: 10.1103/physreve.79.041924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2008] [Revised: 12/15/2008] [Indexed: 05/27/2023]
Abstract
We investigate transport properties of DNA sequences in human chromosome 22 and compare the results with those of a random artificial DNA sequence based on the single- and double-stranded charge transfer models. The statistical quantities, including the Hurst exponent, the distribution of Lyapunov exponent (LE), the central moments, and the scaling parameter, are numerically calculated by using the transfer-matrix approach. It is found that the existence of satellite DNA segments in human chromosome 22 could result in deviations from usual Gaussian distribution of LE. Our results suggest that the presence of the satellite DNA segments, together with the long-range correlations and the base-pairing correlations could lead to the violation of single-parameter scaling hypothesis which holds for the random artificial DNA sequence although the behaviors of the averaged LEs for both DNA sequences are similar. This provides a viewpoint to analyze differences between the genomic DNA sequences and the nonliving random ones on the basis of localization properties of wave functions in the sequences.
Affiliation(s)
- Ai-Min Guo
- Department of Physics and National Laboratory of Solid State Microstructures, Nanjing University, Nanjing 210093, China
|
23
|
Liu H, Wu J, Xie J, Yang X, Lu Z, Sun X. Characteristics of nucleosome core DNA and their applications in predicting nucleosome positions. Biophys J 2008; 94:4597-604. [PMID: 18326654 PMCID: PMC2397361 DOI: 10.1529/biophysj.107.117028] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 07/09/2007] [Accepted: 01/18/2008] [Indexed: 11/18/2022] Open
Abstract
By analyzing dinucleotide position-frequency data of yeast nucleosome-bound DNA sequences, dinucleotide periodicities of core DNA sequences were investigated. Within frequency domains, weakly bound dinucleotides (AA, AT, and the combinations AA-TT-TA and AA-TT-TA-AT) present doublet peaks in a periodicity range of 10-11 bp, and strongly bound dinucleotides present a single peak. A time-frequency analysis, based on wavelet transformation, indicated that weakly bound dinucleotides of core DNA sequences were spaced smaller (approximately 10.3 bp) at the two ends, with larger (approximately 11.1 bp) spacing in the middle section. The finding was supported by DNA curvature and was prevalent in all core DNA sequences. Therefore, three approaches were developed to predict nucleosome positions. After analyzing a 2200-bp DNA sequence, results indicated that the predictions were feasible; areas near protein-DNA binding sites resulted in periodicity profiles with irregular signals. The effects of five dinucleotide patterns were evaluated, indicating that the AA-TT pattern exhibited better performance. A chromosome-scale prediction demonstrated that periodicity profiles perform better than previously described, with up to 59% accuracy. Based on predictions, nucleosome distributions near the beginning and end of open reading frames were analyzed. Results indicated that the majority of open reading frames' start and end sites were occupied by nucleosomes.
Affiliation(s)
- Hongde Liu
- State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
|
24
|
Vaillant C, Audit B, Arneodo A. Experiments confirm the influence of genome long-range correlations on nucleosome positioning. PHYSICAL REVIEW LETTERS 2007; 99:218103. [PMID: 18233262 DOI: 10.1103/physrevlett.99.218103] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 05/31/2007] [Indexed: 05/25/2023]
Abstract
From the statistical analysis of nucleosome positioning data for chromosome III of S. cerevisiae, we demonstrate that long-range correlations (LRC) in the genomic sequence strongly influence the organization of nucleosomes. We present a physical explanation of how LRC may significantly condition the overall formation and positioning of nucleosomes including the nucleosome-free regions observed at gene promoters. From grand canonical Monte Carlo simulations based upon a simple sequence-dependent nucleosome model, we show that LRC induce a patchy nucleosome occupancy landscape with alternation of "crystal-like" phases of confined regularly spaced nucleosomes and "fluidlike" phases of rather diluted nonpositioned nucleosomes.
Affiliation(s)
- C Vaillant
- Laboratoire Joliot-Curie and Laboratoire de Physique, CNRS, ENS-Lyon, 46 Allée d'Italie, 69364 Lyon Cedex 07, France
|
25
|
Liu F, Tøstesen E, Sundet JK, Jenssen TK, Bock C, Jerstad GI, Thilly WG, Hovig E. The human genomic melting map. PLoS Comput Biol 2007; 3:e93. [PMID: 17511513 PMCID: PMC1868775 DOI: 10.1371/journal.pcbi.0030093] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Received: 12/21/2006] [Accepted: 04/11/2007] [Indexed: 11/19/2022] Open
Abstract
In a living cell, the antiparallel double-stranded helix of DNA is a dynamically changing structure. The structure relates to interactions between and within the DNA strands, and the array of other macromolecules that constitutes functional chromatin. It is only through its changing conformations that DNA can organize and structure a large number of cellular functions. In particular, DNA must locally uncoil, or melt, and become single-stranded for DNA replication, repair, recombination, and transcription to occur. It has previously been shown that this melting occurs cooperatively, whereby several base pairs act in concert to generate melting bubbles, and in this way constitute a domain that behaves as a unit with respect to local DNA single-strandedness. We have applied a melting map calculation to the complete human genome, which provides information about the propensities of forming local bubbles determined from the whole sequence, and present a first report on its basic features, the extent of cooperativity, and correlations to various physical and biological features of the human genome. Globally, the melting map covaries very strongly with GC content. Most importantly, however, cooperativity of DNA denaturation causes this correlation to be weaker at resolutions fewer than 500 bps. This is also the resolution level at which most structural and biological processes occur, signifying the importance of the informational content inherent in the genomic melting map. The human DNA melting map may be further explored at http://meltmap.uio.no.
Affiliation(s)
- Fang Liu
- Department of Tumor Biology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
- PubGene AS, Vinderen, Oslo, Norway
| | - Eivind Tøstesen
- Department of Tumor Biology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
| | | | | | - Christoph Bock
- Max-Planck-Institut für Informatik, Saarbrücken, Germany
| | - Geir Ivar Jerstad
- Department of Tumor Biology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
| | - William G Thilly
- Biological Engineering Division, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Eivind Hovig
- Department of Tumor Biology, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
- Institute of Informatics, University of Oslo, Norway
- Medical Informatics, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway
|
26
|
Moukhtar J, Fontaine E, Faivre-Moskalenko C, Arneodo A. Probing persistence in DNA curvature properties with atomic force microscopy. PHYSICAL REVIEW LETTERS 2007; 98:178101. [PMID: 17501536 DOI: 10.1103/physrevlett.98.178101] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 12/01/2006] [Indexed: 05/15/2023]
Abstract
We elaborate on a mean-field extension of the wormlike chain model that accounts for the presence of long-range correlations (LRC) in the intrinsic curvature disorder of genomic DNA: the stronger the LRC, the smaller the persistence length. The comparison of atomic force microscopy imaging of straight, uncorrelated virus and correlated human DNA fragments with DNA simulations confirms that the observed decrease in persistence length for human DNA more likely results from a sequence-induced large-scale intrinsic curvature than from some increased flexibility.
Affiliation(s)
- J Moukhtar
- Laboratoires Joliot Curie (USR 3010) et de Physique (UMR 5672), Ecole Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon cedex 07, France
|
27
|
Nicolay S, Brodie Of Brodie EB, Touchon M, Audit B, d'Aubenton-Carafa Y, Thermes C, Arneodo A. Bifractality of human DNA strand-asymmetry profiles results from transcription. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2007; 75:032902. [PMID: 17500744 DOI: 10.1103/physreve.75.032902] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 07/08/2006] [Indexed: 05/15/2023]
Abstract
We use the wavelet transform modulus maxima method to investigate the multifractal properties of strand-asymmetry DNA walk profiles in the human genome. This study reveals the bifractal nature of these profiles, which involve two competing scale-invariant (up to repeat-masked distances $\lesssim 40$ kbp) components characterized by Hölder exponents $h_1 = 0.78$ and $h_2 = 1$, respectively. The former corresponds to the long-range-correlated homogeneous fluctuations previously observed in DNA walks generated with structural codings. The latter is associated with the presence of jumps in the original strand-asymmetry noisy signal $S$. We show that a majority of upward (downward) jumps co-locate with gene transcription start (end) sites. Here 7228 human gene transcription start sites from the refGene database are found within 2 kbp from an upward jump of amplitude $\Delta S \geq 0.1$, which suggests that about 36% of annotated human genes present significant transcription-induced strand asymmetry and very likely high expression rate.
Affiliation(s)
- S Nicolay
- Laboratoire Joliot-Curie and Laboratoire de Physique, UMR 5672, CNRS, ENS-Lyon, 46 Allée d'Italie, 69364 Lyon Cedex 07, France
|
28
|
Cattani C, D'Auria CR. Correlations in DNA sequences. JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES 2007. [DOI: 10.1080/02522667.2007.10699728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
29
|
Salerno W, Havlak P, Miller J. Scale-invariant structure of strongly conserved sequence in genomic intersections and alignments. Proc Natl Acad Sci U S A 2006; 103:13121-5. [PMID: 16924100 PMCID: PMC1559763 DOI: 10.1073/pnas.0605735103] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022] Open
Abstract
A power-law distribution of the length of perfectly conserved sequence from mouse/human whole-genome intersection and alignment is exhibited. Spatial correlations of these elements within the mouse genome are studied. It is argued that these power-law distributions and correlations are comprised in part by functional noncoding sequence and ought to be accounted for in estimating the statistical significance of apparent sequence conservation. These inter-genomic correlations of conservation are placed in the context of previously observed intra-genomic correlations, and their possible origins and consequences are discussed.
Affiliation(s)
| | - Paul Havlak
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
| | - Jonathan Miller
- *Department of Biochemistry and Molecular Biology and
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
|
30
|
Vaillant C, Audit B, Thermes C, Arnéodo A. Formation and positioning of nucleosomes: effect of sequence-dependent long-range correlated structural disorder. THE EUROPEAN PHYSICAL JOURNAL. E, SOFT MATTER 2006; 19:263-77. [PMID: 16477390 DOI: 10.1140/epje/i2005-10053-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 08/11/2005] [Accepted: 01/20/2006] [Indexed: 05/06/2023]
Abstract
The understanding of the long-range correlations (LRC) observed in DNA sequences is still an open and very challenging problem. In this paper, we start reviewing recent results obtained when exploring the scaling properties of eucaryotic, eubacterial and archaeal genomic sequences using the space-scale decomposition provided by the wavelet transform (WT). These results suggest that the existence of LRC up to distances approximately 20-30 kbp is the signature of the nucleosomal structure and dynamics of the chromatin fiber. Actually the LRC are mainly observed in the DNA bending profiles obtained when using some structural coding of the DNA sequences that accounts for the fluctuations of the local double-helix curvature within the nucleosome complex. Because of the approximate planarity of nucleosomal DNA loops, we then study the influence of the LRC structural disorder on the thermodynamical properties of 2D elastic chains submitted locally to mechanical/topological constraint as loops. The equilibrium properties of the one-loop system are derived numerically and analytically in the quite realistic weak-disorder limit. The LRC are shown to favor the spontaneous formation of small loops, the larger the LRC, the smaller the size of the loop. We further investigate the dynamical behavior of such a loop using the mean first passage time (MFPT) formalism. We show that the typical short-time loop dynamics is superdiffusive in the presence of LRC. For displacements larger than the loop size, we use large-deviation theory to derive a LRC-dependent anomalous-diffusion rule that accounts for the lack of disorder self-averaging. Potential biological implications on DNA loops involved in nucleosome positioning and dynamics in eucaryotic chromatin are discussed.
Affiliation(s)
- C Vaillant
- Institut Bernouilli, EPFL, 1015, Lausanne, Switzerland
|
31
|
|
32
|
Larsabal E, Danchin A. Genomes are covered with ubiquitous 11 bp periodic patterns, the "class A flexible patterns". BMC Bioinformatics 2005; 6:206. [PMID: 16120222 PMCID: PMC1242344 DOI: 10.1186/1471-2105-6-206] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/22/2005] [Accepted: 08/24/2005] [Indexed: 11/17/2022] Open
Abstract
Background: The genomes of prokaryotes and lower eukaryotes display a very strong 11 bp periodic bias in the distribution of their nucleotides. This bias is present throughout a given genome, both in coding and non-coding sequences. Until now this bias remained of unknown origin. Results: Using a technique for analysis of auto-correlations based on linear projection, we identified the sequences responsible for the bias. Prokaryotic and lower eukaryotic genomes are covered with ubiquitous patterns that we termed "class A flexible patterns". Each pattern is composed of up to ten conserved nucleotides or dinucleotides distributed into a discontinuous motif. Each occurrence spans a region up to 50 bp in length. They belong to what we named the "flexible pattern" type, in that there is some limited fluctuation in the distances between the nucleotides composing each occurrence of a given pattern. When taken together, these patterns cover up to half of the genome in the majority of prokaryotes. They generate the previously recognized 11 bp periodic bias. Conclusion: Judging from the structure of the patterns, we suggest that they may define a dense network of protein interaction sites in chromosomes.
Affiliation(s)
- Etienne Larsabal, Antoine Danchin
- Unité de Génétique des Génomes Bactériens, Institut Pasteur, URA CNRS 2171, 28, rue du Docteur Roux, 75724 Paris Cedex 15, France
|
33
|
Vaillant C, Audit B, Arnéodo A. Thermodynamics of DNA loops with long-range correlated structural disorder. PHYSICAL REVIEW LETTERS 2005; 95:068101. [PMID: 16090995 DOI: 10.1103/physrevlett.95.068101] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 02/09/2005] [Indexed: 05/03/2023]
Abstract
We study the influence of a structural disorder on the thermodynamical properties of 2D-elastic chains submitted to mechanical/topological constraint as loops. The disorder is introduced via a spontaneous curvature whose distribution along the chain presents either no correlation or long-range correlations (LRC). The equilibrium properties of the one-loop system are derived numerically and analytically for weak disorder. LRC are shown to favor the formation of small loops: the larger the LRC, the smaller the loop size. We use the mean first passage time formalism to show that the typical short-time loop dynamics is superdiffusive in the presence of LRC. Potential biological implications on nucleosome positioning and dynamics in eukaryotic chromatin are discussed.
Affiliation(s)
- C Vaillant
- Institut Bernouilli, EPFL, 1015 Lausanne, Switzerland
|
34
|
Florquin K, Saeys Y, Degroeve S, Rouzé P, Van de Peer Y. Large-scale structural analysis of the core promoter in mammalian and plant genomes. Nucleic Acids Res 2005; 33:4255-64. [PMID: 16049029 PMCID: PMC1181242 DOI: 10.1093/nar/gki737] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.6] [Received: 03/29/2005] [Revised: 06/10/2005] [Accepted: 07/10/2005] [Indexed: 12/19/2022] Open
Abstract
DNA encodes at least two independent levels of functional information. The first level is for encoding proteins and sequence targets for DNA-binding factors, while the second one is contained in the physical and structural properties of the DNA molecule itself. Although the physical and structural properties are ultimately determined by the nucleotide sequence itself, the cell exploits these properties in a way in which the sequence itself plays no role other than to support or facilitate certain spatial structures. In this work, we focus on these structural properties, comparing them between different organisms and assessing their ability to describe the core promoter. We prove the existence of distinct types of core promoters, based on a clustering of their structural profiles. These results indicate that the structural profiles are much conserved within plants (Arabidopsis and rice) and animals (human and mouse), but differ considerably between plants and animals. Furthermore, we demonstrate that these structural profiles can be an alternative way of describing the core promoter, in addition to more classical motif or IUPAC-based approaches. Using the structural profiles as discriminatory elements to separate promoter regions from non-promoter regions, reliable models can be built to identify core-promoter regions using a strictly computational approach.
Affiliation(s)
- Kobe Florquin
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
| | - Yvan Saeys
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
| | - Sven Degroeve
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
| | - Pierre Rouzé
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
| | - Yves Van de Peer
- Department of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), Ghent University, Technologiepark 927, B-9052 Ghent, Belgium
|
35
|
Abnizova I, te Boekhorst R, Walter K, Gilks WR. Some statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the Drosophila genome: the fluffy-tail test. BMC Bioinformatics 2005; 6:109. [PMID: 15857505 PMCID: PMC1127108 DOI: 10.1186/1471-2105-6-109] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 12/17/2004] [Accepted: 04/27/2005] [Indexed: 11/16/2022] Open
Abstract
Background: This paper addresses the problem of recognising DNA cis-regulatory modules which are located far from genes. Experimental procedures for this are slow and costly, and computational methods are hard, because they lack positional information. Results: We present a novel statistical method, the "fluffy-tail test", to recognise regulatory DNA. We exploit one of the basic informational properties of regulatory DNA: abundance of over-represented transcription factor binding site (TFBS) motifs, although we do not look for specific TFBS motifs, per se. Though overrepresentation of TFBS motifs in regulatory DNA has been intensively exploited by many algorithms, it is still a difficult problem to distinguish regulatory from other genomic DNA. Conclusion: We show that, in the data used, our method is able to distinguish cis-regulatory modules by exploiting statistical differences between the probability distributions of similar words in regulatory and other DNA. The potential application of our method includes annotation of new genomic sequences and motif discovery.
Affiliation(s)
- Irina Abnizova
- MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK
| | - Rene te Boekhorst
- Computer Science Department, University of Hertfordshire, College Lane, AL10 92BA, Hatfield Campus, UK
| | - Klaudia Walter
- MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK
| | - Walter R Gilks
- MRC Biostatistics Unit, Institute of Public Health, Robinson Way, Cambridge CB2 2SR, UK
|
36
|
Costa M, Goldberger AL, Peng CK. Multiscale entropy analysis of biological signals. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2005; 71:021906. [PMID: 15783351 DOI: 10.1103/physreve.71.021906] [Citation(s) in RCA: 1097] [Impact Index Per Article: 57.7] [Received: 07/01/2004] [Indexed: 05/02/2023]
Abstract
Traditional approaches to measuring the complexity of biological signals fail to account for the multiple time scales inherent in such time series. These algorithms have yielded contradictory findings when applied to real-world datasets obtained in health and disease states. We describe in detail the basis and implementation of the multiscale entropy (MSE) method. We extend and elaborate previous findings showing its applicability to the fluctuations of the human heartbeat under physiologic and pathologic conditions. The method consistently indicates a loss of complexity with aging, with an erratic cardiac arrhythmia (atrial fibrillation), and with a life-threatening syndrome (congestive heart failure). Further, these different conditions have distinct MSE curve profiles, suggesting diagnostic uses. The results support a general "complexity-loss" theory of aging and disease. We also apply the method to the analysis of coding and noncoding DNA sequences and find that the latter have higher multiscale entropy, consistent with the emerging view that so-called "junk DNA" sequences contain important biological information.
Affiliation(s)
- Madalena Costa
- Margret and H. A. Rey Institute for Nonlinear Dynamics in Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts 02215, USA
|
37
|
Li T, Han B. Dampable waves along nucleic acid sequences mediating nucleotides' interactions. ACTA ACUST UNITED AC 2004; 15:135-9. [PMID: 15346768 DOI: 10.1080/10425170410001683476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
This work studied the relationship of any two nucleotides in genomic sequences, coding sequences and full-length cDNAs. We made a statistical hypothesis that there exist no interactions between any two nucleotides in sequences, therefore, a hypothetical combination distribution of two nucleotides is considered and the difference between the hypothetical combination distribution and the actual distribution is used to measure the average interaction between the two nucleotides. As a result, we found that the interactions between any two nucleotides are clearly and closely related with dampable wavelike patterns along the sequences. Based on the results we daringly make some hypotheses on several biological topics. Further, studies on the wave may provide new clues for gene prediction and genome structure study.
Affiliation(s)
- Tao Li
- National Centre for Gene Research, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 500 Caobao Road, Shanghai 200233, China
|
38
|
Nicolay S, Argoul F, Touchon M, d'Aubenton-Carafa Y, Thermes C, Arneodo A. Low frequency rhythms in human DNA sequences: a key to the organization of gene location and orientation? PHYSICAL REVIEW LETTERS 2004; 93:108101. [PMID: 15447453 DOI: 10.1103/physrevlett.93.108101] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 07/12/2003] [Indexed: 05/24/2023]
Abstract
We explore large-scale nucleotide compositional fluctuations of the human genome using multiresolution techniques. Analysis of the GC content and of the AT and GC skews reveals the existence of rhythms with two main periods of 110 ± 20 kb and 400 ± 50 kb that enlighten a remarkable cooperative gene organization. We show that the observed nonlinear oscillations are likely to display all the characteristic features of chaotic strange attractors which suggests a very attractive deterministic picture: gene orientation and location, in relation with the structure and dynamics of chromatin, might be governed by a low-dimensional nonlinear dynamical system.
Affiliation(s)
- S Nicolay
- Laboratoire de Physique, Ecole Normale Supérieure de Lyon, 46 Allée d'Italie, 69364 Lyon Cedex 07, France
|
39
|
Battail G. An engineer’s view on genetic information and biological evolution. Biosystems 2004; 76:279-90. [PMID: 15351150 DOI: 10.1016/j.biosystems.2004.05.029] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/28/2003] [Revised: 07/11/2003] [Accepted: 08/01/2003] [Indexed: 10/26/2022]
Abstract
We develop ideas on genome replication introduced in Battail [Europhys. Lett. 40 (1997) 343]. Starting with the hypothesis that the genome replication process uses error-correcting means, and the auxiliary one that nested codes are used to this end, we first review the concepts of redundancy and error-correcting codes. Then we show that these hypotheses imply that: distinct species exist with a hierarchical taxonomy, there is a trend of evolution towards complexity, and evolution proceeds by discrete jumps. At least the first two features above may be considered as biological facts so, in the absence of direct evidence, they provide an indirect proof in favour of the hypothesized error-correction system. The very high redundancy of genomes makes it possible. In order to explain how it is implemented, we suggest that soft codes and replication decoding, to be briefly described, are plausible candidates. Experimentally proven properties of long-range correlation of the DNA message substantiate this claim.
|
40
|
Abstract
Nucleic acids are characterized by a vast structural variability. Secondary structural conformations include the main polymorphs A, B, and Z, cruciforms, intrinsic curvature, and multistranded motifs. DNA secondary motifs are stabilized and regulated by the primary base sequence, contextual effects, environmental factors, as well as by high-order DNA packaging modes. The high-order modes are, in turn, affected by secondary structures and by the environment. This review is concerned with the flow of structural information among the hierarchical structural levels of DNA molecules, the intricate interplay between the various factors that affect these levels, and the regulation and physiological significance of DNA high-order structures.
Affiliation(s)
- Abraham Minsky
- Department of Organic Chemistry, The Weizmann Institute of Science, Rehovot 76100, Israel.
|
41
|
Audit B, Vaillant C, Arnéodo A, d'Aubenton-Carafa Y, Thermes C. Wavelet Analysis of DNA Bending Profiles reveals Structural Constraints on the Evolution of Genomic Sequences. J Biol Phys 2004; 30:33-81. [PMID: 23345861 PMCID: PMC3456503 DOI: 10.1023/b:jobp.0000016438.86794.8e] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
Analyses of genomic DNA sequences have shown in previous works that base pairs are correlated at large distances with scale-invariant statistical properties. We show in the present study that these correlations between nucleotides (letters) result in fact from long-range correlations (LRC) between sequence-dependent DNA structural elements (words) involved in the packaging of DNA in chromatin. Using the wavelet transform technique, we perform a comparative analysis of the DNA text and of the corresponding bending profiles generated with curvature tables based on nucleosome positioning data. This exploration through the optics of the so-called 'wavelet transform microscope' reveals a characteristic scale of 100-200 bp that separates two regimes of different LRC. We focus here on the existence of LRC in the small-scale regime (≲ 200 bp). Analysis of genomes in the three kingdoms reveals that this regime is specifically associated with the presence of nucleosomes. Indeed, small-scale LRC are observed in eukaryotic genomes and to a lesser extent in archaeal genomes, in contrast with their absence in eubacterial genomes. Similarly, this regime is observed in eukaryotic but not in bacterial viral DNA genomes. There is one exception for genomes of Poxviruses, the only animal DNA viruses that do not replicate in the cell nucleus and do not present small-scale LRC. Furthermore, no small-scale LRC are detected in the genomes of all examined RNA viruses, with one exception in the case of retroviruses. Altogether, these results strongly suggest that small-scale LRC are a signature of the nucleosomal structure. Finally, we discuss possible interpretations of these small-scale LRC in terms of the mechanisms that govern the positioning, the stability and the dynamics of the nucleosomes along the DNA chain.
This paper is mainly devoted to a pedagogical presentation of the theoretical concepts and physical methods which are well suited to perform a statistical analysis of genomic sequences. We review the results obtained with the so-called wavelet-based multifractal analysis when investigating the DNA sequences of various organisms in the three kingdoms. Some of these results have been announced in B. Audit et al. [1, 2].
Affiliation(s)
- Benjamin Audit
- Centre de Recherche Paul Pascal, avenue Schweitzer, 33600 Pessac, France
|
42
|
Nazina AG, Papatsenko DA. Statistical extraction of Drosophila cis-regulatory modules using exhaustive assessment of local word frequency. BMC Bioinformatics 2003; 4:65. [PMID: 14690551 PMCID: PMC341902 DOI: 10.1186/1471-2105-4-65] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 07/19/2003] [Accepted: 12/22/2003] [Indexed: 11/13/2022] Open
Abstract
Background: Transcription regulatory regions in higher eukaryotes are often represented by cis-regulatory modules (CRM) and are responsible for the formation of specific spatial and temporal gene expression patterns. These extended, ~1 kb, regions are found far from coding sequences and cannot be extracted from the genome on the basis of their relative position to the coding regions. Results: To explore the feasibility of CRM extraction from a genome, we generated an original training set, containing annotated sequence data for most of the known developmental CRMs from Drosophila. Based on this set of experimental data, we developed a strategy for statistical extraction of cis-regulatory modules from the genome, using exhaustive analysis of local word frequency (LWF). To assess the performance of our analysis, we measured the correlation between predictions generated by the LWF algorithm and the distribution of conserved non-coding regions in a number of Drosophila developmental genes. Conclusions: In most of the cases tested, we observed high correlation (up to 0.6–0.8, measured on the entire gene locus) between the two independent techniques. We discuss computational strategies available for extraction of Drosophila CRMs and possible extensions of these methods.
Affiliation(s)
- Anna G Nazina
- Department of Biology, New York University, New York, USA
|
43
|
Audit B, Ouzounis CA. From genes to genomes: universal scale-invariant properties of microbial chromosome organisation. J Mol Biol 2003; 332:617-33. [PMID: 12963371 DOI: 10.1016/s0022-2836(03)00811-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.6] [Indexed: 10/27/2022]
Abstract
The availability of complete genome sequences for a large variety of organisms is a major advance in understanding genome structure and function. One attribute of genome structure is chromosome organisation in terms of gene localisation and orientation. For example, bacterial operons, i.e. clusters of co-oriented genes that form transcription units, enable functionally related genes to be expressed simultaneously. The description of genome organisation was pioneered with the study of the distribution of genes of the Escherichia coli partial genetic map before the full genome sequence was known. Deploying powerful techniques from circular statistics and signal processing, we revisit the issue of gene localisation and orientation using 89 complete microbial chromosomes from the eubacterial and archaeal domains. We demonstrate that there is no characteristic size pertinent to the description of chromosome structure, e.g. there does not exist any single length appropriate to describe gene clustering. Our results show that, for all 89 chromosomes, gene positions and gene orientations share a common form of scale-invariant correlations known as "long-range correlations" that we can reveal for distances from the gene length up to the chromosome size. This observation indicates that genes tend to assemble and to co-orient over any scale of observation greater than a few kilobases. This unexpected property of chromosome structure can be portrayed as an operon-like organisation at all scales and implies that a complete scale range extending over more than three orders of magnitude of chromosome segment lengths is necessary to properly describe prokaryotic genome organisation. We propose that this pattern results from the effects of the superhelical context on gene expression coupled with the structure and dynamics of the nucleoid, possibly accommodating the diverse gene expression profiles needed during the different stages of cellular life.
Affiliation(s)
- Benjamin Audit
- Wellcome Trust Genome Campus, Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge, CB10 1SD, UK
|
44
|
Holste D, Grosse I, Beirer S, Schieg P, Herzel H. Repeats and correlations in human DNA sequences. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2003; 67:061913. [PMID: 16241267 DOI: 10.1103/physreve.67.061913] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Received: 01/22/2003] [Indexed: 05/04/2023]
Abstract
We study the nucleotide-nucleotide mutual information function I(k) of the DNA sequences of the three completely sequenced human chromosomes 20, 21, and 22. We find in each human chromosome (i) the absence of the k=3 base pair (bp) sequence periodicity characteristic for protein coding regions, (ii) the absence of the k=10-11 bp sequence periodicity characteristic for both protein secondary structure and DNA bendability, and (iii) the presence of significant statistical dependencies at about k=135 bp and at about k=165 bp. We investigate to which degree the density and composition of interspersed repeats might explain these observed statistical patterns in all three human chromosomes. We use simple stochastic models to substitute known interspersed repeats and find by numerical studies that (iv) the presence of interspersed repeats dominates short-range correlations as measured by I(k) on the scale of several hundred base pairs in human chromosomes 20, 21, and 22. On the other hand, we find that (v) interspersed repeats contribute only weakly to long-range correlations due to the clustering of highly abundant Alu repeats.
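For readers unfamiliar with the quantity, the nucleotide-nucleotide mutual information function I(k) named in this abstract can be sketched as below. This is a minimal plug-in estimate from pair counts, not the authors' code: the function name `mutual_information` and the toy sequences are illustrative assumptions, and no finite-sample bias correction is applied.

```python
from collections import Counter
from math import log2


def mutual_information(seq: str, k: int) -> float:
    """Plug-in estimate of I(k), in bits, between symbols k positions apart.

    Hypothetical helper for illustration only; it applies no finite-sample
    bias correction, which matters for short sequences.
    """
    if k < 1 or k >= len(seq):
        raise ValueError("k must satisfy 1 <= k < len(seq)")

    # Count every ordered pair (seq[i], seq[i + k]).
    pairs = Counter(zip(seq, seq[k:]))
    n = sum(pairs.values())

    # Marginal counts of the first and second symbol of each pair.
    left, right = Counter(), Counter()
    for (a, b), count in pairs.items():
        left[a] += count
        right[b] += count

    # I(k) = sum over pairs of p_ab * log2(p_ab / (p_a * p_b)).
    info = 0.0
    for (a, b), count in pairs.items():
        info += (count / n) * log2(count * n / (left[a] * right[b]))
    return info


if __name__ == "__main__":
    periodic = "ACGT" * 50  # toy sequence with an exact period of 4
    for k in (1, 2, 3, 4):
        print(k, round(mutual_information(periodic, k), 3))
```

On a real chromosome one would evaluate this over a range of k and look for peaks such as the k=135 bp and k=165 bp dependencies reported above.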
Affiliation(s)
- Dirk Holste
- Department of Biology, Massachusetts Institute of Technology, Cambridge 02139, USA.
|
45
|
Song J, Ware A, Liu SL. Wavelet to predict bacterial ori and ter: a tendency towards a physical balance. BMC Genomics 2003; 4:17. [PMID: 12732098 PMCID: PMC156607 DOI: 10.1186/1471-2164-4-17] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Received: 01/06/2003] [Accepted: 05/05/2003] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND: Chromosomal DNA replication in bacteria starts at the origin (ori) and the two replicores propagate in opposite directions up to the terminus (ter) region. We hypothesize that the two replicores need to reach ter at the same time to maintain a physical balance; DNA insertion would disrupt such a balance, requiring chromosomal rearrangements to restore the balance. To test this hypothesis, we needed to demonstrate that ori and ter are in a physical balance in bacterial chromosomes. Using wavelet analysis, we documented GC skew, AT skew, purine excess and keto excess on the published bacterial genomic sequences to locate the turning (minimum and maximum) points on the curves. Previously, the minimum point had been supposed to correlate with ori and the maximum to correlate with ter. RESULTS: We observed a strong tendency of the bacterial chromosomes towards a physical balance, with the minima and maxima corresponding to the known or putative ori and ter and being about half chromosome separated in most of the bacteria studied. A nonparametric method based on wavelet transformation was employed to perform significance tests for the predicted loci. CONCLUSIONS: The wavelet approach can reliably predict the ori and ter regions and the bacterial chromosomes have a strong tendency towards a physical balance between ori and ter.
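The turning-point idea in this abstract can be illustrated with a plain windowed cumulative GC skew. Note that this sketch deliberately swaps in a simple running sum in place of the wavelet analysis the authors used; the function names, the window size and the toy sequence are all assumptions made for illustration.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of the per-window GC skew (G - C) / (G + C).

    Simplified stand-in for the wavelet-based analysis described above;
    windows without any G or C contribute zero to the sum.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        curve.append(total)
    return curve


def turning_points(curve: list[float]) -> tuple[int, int]:
    """Return the window indices of the global minimum and maximum."""
    i_min = min(range(len(curve)), key=curve.__getitem__)
    i_max = max(range(len(curve)), key=curve.__getitem__)
    return i_min, i_max


if __name__ == "__main__":
    # Toy sequence: C-rich, then G-rich, then C-rich again.
    toy = "C" * 300 + "G" * 500 + "C" * 200
    curve = cumulative_gc_skew(toy, window=100)
    print(curve)
    print(turning_points(curve))
```

Under the earlier convention cited in the abstract, the minimum window would be a candidate for ori and the maximum a candidate for ter; on real genomes the curve is noisy, which is one motivation for smoothing approaches such as wavelets.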
Affiliation(s)
- Jiuzhou Song
- Departments of Microbiology and Infectious Diseases, University of Calgary, Calgary, Canada
| | - Antony Ware
- Mathematics and Statistics, University of Calgary, Calgary, Canada
| | - Shu-Lin Liu
- Departments of Microbiology and Infectious Diseases, University of Calgary, Calgary, Canada
- Department of Microbiology, Peking University School of Basic Medical Sciences, Beijing, China
|
46
|
Abstract
The base distributions in coding DNA sequences (CDS) are investigated. We explore the scaling properties of the 4-dimensional directed random walk and compare them with those for the DNA sequences. Inferences from these observations are, however, contradicted by an alternate analysis using factorial moments. To resolve this conflict we look directly at the nucleotide base distributions. In all cases the base distributions change from Gaussian to non-Gaussian as the scale size is increased. The CDS, therefore, have nucleotide distributions different from the random.
Affiliation(s)
- A Som
- Department of Theoretical Physics, Indian Association for the Cultivation of Science, Jadavpur, Calcutta 700 032, India.
|
47
|
Vaillant C, Audit B, Thermes C, Arnéodo A. Influence of the sequence on elastic properties of long DNA chains. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2003; 67:032901. [PMID: 12689116 DOI: 10.1103/physreve.67.032901] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 06/11/2002] [Revised: 12/17/2002] [Indexed: 05/24/2023]
Abstract
We revisit the results of single-molecule DNA stretching experiments using a rodlike chain (RLC) model that explicitly includes some intrinsic structural disorder induced by the sequence. The investigation of artificial and real genomic sequences shows that the wormlike chain model reproduces the data quite well but with an effective bend stiffness A(eff), which underestimates the true elastic bend stiffness A, independently of the elastic twist stiffness C. Mainly dominated by the amplitude of the structural disorder, this correction seems rather insensitive to the presence of long-range correlations. This RLC model is shown to remarkably fit the experimental data for lambda-DNA when considering A ≈ 70 ± 10 nm (> A(eff) ≈ 50 nm), in good agreement with previous experimental estimates of the "dynamic" persistent length. From the analysis of large human contigs, we speculate about the possible dependence of A(eff) and/or A upon the (G+C) content of the considered sequence.
Affiliation(s)
- C Vaillant
- Institut Bernoulli, EPFL, 1015 Lausanne, Switzerland
|
48
|
Abstract
We apply the random field theory to the study of DNA chains which we assume to be trajectories of a stochastic process. We construct a statistical potential between nucleotides corresponding to the probabilities of those trajectories that can be obtained from the DNA data base containing millions of sequences. It turns out that this potential has an interpretation in terms of quantities naturally arrived at during the study of evolution of species i.e. probabilities of mutations of codons. Making use of recently performed statistical investigations of DNA we show that this potential has different qualitative properties in coding and noncoding parts of genes. We apply our model to data for various organisms and obtain a good agreement with the results just presented in the literature. We also argue that the coding/noncoding boundaries can correspond to jumps of the potential.
Affiliation(s)
- Janusz Szczepański
- Polish Academy of Science, Institute of Fundamental Technological Research, Świętokrzyska 21, 00–049 Warsaw, Poland
| | - Tomasz Michałek
- Polish Academy of Science, Institute of Fundamental Technological Research, Świętokrzyska 21, 00–049 Warsaw, Poland
|
49
|
Arnéodo A, Decoster N, Kestener P, Roux S. A wavelet-based method for multifractal image analysis: From theoretical concepts to experimental applications. ADVANCES IN IMAGING AND ELECTRON PHYSICS 2003. [DOI: 10.1016/s1076-5670(03)80014-9] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.3] [Indexed: 12/05/2022]
|