1
Moqtaderi Z, Brown S, Bender W. Genome-wide oscillations in G + C density and sequence conservation. Genome Res 2021; 31:2050-2057. [PMID: 34649930 PMCID: PMC8559709 DOI: 10.1101/gr.274332.120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2020] [Accepted: 09/01/2021] [Indexed: 11/25/2022]
Abstract
Eukaryotic genomes typically show a uniform G + C content among chromosomes, but on smaller scales, many species have a G + C density that fluctuates with a characteristic wavelength. This oscillation is evident in many insect species, with wavelengths ranging between 700 bp and 4 kb. Measures of evolutionary conservation oscillate in phase with G + C content, with conserved regions having higher G + C. Loci with large regulatory regions show more regular oscillations; coding sequences and heterochromatic regions show little or no oscillation. There is little oscillation in vertebrate genomes in regions with densely distributed mobile repetitive elements. However, species with few repeats show oscillation in both G + C density and sequence conservation. These oscillations may reflect optimal spacing of cis-regulatory elements.
Affiliation(s)
- Zarmik Moqtaderi
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
- Susan Brown
- Department of Biology, Kansas State University, Manhattan, Kansas 66506, USA
- Welcome Bender
- Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, Massachusetts 02115, USA
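The G + C density oscillation described in the abstract of this entry can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the window and step sizes, and the use of a plain discrete Fourier transform to pick a dominant wavelength are all assumptions made for illustration.

```python
import math


def gc_density(seq, window, step):
    """Fraction of G or C bases in each sliding window of `seq`."""
    seq = seq.upper()
    densities = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        densities.append(sum(1 for base in chunk if base in "GC") / window)
    return densities


def dominant_wavelength(values, step):
    """Wavelength (in bp) of the strongest non-constant Fourier component.

    `values` is a windowed G + C profile sampled every `step` bp.
    This is a direct O(n^2) transform, adequate only for short profiles.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    best_k, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    # k cycles across n samples spaced `step` bp apart
    return n * step / best_k
```

On a synthetic sequence that alternates 500 bp of G with 500 bp of A, `dominant_wavelength(gc_density(seq, 100, 50), 50)` should come out close to 1000 bp. Real genomic profiles are noisier and would normally call for a proper spectral or wavelet analysis.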
2
A Wavelet-Based Method for the Impact of Social Media on the Economic Situation: The Saudi Arabia 2030-Vision Case. Mathematics 2021. [DOI: 10.3390/math9101117] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
In the present paper, a wavelet method is proposed to study the impact of electronic media on the economic situation. More precisely, wavelet techniques are applied, in comparison with classical methods, to analyze economic indices in the market. The technique consists first of filtering imprecise circumstances (noise) out of the data in order to construct a wavelet-denoised contingency table. Next, a thresholding procedure is applied to this table to extract the essential information carriers. The resulting table is finally subjected to correspondence analysis before and after thresholding. As a case study, the KSA 2030 vision is considered in the empirical part, based on electronic and social media. The effects of electronic media texts about the trading 2030 vision on the Saudi and global economy have been studied. Recall that the Saudi market is the most important representative market in the GCC region. It has both regional and worldwide influence on economies and, besides, it is characterized by many political, economic and financial movements such as the worldwide economic NEOM project. The findings provided in the present paper may be applied to predict the future situation of markets in the GCC region and may therefore constitute a guide for investors deciding whether or not to invest in these markets.
3
Ghorbani M, Jonckheere EA, Bogdan P. Gene Expression Is Not Random: Scaling, Long-Range Cross-Dependence, and Fractal Characteristics of Gene Regulatory Networks. Front Physiol 2018; 9:1446. [PMID: 30459629 PMCID: PMC6232942 DOI: 10.3389/fphys.2018.01446] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/29/2017] [Accepted: 09/24/2018] [Indexed: 11/30/2022]
Abstract
Gene expression is a vital process through which cells react to the environment and express functional behavior. Understanding the dynamics of gene expression could prove crucial in unraveling the physical complexities involved in this process. Specifically, understanding the coherent complex structure of transcriptional dynamics is the goal of numerous computational studies aiming to study and finally control cellular processes. Here, we report the scaling properties of gene expression time series in Escherichia coli and Saccharomyces cerevisiae. Unlike previous studies, which report the fractal and long-range dependency of DNA structure, we investigate the individual gene expression dynamics as well as the cross-dependency between them in the context of gene regulatory network. Our results demonstrate that the gene expression time series display fractal and long-range dependence characteristics. In addition, the dynamics between genes and linked transcription factors in gene regulatory networks are also fractal and long-range cross-correlated. The cross-correlation exponents in gene regulatory networks are not unique. The distribution of the cross-correlation exponents of gene regulatory networks for several types of cells can be interpreted as a measure of the complexity of their functional behavior.
Affiliation(s)
- Paul Bogdan
- Electrical Engineering Department, University of Southern California, Los Angeles, CA, United States
4
ALUminating the Path of Atherosclerosis Progression: Chaos Theory Suggests a Role for Alu Repeats in the Development of Atherosclerotic Vascular Disease. Int J Mol Sci 2018; 19:1734. [PMID: 29895733 PMCID: PMC6032270 DOI: 10.3390/ijms19061734] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/20/2018] [Revised: 06/04/2018] [Accepted: 06/09/2018] [Indexed: 12/12/2022]
Abstract
Atherosclerosis (ATH) and coronary artery disease (CAD) are chronic inflammatory diseases with an important genetic background; they derive from the cumulative effect of multiple common risk alleles, most of which are located in genomic noncoding regions. These complex diseases behave as nonlinear dynamical systems that show a high dependence on their initial conditions; thus, long-term predictions of disease progression are unreliable. One likely possibility is that the nonlinear nature of ATH could be dependent on nonlinear correlations in the structure of the human genome. In this review, we show how chaos theory analysis has highlighted genomic regions that have shared specific structural constraints, which could have a role in ATH progression. These regions were shown to be enriched with repetitive sequences of the Alu family, genomic parasites that have colonized the human genome, which show a particular secondary structure and are involved in the regulation of gene expression. Here, we show the impact of Alu elements on the mechanisms that regulate gene expression, especially highlighting the molecular mechanisms via which the Alu elements alter the inflammatory response. We devote special attention to their relationship with the long noncoding RNA (lncRNA); antisense noncoding RNA in the INK4 locus (ANRIL), a risk factor for ATH; their role as microRNA (miRNA) sponges; and their ability to interfere with the regulatory circuitry of the (nuclear factor kappa B) NF-κB response. We aim to characterize ATH as a nonlinear dynamic system, in which small initial alterations in the expression of a number of repetitive elements are somehow amplified to reach phenotypic significance.
5
A combinatorial approach to the design of vaccines. J Math Biol 2014; 70:1327-58. [PMID: 24859149 DOI: 10.1007/s00285-014-0797-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/23/2013] [Revised: 04/28/2014] [Indexed: 02/05/2023]
Abstract
We present two new problems of combinatorial optimization and discuss their applications to the computational design of vaccines. In the shortest λ-superstring problem, given a family $S_1,\dots,S_k$ of strings over a finite alphabet, a set $T$ of "target" strings over that alphabet, and an integer λ, the task is to find a string of minimum length containing, for each $i$, at least λ target strings as substrings of $S_i$. In the shortest λ-cover superstring problem, given a collection $X_1,\dots,X_n$ of finite sets of strings over a finite alphabet and an integer λ, the task is to find a string of minimum length containing, for each $i$, at least λ elements of $X_i$ as substrings. The two problems are polynomially equivalent, and the shortest λ-cover superstring problem is a common generalization of two well-known combinatorial optimization problems, the shortest common superstring problem and the set cover problem. We present two approaches to obtain exact or approximate solutions to the shortest λ-superstring and λ-cover superstring problems: one based on integer programming, and a hill-climbing algorithm. An application is given to the computational design of vaccines, and the algorithms are applied to experimental data taken from patients infected by H5N1 and HIV-1.
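To make the λ-cover superstring definition above concrete, here is a minimal sketch of a feasibility check together with a simple greedy heuristic. The greedy strategy is a stand-in chosen for brevity; it is neither the integer-programming approach nor the hill-climbing algorithm of the paper, and it does not guarantee a minimum-length string. All function names are hypothetical.

```python
def covered_counts(candidate, families):
    """For each family X_i, count its distinct strings that occur in `candidate`."""
    return [sum(1 for s in set(family) if s in candidate) for family in families]


def is_lambda_cover(candidate, families, lam):
    """True if `candidate` contains at least `lam` elements of every family."""
    return all(count >= lam for count in covered_counts(candidate, families))


def merge(a, b):
    """Append b to a, reusing the longest suffix of a that is a prefix of b."""
    if b in a:
        return a
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return a + b[k:]
    return a + b


def greedy_lambda_cover(families, lam):
    """Greedy heuristic (an assumption, not the paper's method).

    Repeatedly appends the string from a still-deficient family that
    lengthens the current result the least.
    """
    result = ""
    while not is_lambda_cover(result, families, lam):
        counts = covered_counts(result, families)
        best = None
        for family, count in zip(families, counts):
            if count >= lam:
                continue
            for s in sorted(set(family)):  # sorted for reproducible tie-breaking
                if s in result:
                    continue
                extended = merge(result, s)
                if best is None or len(extended) < len(best):
                    best = extended
        if best is None:
            raise ValueError("some family has fewer than lam distinct strings")
        result = best
    return result
```

The output always satisfies `is_lambda_cover`, but it can be longer than the optimum, which is why exact methods such as the integer program mentioned in the abstract are of interest.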
7
De la Fuente IM, Cortes JM, Pelta DA, Veguillas J. Attractor metabolic networks. PLoS One 2013; 8:e58284. [PMID: 23554883 PMCID: PMC3598861 DOI: 10.1371/journal.pone.0058284] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 11/29/2012] [Accepted: 02/01/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND The experimental observations and numerical studies with dissipative metabolic networks have shown that cellular enzymatic activity self-organizes spontaneously leading to the emergence of a Systemic Metabolic Structure in the cell, characterized by a set of different enzymatic reactions always locked into active states (metabolic core) while the rest of the catalytic processes are only intermittently active. This global metabolic structure was verified for Escherichia coli, Helicobacter pylori and Saccharomyces cerevisiae, and it seems to be a common key feature to all cellular organisms. In concordance with these observations, the cell can be considered a complex metabolic network which mainly integrates a large ensemble of self-organized multienzymatic complexes interconnected by substrate fluxes and regulatory signals, where multiple autonomous oscillatory and quasi-stationary catalytic patterns simultaneously emerge. The network adjusts the internal metabolic activities to the external change by means of flux plasticity and structural plasticity. METHODOLOGY/PRINCIPAL FINDINGS In order to research the systemic mechanisms involved in the regulation of the cellular enzymatic activity we have studied different catalytic activities of a dissipative metabolic network under different external stimuli. The emergent biochemical data have been analysed using statistical mechanic tools, studying some macroscopic properties such as the global information and the energy of the system. We have also obtained an equivalent Hopfield network using a Boltzmann machine. Our main result shows that the dissipative metabolic network can behave as an attractor metabolic network. CONCLUSIONS/SIGNIFICANCE We have found that the systemic enzymatic activities are governed by attractors with capacity to store functional metabolic patterns which can be correctly recovered from specific input stimuli. The network attractors regulate the catalytic patterns, modify the efficiency in the connection between the multienzymatic complexes, and stably retain these modifications. Here for the first time, we have introduced the general concept of attractor metabolic network, in which this dynamic behavior is observed.
Affiliation(s)
- Ildefonso M De la Fuente
- Quantitative Biomedicine Unit, BioCruces Health Research Institute, Barakaldo, Basque Country, Spain.
8
Ye L, Chen H, Liu T, Wu Z, Li J, Zhou R. A wavelet approach for the analysis of folding trajectory of protein Trp-cage. J Bioinform Comput Biol 2005; 3:1351-70. [PMID: 16374911 DOI: 10.1142/s0219720005001594] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/06/2005] [Revised: 08/10/2005] [Accepted: 08/15/2005] [Indexed: 11/18/2022]
Abstract
Understanding how a protein folds into a functional native structure is arguably one of the most challenging problems remaining in computational biology. Currently, the protein folding mechanism is often characterized by calculating the free energy landscape in terms of various reaction coordinates such as the fraction of native contacts, the radius of gyration, the RMS deviation from the native structure, and so on. In this paper, we present a wavelet approach towards understanding the global state changes during protein folding. The approach is based on wavelet analysis of the trajectories of various reaction coordinates to identify the significant intermediate states or structural motifs in the folding process. We demonstrate through an example protein, Trp-cage, that this approach extracts crucial information about protein folding intermediate states as well as the time correlation among these states. Furthermore, the current approach reveals a meaningful structural pattern that had been overlooked in previous works, which provides a better understanding of the folding mechanism as well as the limitations of the current force fields.
Affiliation(s)
- Lei Ye
- Department of Computer Science, Zhejiang University, Hangzhou, China
9
Moukhtar J, Vaillant C, Audit B, Arneodo A. Revisiting polymer statistical physics to account for the presence of long-range-correlated structural disorder in 2D DNA chains. Eur Phys J E Soft Matter 2011; 34:119. [PMID: 22083495 DOI: 10.1140/epje/i2011-11119-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/06/2011] [Accepted: 10/11/2011] [Indexed: 05/31/2023]
Abstract
We elaborate on a generalization of the 2D wormlike chain (WLC) model that accounts for the presence of long-range correlations (LRC) in the intrinsic curvature distribution of eukaryotic DNA. This model predicts some decrease of the DNA persistence length resulting from some large-scale intrinsic curvature induced by sequence-dependent persistent random distribution of local bending sites. When assisting exact analytical calculations by numerical DNA simulations, we show that the conjugated contributions of i) the thermal curvature fluctuations characterized by the "dynamic" persistence length $\ell_p^d = 2A$, where $A$ is the elastic bending modulus, and ii) the intrinsic LRC curvature disorder of amplitude $\sigma_o$ and Hurst exponent $H > 1/2$, characterized by a "static" persistence length $\ell_p^H = A^{1/2H} \sigma_o^{-1/H} \Gamma(1/2H + 1)$, can be described by a continuum of generalized WLC (GWLC) models parametrized by the LRC exponent $H$. We use perturbation analysis to investigate the two limiting cases of weak static disorder ($w_H \ll 1$) and weak dynamical fluctuations ($1/w_H \ll 1$), where $w_H = \ell_p^d/\ell_p^H$ is a dimensionless parameter. From a quantitative point of view, our study demonstrates that even for a small value of the LRC ($H \approx 0.6$-$0.8$) static disorder amplitude $\sigma_o \sim 10^{-2}$, as previously reported for genomic DNA, the decrease of the persistence length from the WLC prediction $\ell_p^d$ can be very significant, up to twofold. The implications of these results on the first steps of compaction of DNA in eukaryotic cells are discussed.
10
Chevereau G, Arneodo A, Vaillant C. Influence of the genomic sequence on the primary structure of chromatin. Front Life Sci 2011. [DOI: 10.1080/21553769.2012.708882] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
11
de la Fuente IM. Quantitative analysis of cellular metabolic dissipative, self-organized structures. Int J Mol Sci 2010; 11:3540-99. [PMID: 20957111 PMCID: PMC2956111 DOI: 10.3390/ijms11093540] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 08/13/2010] [Revised: 09/11/2010] [Accepted: 09/12/2010] [Indexed: 11/16/2022]
Abstract
One of the most important goals of the postgenomic era is understanding the metabolic dynamic processes and the functional structures generated by them. Extensive studies during the last three decades have shown that the dissipative self-organization of the functional enzymatic associations, the catalytic reactions produced during the metabolite channeling, the microcompartmentalization of these metabolic processes and the emergence of dissipative networks are the fundamental elements of the dynamical organization of cell metabolism. Here we present an overview of how mathematical models can be used to address the properties of dissipative metabolic structures at different organizational levels, both for individual enzymatic associations and for enzymatic networks. Recent analyses performed with dissipative metabolic networks have shown that unicellular organisms display a singular global enzymatic structure common to all living cellular organisms, which seems to be an intrinsic property of the functional metabolism as a whole. Mathematical models firmly based on experiments and their corresponding computational approaches are needed to fully grasp the molecular mechanisms of metabolic dynamical processes. They are necessary to enable the quantitative and qualitative analysis of the cellular catalytic reactions and also to help comprehend the conditions under which the structural dynamical phenomena and biological rhythms arise. Understanding the molecular mechanisms responsible for the metabolic dissipative structures is crucial for unraveling the dynamics of cellular life.
Affiliation(s)
- Ildefonso Martínez de la Fuente
- Institute of Parasitology and Biomedicine "López-Neyra" (CSIC), Parque Tecnológico de Ciencias de la Salud, Avenida del Conocimiento s/n, 18100 Armilla (Granada), Spain; Tel.: +34-958-18-16-21
12
De la Fuente IM, Vadillo F, Pérez-Samartín AL, Pérez-Pinilla MB, Bidaurrazaga J, Vera-López A. Global self-regulation of the cellular metabolic structure. PLoS One 2010; 5:e9484. [PMID: 20209156 PMCID: PMC2830472 DOI: 10.1371/journal.pone.0009484] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 07/31/2009] [Accepted: 02/04/2010] [Indexed: 11/18/2022]
Abstract
BACKGROUND Different studies have shown that cellular enzymatic activities are able to self-organize spontaneously, forming a metabolic core of reactive processes that remain active under different growth conditions while the rest of the molecular catalytic reactions exhibit structural plasticity. This global cellular metabolic structure appears to be an intrinsic characteristic common to all cellular organisms. Recent work performed with dissipative metabolic networks has shown that the fundamental element for the spontaneous emergence of this global self-organized enzymatic structure could be the number of catalytic elements in the metabolic networks. METHODOLOGY/PRINCIPAL FINDINGS In order to investigate the factors that may affect the catalytic dynamics under a global metabolic structure characterized by the presence of metabolic cores we have studied different transitions in catalytic patterns belonging to a dissipative metabolic network. The data were analyzed using non-linear dynamics tools: power spectra, reconstructed attractors, long-term correlations, maximum Lyapunov exponent and Approximate Entropy; and we have found the emergence of self-regulation phenomena during the transitions in the metabolic activities. CONCLUSIONS/SIGNIFICANCE The analysis has also shown that the chaotic numerical series analyzed correspond to the fractional Brownian motion and they exhibit long-term correlations and low Approximate Entropy indicating a high level of predictability and information during the self-regulation of the metabolic transitions. The results illustrate some aspects of the mechanisms behind the emergence of the metabolic self-regulation processes, which may constitute an important property of the global structure of the cellular metabolism.
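Approximate Entropy, one of the non-linear dynamics tools named in the abstract above, can be sketched as follows. This is a textbook-style implementation written for illustration only; the embedding dimension `m`, the absolute tolerance `r`, and the function name are assumptions and are not taken from the paper.

```python
import math


def approximate_entropy(series, m=2, r=0.2):
    """Approximate Entropy of `series` with embedding dimension m and tolerance r.

    Low values indicate a regular, predictable series; higher values
    indicate irregularity. `r` is an absolute tolerance here; in practice it
    is often set to a fraction of the standard deviation of the series.
    """
    n = len(series)

    def phi(dim):
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for t in templates:
            # Count templates within Chebyshev distance r (self-match included,
            # so the count is never zero and the logarithm is defined).
            matches = sum(
                1 for u in templates
                if max(abs(a - b) for a, b in zip(t, u)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)
```

A strictly periodic series such as `[0.0, 1.0] * 50` gives a value very close to zero, whereas uniformly random values give a clearly larger one, which matches the reading of low Approximate Entropy as high predictability.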
13
Jost D, Everaers R. Genome wide application of DNA melting analysis. J Phys Condens Matter 2009; 21:034108. [PMID: 21817253 DOI: 10.1088/0953-8984/21/3/034108] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/31/2023]
Abstract
Correspondences between functional and thermodynamic melting properties in a genome are being increasingly employed for ab initio gene finding and for the interpretation of the evolution of genomes. Here we present the first systematic genome wide comparison between biologically coding domains and thermodynamically stable regions. In particular, we develop statistical methods to estimate the reliability of the resulting predictions. Not surprisingly, we find that the success of the approach depends on the difference in GC content between the coding and the non-coding parts of the genome and on the percentage of coding base-pairs in the sequence. These prerequisites vary strongly between species, where we observe no systematic differences between eukaryotes and prokaryotes. We find a number of organisms in which the strong correlation of coding domains and thermodynamically stable regions allows us to identify putative exons or genes to complement existing approaches. In contrast to previous investigations along these lines we have not employed the Poland-Scheraga (PS) model of DNA melting but use the earlier Zimm-Bragg (ZB) model. The Ising-like form of the ZB model can be viewed as an approximation to the PS model, with averaged loop entropies included into the cooperative factor [Formula: see text]. This results in a speed-up by a factor of 20-100 compared to the Fixman-Freire algorithm for the solution of the PS model. We show that for genomic sequences the resulting systematic errors are negligible compared to the parameterization uncertainty of the models. We argue that for limited computing resources, available CPU power is better invested in broadening the statistical base for genomic investigations than in marginal improvements of the description of the physical melting behavior.
Affiliation(s)
- Daniel Jost
- Laboratoire de Physique de l'École Normale Supérieure de Lyon, Université de Lyon, CNRS UMR 5672, 46 Allée d'Italie 69364 Lyon Cedex 07, France
14
Global self-organization of the cellular metabolic structure. PLoS One 2008; 3:e3100. [PMID: 18769681 PMCID: PMC2519785 DOI: 10.1371/journal.pone.0003100] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 02/28/2008] [Accepted: 07/21/2008] [Indexed: 11/19/2022]
Abstract
BACKGROUND Over many years, it has been assumed that enzymes work either in an isolated way, or organized in small catalytic groups. Several studies performed using "metabolic networks models" are helping to understand the degree of functional complexity that characterizes enzymatic dynamic systems. In a previous work, we used "dissipative metabolic networks" (DMNs) to show that enzymes can present a self-organized global functional structure, in which several sets of enzymes are always in an active state, whereas the rest of molecular catalytic sets exhibit dynamics of on-off changing states. We suggested that this kind of global metabolic dynamics might be a genuine and universal functional configuration of the cellular metabolic structure, common to all living cells. Later, a different group has shown experimentally that this kind of functional structure does, indeed, exist in several microorganisms. METHODOLOGY/PRINCIPAL FINDINGS Here we have analyzed around 2,500,000 different DMNs in order to investigate the underlying mechanism of this dynamic global configuration. The numerical analyses that we have performed show that this global configuration is an emergent property inherent to the cellular metabolic dynamics. Concretely, we have found that the existence of a high number of enzymatic subsystems belonging to the DMNs is the fundamental element for the spontaneous emergence of a functional reactive structure characterized by a metabolic core formed by several sets of enzymes always in an active state. CONCLUSIONS/SIGNIFICANCE This self-organized dynamic structure seems to be an intrinsic characteristic of metabolism, common to all living cellular organisms. To better understand cellular functionality, it will be crucial to structurally characterize these enzymatic self-organized global structures.
15
Te Boekhorst R, Abnizova I, Nehaniv C. Discriminating coding, non-coding and regulatory regions using rescaled range and detrended fluctuation analysis. Biosystems 2007; 91:183-94. [PMID: 18029086 DOI: 10.1016/j.biosystems.2007.05.019] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 06/12/2006] [Revised: 03/27/2007] [Accepted: 05/24/2007] [Indexed: 11/19/2022]
Abstract
In this paper we analyse the efficiency of two methods, rescaled range analysis and detrended fluctuation analysis, in distinguishing between coding DNA, regulatory DNA and non-coding non-regulatory DNA of Drosophila melanogaster. Both methods were used to estimate the degree of sequential dependence (or persistence) among nucleotides. We found that these three types of DNA can be discriminated by both methods, although rescaled range analysis performs slightly better than detrended fluctuation analysis. On average, non-coding, non-regulatory DNA has the highest degree of sequential persistence. Coding DNA could be characterised as being anti-persistent, which is in line with earlier findings of latent periodicity. Regulatory regions are shown to possess intermediate sequential dependency. Together with other available methods, rescaled range and detrended fluctuation analysis on the basis of a combined purine/pyrimidine and weak/strong classification of the nucleotides are useful tools for refined structural and functional segmentation of DNA.
Affiliation(s)
- Rene Te Boekhorst
- School of Computer Science, University of Hertfordshire, College Lane, Hatfield, AL10 9AB Hertfordshire, UK
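The rescaled range analysis named in the abstract of this entry can be sketched as below. The sketch uses only a purine/pyrimidine encoding, whereas the abstract mentions a combined purine/pyrimidine and weak/strong classification, so it is a simplification. The block sizes, the doubling scheme, and the function names are assumptions.

```python
import math


def purine_steps(seq):
    """Map purines (A, G) to +1 and pyrimidines (C, T) to -1; skip other symbols."""
    return [1 if base in "AG" else -1 for base in seq.upper() if base in "ACGT"]


def rescaled_range(xs):
    """R/S statistic of one block: range of cumulative deviations over the std dev."""
    n = len(xs)
    mean = sum(xs) / n
    cumulative, lowest, highest = 0.0, 0.0, 0.0
    for x in xs:
        cumulative += x - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return (highest - lowest) / std if std > 0 else None


def hurst_exponent(xs, min_size=8):
    """Slope of log(mean R/S) against log(block size), with block sizes doubling."""
    points = []
    size = min_size
    while size <= len(xs) // 2:
        blocks = [xs[i:i + size] for i in range(0, len(xs) - size + 1, size)]
        values = [rs for rs in map(rescaled_range, blocks) if rs is not None]
        if values:
            points.append((math.log(size), math.log(sum(values) / len(values))))
        size *= 2
    if len(points) < 2:
        raise ValueError("series too short for a slope estimate")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    return numerator / denominator
```

Under the usual interpretation, an exponent above 0.5 suggests persistence and one below 0.5 suggests anti-persistence. A strictly alternating series gives a slope of zero with this sketch, while uncorrelated steps give a value in the neighbourhood of 0.5 (R/S estimates on short series are known to be biased upward).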
16
Thurman RE, Day N, Noble WS, Stamatoyannopoulos JA. Identification of higher-order functional domains in the human ENCODE regions. Genome Res 2007; 17:917-27. [PMID: 17568007 PMCID: PMC1891350 DOI: 10.1101/gr.6081407] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.6] [Received: 10/29/2006] [Accepted: 03/27/2007] [Indexed: 11/25/2022]
Abstract
It has long been posited that human and other large genomes are organized into higher-order (i.e., greater than gene-sized) functional domains. We hypothesized that diverse experimental data types generated by The ENCODE Project Consortium could be combined to delineate active and quiescent or repressed functional domains and thereby illuminate the higher-order functional architecture of the genome. To address this, we coupled wavelet analysis with hidden Markov models for unbiased discovery of "domain-level" behavior in high-resolution functional genomic data, including activating and repressive histone modifications, RNA output, and DNA replication timing. We find that higher-order patterns in these data types are largely concordant and may be analyzed collectively in the context of HeLa cells to delineate 53 active and 62 repressed functional domains within the ENCODE regions. Active domains comprise approximately 44% of the ENCODE regions but contain approximately 75%-80% of annotated genes, transcripts, and CpG islands. Repressed domains are enriched in certain classes of repetitive elements and, surprisingly, in evolutionarily conserved nonexonic sequences. The functional domain structure of the ENCODE regions appears to be largely stable across different cell types. Taken together, our results suggest that higher-order functional domains represent a fundamental organizing principle of human genome architecture.
Affiliation(s)
- Robert E. Thurman
- Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Nathan Day
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
17
Allen TE, Price ND, Joyce AR, Palsson BØ. Long-range periodic patterns in microbial genomes indicate significant multi-scale chromosomal organization. PLoS Comput Biol 2006; 2:e2. [PMID: 16410829 PMCID: PMC1326223 DOI: 10.1371/journal.pcbi.0020002] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Received: 07/25/2005] [Accepted: 12/07/2005] [Indexed: 01/02/2023]
Abstract
Genome organization can be studied through analysis of chromosome position-dependent patterns in sequence-derived parameters. A comprehensive analysis of such patterns in prokaryotic sequences and genome-scale functional data has yet to be performed. We detected spatial patterns in sequence-derived parameters for 163 chromosomes occurring in 135 bacterial and 16 archaeal organisms using wavelet analysis. Pattern strength was found to correlate with organism-specific features such as genome size, overall GC content, and the occurrence of known motility and chromosomal binding proteins. Given additional functional data for Escherichia coli, we found significant correlations among chromosome position dependent patterns in numerous properties, some of which are consistent with previously experimentally identified chromosome macrodomains. These results demonstrate that the large-scale organization of most sequenced genomes is significantly nonrandom, and, moreover, that this organization is likely linked to genome size, nucleotide composition, and information transfer processes. Constraints on genome evolution and design are thus not solely dependent upon information content, but also upon an intricate multi-parameter, multi-length-scale organization of the chromosome.
Affiliation(s)
- Timothy E Allen
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Nathan D Price
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
- Andrew R Joyce
- Bioinformatics Program, University of California San Diego, La Jolla, California, United States of America
- Bernhard Ø Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America