51
|
Paralkar VR, Taborda CC, Huang P, Yao Y, Kossenkov AV, Prasad R, Luan J, Davies JOJ, Hughes JR, Hardison RC, Blobel GA, Weiss MJ. Unlinking an lncRNA from Its Associated cis Element. Mol Cell 2016; 62:104-10. [PMID: 27041223 DOI: 10.1016/j.molcel.2016.02.029] [Citation(s) in RCA: 176] [Impact Index Per Article: 22.0] [Received: 12/21/2015] [Revised: 02/01/2016] [Accepted: 02/24/2016] [Indexed: 01/24/2023]
Abstract
Long non-coding (lnc) RNAs can regulate gene expression and protein functions. However, the proportion of lncRNAs with biological activities among the thousands expressed in mammalian cells is controversial. We studied Lockd (lncRNA downstream of Cdkn1b), a 434-nt polyadenylated lncRNA originating 4 kb 3' to the Cdkn1b gene. Deletion of the 25-kb Lockd locus reduced Cdkn1b transcription by approximately 70% in an erythroid cell line. In contrast, homozygous insertion of a polyadenylation cassette 80 bp downstream of the Lockd transcription start site reduced the entire lncRNA transcript level by >90% with no effect on Cdkn1b transcription. The Lockd promoter contains a DNase-hypersensitive site, binds numerous transcription factors, and physically associates with the Cdkn1b promoter in chromosomal conformation capture studies. Therefore, the Lockd gene positively regulates Cdkn1b transcription through an enhancer-like cis element, whereas the lncRNA itself is dispensable, which may be the case for other lncRNAs.
|
52
|
Lichtenberg J, Heuston EF, Mishra T, Keller CA, Hardison RC, Bodine DM. SBR-Blood: systems biology repository for hematopoietic cells. Nucleic Acids Res 2015; 44:D925-31. [PMID: 26590403 PMCID: PMC4702891 DOI: 10.1093/nar/gkv1263] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/13/2015] [Accepted: 11/04/2015] [Indexed: 12/14/2022]
Abstract
Extensive research into hematopoiesis (the development of blood cells) over several decades has generated large sets of expression and epigenetic profiles in multiple human and mouse blood cell types. However, there is no single location to analyze how gene regulatory processes lead to different mature blood cells. We have developed a new database framework called hematopoietic Systems Biology Repository (SBR-Blood), available online at http://sbrblood.nhgri.nih.gov, which allows user-initiated analyses for cell type correlations or gene-specific behavior during differentiation using publicly available datasets for array- and sequencing-based platforms from mouse hematopoietic cells. SBR-Blood organizes information by both cell identity and by hematopoietic lineage. The validity and usability of SBR-Blood has been established through the reproduction of workflows relevant to expression data, DNA methylation, histone modifications and transcription factor occupancy profiles.
|
53
|
Jain D, Mishra T, Giardine BM, Keller CA, Morrissey CS, Magargee S, Dorman CM, Long M, Weiss MJ, Hardison RC. Dynamics of GATA1 binding and expression response in a GATA1-induced erythroid differentiation system. Genomics Data 2015; 4:1-7. [PMID: 25729644 PMCID: PMC4338950 DOI: 10.1016/j.gdata.2015.01.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
During the maturation phase of mammalian erythroid differentiation, highly proliferative cells committed to the erythroid lineage undergo dramatic changes in morphology and function to produce circulating, enucleated erythrocytes. These changes are caused by equally dramatic alterations in gene expression, which in turn are driven by changes in the abundance and binding patterns of transcription factors such as GATA1. We have studied the dynamics of GATA1 binding by ChIP-seq and the global expression responses by RNA-seq in a GATA1-dependent mouse cell line model for erythroid maturation, in both cases examining seven progressive stages during differentiation. Analyses of these data should provide insights both into mechanisms of regulation (early versus late targets) and the consequences in cell physiology (e.g., distinctive categories of genes regulated at progressive stages of differentiation). The data are deposited in the Gene Expression Omnibus, series GSE36029, GSE40522, GSE49847, and GSE51338.
|
54
|
Dogan N, Wu W, Morrissey CS, Chen KB, Stonestrom A, Long M, Keller CA, Cheng Y, Jain D, Visel A, Pennacchio LA, Weiss MJ, Blobel GA, Hardison RC. Occupancy by key transcription factors is a more accurate predictor of enhancer activity than histone modifications or chromatin accessibility. Epigenetics Chromatin 2015; 8:16. [PMID: 25984238 PMCID: PMC4432502 DOI: 10.1186/s13072-015-0009-5] [Citation(s) in RCA: 89] [Impact Index Per Article: 9.9] [Received: 01/23/2015] [Accepted: 04/02/2015] [Indexed: 12/12/2022]
Abstract
Background: Regulated gene expression controls organismal development, and variation in regulatory patterns has been implicated in complex traits. Thus accurate prediction of enhancers is important for further understanding of these processes. Genome-wide measurement of epigenetic features, such as histone modifications and occupancy by transcription factors, is improving enhancer predictions, but the contribution of these features to prediction accuracy is not known. Given the importance of the hematopoietic transcription factor TAL1 for erythroid gene activation, we predicted candidate enhancers based on genomic occupancy by TAL1 and measured their activity. Contributions of multiple features to enhancer prediction were evaluated based on the results of these and other studies. Results: TAL1-bound DNA segments were active enhancers at a high rate both in transient transfections of cultured cells (39 of 70, or 56%) and transgenic mice (43 of 66, or 65%). The level of binding signal for TAL1 or GATA1 did not help distinguish TAL1-bound DNA segments as active versus inactive enhancers, nor did the density of regulation-related histone modifications. A meta-analysis of results from this and other studies (273 tested predicted enhancers) showed that the presence of TAL1, GATA1, EP300, SMAD1, H3K4 methylation, H3K27ac, and CAGE tags at DNase hypersensitive sites gave the most accurate predictors of enhancer activity, with a success rate over 80% and a median threefold increase in activity. Chromatin accessibility assays and the histone modifications H3K4me1 and H3K27ac were sensitive for finding enhancers, but they have high false positive rates unless transcription factor occupancy is also included. Conclusions: Occupancy by key transcription factors such as TAL1, GATA1, SMAD1, and EP300, along with evidence of transcription, improves the accuracy of enhancer predictions based on epigenetic features.
Electronic supplementary material The online version of this article (doi:10.1186/s13072-015-0009-5) contains supplementary material, which is available to authorized users.
|
55
|
Makova KD, Hardison RC. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet 2015; 16:213-23. [PMID: 25732611 PMCID: PMC4500049 DOI: 10.1038/nrg3890] [Citation(s) in RCA: 145] [Impact Index Per Article: 16.1] [Indexed: 12/12/2022]
Abstract
The variation in local rates of mutations can affect both the evolution of genes and their function in normal and cancer cells. Deciphering the molecular determinants of this variation will be aided by the elucidation of distinct types of mutations, as they differ in regional preferences and in associations with genomic features. Chromatin organization contributes to regional variation in mutation rates, but its contribution differs among mutation types. In both germline and somatic mutations, base substitutions are more abundant in regions of closed chromatin, perhaps reflecting error accumulation late in replication. By contrast, a distinctive mutational state with very high levels of insertions and deletions (indels) and substitutions is enriched in regions of open chromatin. These associations indicate an intricate interplay between the nucleotide sequence of DNA and its dynamic packaging into chromatin, and have important implications for current biomedical research. This Review focuses on recent studies showing associations between chromatin state and mutation rates, including pairwise and multivariate investigations of germline and somatic (particularly cancer) mutations.
|
56
|
Byrska-Bishop M, VanDorn D, Campbell AE, Betensky M, Arca PR, Yao Y, Gadue P, Costa FF, Nemiroff RL, Blobel GA, French DL, Hardison RC, Weiss MJ, Chou ST. Pluripotent stem cells reveal erythroid-specific activities of the GATA1 N-terminus. J Clin Invest 2015; 125:993-1005. [PMID: 25621499 DOI: 10.1172/jci75714] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 02/17/2014] [Accepted: 12/15/2014] [Indexed: 01/13/2023]
Abstract
Germline GATA1 mutations that result in the production of an amino-truncated protein termed GATA1s (where s indicates short) cause congenital hypoplastic anemia. In patients with trisomy 21, similar somatic GATA1s-producing mutations promote transient myeloproliferative disease and acute megakaryoblastic leukemia. Here, we demonstrate that induced pluripotent stem cells (iPSCs) from patients with GATA1-truncating mutations exhibit impaired erythroid potential, but enhanced megakaryopoiesis and myelopoiesis, recapitulating the major phenotypes of the associated diseases. Similarly, in developmentally arrested GATA1-deficient murine megakaryocyte-erythroid progenitors derived from murine embryonic stem cells (ESCs), expression of GATA1s promoted megakaryopoiesis, but not erythropoiesis. Transcriptome analysis revealed a selective deficiency in the ability of GATA1s to activate erythroid-specific genes within populations of hematopoietic progenitors. Although its DNA-binding domain was intact, chromatin immunoprecipitation studies showed that GATA1s binding at specific erythroid regulatory regions was impaired, while binding at many nonerythroid sites, including megakaryocytic and myeloid target genes, was normal. Together, these observations indicate that lineage-specific GATA1 cofactor associations are essential for normal chromatin occupancy and provide mechanistic insights into how GATA1s mutations cause human disease. More broadly, our studies underscore the value of ESCs and iPSCs to recapitulate and study disease phenotypes.
|
57
|
Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, Sandstrom R, Ma Z, Davis C, Pope BD, Shen Y, Pervouchine DD, Djebali S, Thurman RE, Kaul R, Rynes E, Kirilusha A, Marinov GK, Williams BA, Trout D, Amrhein H, Fisher-Aylor K, Antoshechkin I, DeSalvo G, See LH, Fastuca M, Drenkow J, Zaleski C, Dobin A, Prieto P, Lagarde J, Bussotti G, Tanzer A, Denas O, Li K, Bender MA, Zhang M, Byron R, Groudine MT, McCleary D, Pham L, Ye Z, Kuan S, Edsall L, Wu YC, Rasmussen MD, Bansal MS, Kellis M, Keller CA, Morrissey CS, Mishra T, Jain D, Dogan N, Harris RS, Cayting P, Kawli T, Boyle AP, Euskirchen G, Kundaje A, Lin S, Lin Y, Jansen C, Malladi VS, Cline MS, Erickson DT, Kirkup VM, Learned K, Sloan CA, Rosenbloom KR, Lacerda de Sousa B, Beal K, Pignatelli M, Flicek P, Lian J, Kahveci T, Lee D, Kent WJ, Ramalho Santos M, Herrero J, Notredame C, Johnson A, Vong S, Lee K, Bates D, Neri F, Diegel M, Canfield T, Sabo PJ, Wilken MS, Reh TA, Giste E, Shafer A, Kutyavin T, Haugen E, Dunn D, Reynolds AP, Neph S, Humbert R, Hansen RS, De Bruijn M, Selleri L, Rudensky A, Josefowicz S, Samstein R, Eichler EE, Orkin SH, Levasseur D, Papayannopoulou T, Chang KH, Skoultchi A, Gosh S, Disteche C, Treuting P, Wang Y, Weiss MJ, Blobel GA, Cao X, Zhong S, Wang T, Good PJ, Lowdon RF, Adams LB, Zhou XQ, Pazin MJ, Feingold EA, Wold B, Taylor J, Mortazavi A, Weissman SM, Stamatoyannopoulos JA, Snyder MP, Guigo R, Gingeras TR, Gilbert DM, Hardison RC, Beer MA, Ren B. A comparative encyclopedia of DNA elements in the mouse genome. Nature 2015; 515:355-64. [PMID: 25409824 PMCID: PMC4266106 DOI: 10.1038/nature13992] [Citation(s) in RCA: 1135] [Impact Index Per Article: 126.1] [Received: 02/03/2014] [Accepted: 10/24/2014] [Indexed: 12/11/2022]
Abstract
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
|
58
|
Cheng Y, Ma Z, Kim BH, Wu W, Cayting P, Boyle AP, Sundaram V, Xing X, Dogan N, Li J, Euskirchen G, Lin S, Lin Y, Visel A, Kawli T, Yang X, Patacsil D, Keller CA, Giardine B, Kundaje A, Wang T, Pennacchio LA, Weng Z, Hardison RC, Snyder MP. Principles of regulatory information conservation between mouse and human. Nature 2015; 515:371-375. [PMID: 25409826 PMCID: PMC4343047 DOI: 10.1038/nature13985] [Citation(s) in RCA: 189] [Impact Index Per Article: 21.0] [Received: 02/05/2014] [Accepted: 10/21/2014] [Indexed: 11/09/2022]
Abstract
To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.
|
59
|
Hsiung CCS, Morrissey CS, Udugama M, Frank CL, Keller CA, Baek S, Giardine B, Crawford GE, Sung MH, Hardison RC, Blobel GA. Genome accessibility is widely preserved and locally modulated during mitosis. Genome Res 2014; 25:213-25. [PMID: 25373146 PMCID: PMC4315295 DOI: 10.1101/gr.180646.114] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Indexed: 01/28/2023]
Abstract
Mitosis entails global alterations to chromosome structure and nuclear architecture, concomitant with transient silencing of transcription. How cells transmit transcriptional states through mitosis remains incompletely understood. While many nuclear factors dissociate from mitotic chromosomes, the observation that certain nuclear factors and chromatin features remain associated with individual loci during mitosis originated the hypothesis that such mitotically retained molecular signatures could provide transcriptional memory through mitosis. To understand the role of chromatin structure in mitotic memory, we performed the first genome-wide comparison of DNase I sensitivity of chromatin in mitosis and interphase, using a murine erythroblast model. Despite chromosome condensation during mitosis visible by microscopy, the landscape of chromatin accessibility at the macromolecular level is largely unaltered. However, mitotic chromatin accessibility is locally dynamic, with individual loci maintaining none, some, or all of their interphase accessibility. Mitotic reduction in accessibility occurs primarily within narrow, highly DNase hypersensitive sites that frequently coincide with transcription factor binding sites, whereas broader domains of moderate accessibility tend to be more stable. In mitosis, proximal promoters generally maintain their accessibility more strongly, whereas distal regulatory elements tend to lose accessibility. Large domains of DNA hypomethylation mark a subset of promoters that retain accessibility during mitosis and across many cell types in interphase. Erythroid transcription factor GATA1 exerts site-specific changes in interphase accessibility that are most pronounced at distal regulatory elements, but has little influence on mitotic accessibility. We conclude that features of open chromatin are remarkably stable through mitosis, but are modulated at the level of individual genes and regulatory elements.
|
60
|
Pimkin M, Kossenkov AV, Mishra T, Morrissey CS, Wu W, Keller CA, Blobel GA, Lee D, Beer MA, Hardison RC, Weiss MJ. Divergent functions of hematopoietic transcription factors in lineage priming and differentiation during erythro-megakaryopoiesis. Genome Res 2014; 24:1932-44. [PMID: 25319996 PMCID: PMC4248311 DOI: 10.1101/gr.164178.113] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Indexed: 02/06/2023]
Abstract
Combinatorial actions of relatively few transcription factors control hematopoietic differentiation. To investigate this process in erythro-megakaryopoiesis, we correlated the genome-wide chromatin occupancy signatures of four master hematopoietic transcription factors (GATA1, GATA2, TAL1, and FLI1) and three diagnostic histone modification marks with the gene expression changes that occur during development of primary cultured megakaryocytes (MEG) and primary erythroblasts (ERY) from murine fetal liver hematopoietic stem/progenitor cells. We identified a robust, genome-wide mechanism of MEG-specific lineage priming by a previously described stem/progenitor cell-expressed transcription factor heptad (GATA2, LYL1, TAL1, FLI1, ERG, RUNX1, LMO2) binding to MEG-associated cis-regulatory modules (CRMs) in multipotential progenitors. This is followed by genome-wide GATA factor switching that mediates further induction of MEG-specific genes following lineage commitment. Interaction between GATA and ETS factors appears to be a key determinant of these processes. In contrast, ERY-specific lineage priming is biased toward GATA2-independent mechanisms. In addition to its role in MEG lineage priming, GATA2 plays an extensive role in late megakaryopoiesis as a transcriptional repressor at loci defined by a specific DNA signature. Our findings reveal important new insights into how ERY and MEG lineages arise from a common bipotential progenitor via overlapping and divergent functions of shared hematopoietic transcription factors.
|
61
|
Wu W, Morrissey CS, Keller CA, Mishra T, Pimkin M, Blobel GA, Weiss MJ, Hardison RC. Dynamic shifts in occupancy by TAL1 are guided by GATA factors and drive large-scale reprogramming of gene expression during hematopoiesis. Genome Res 2014; 24:1945-62. [PMID: 25319994 PMCID: PMC4248312 DOI: 10.1101/gr.164830.113] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Indexed: 01/01/2023]
Abstract
We used mouse ENCODE data along with complementary data from other laboratories to study the dynamics of occupancy and the role in gene regulation of the transcription factor TAL1, a critical regulator of hematopoiesis, at multiple stages of hematopoietic differentiation. We combined ChIP-seq and RNA-seq data in six mouse cell types representing a progression from multilineage precursors to differentiated erythroblasts and megakaryocytes. We found that sites of occupancy shift dramatically during commitment to the erythroid lineage, vary further during terminal maturation, and are strongly associated with changes in gene expression. In multilineage progenitors, the likely target genes are enriched for hematopoietic growth and functions associated with the mature cells of specific daughter lineages (such as megakaryocytes). In contrast, target genes in erythroblasts are specifically enriched for red cell functions. Furthermore, shifts in TAL1 occupancy during erythroid differentiation are associated with gene repression (dissociation) and induction (co-occupancy with GATA1). Based on both enrichment for transcription factor binding site motifs and co-occupancy determined by ChIP-seq, recruitment by GATA transcription factors appears to be a stronger determinant of TAL1 binding to chromatin than the canonical E-box binding site motif. Studies of additional proteins lead to the model that TAL1 regulates expression after being directed to a distinct subset of genomic binding sites in each cell type via its association with different complexes containing master regulators such as GATA2, ERG, and RUNX1 in multilineage cells and the lineage-specific master regulator GATA1 in erythroblasts.
|
62
|
Chang GS, Chen XA, Park B, Rhee HS, Li P, Han KH, Mishra T, Chan-Salis KY, Li Y, Hardison RC, Wang Y, Pugh BF. A comprehensive and high-resolution genome-wide response of p53 to stress. Cell Rep 2014; 8:514-27. [PMID: 25043190 DOI: 10.1016/j.celrep.2014.06.030] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 04/07/2014] [Revised: 05/22/2014] [Accepted: 06/18/2014] [Indexed: 12/22/2022]
Abstract
Tumor suppressor p53 regulates transcription of stress-response genes. Many p53 targets remain undiscovered because of uncertainty as to where p53 binds in the genome and the fact that few genes reside near p53-bound recognition elements (REs). Using chromatin immunoprecipitation followed by exonuclease treatment (ChIP-exo), we associated p53 with 2,183 unsplit REs. REs were positionally constrained with other REs and other regulatory elements, which may reflect structurally organized p53 interactions. Surprisingly, stress resulted in increased occupancy of transcription factor IIB (TFIIB) and RNA polymerase (Pol) II near REs, which was reduced when p53 was present. A subset associated with antisense RNA near stress-response genes. The combination of high-confidence locations for p53/REs, TFIIB/Pol II, and their changes in response to stress allowed us to identify 151 high-confidence p53-regulated genes, substantially increasing the number of p53 targets. These genes composed a large portion of a predefined DNA-damage stress-response network. Thus, p53 plays a comprehensive role in regulating the stress-response network, including regulating noncoding transcription.
|
63
|
Lee Y, Ghosh D, Hardison RC, Zhang Y. MRHMMs: multivariate regression hidden Markov models and the variantS. Bioinformatics 2014; 30:1755-6. [PMID: 24558116 DOI: 10.1093/bioinformatics/btu070] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
Summary: Hidden Markov models (HMMs) are flexible and widely used in scientific studies. Particularly in genomics and genetics, there are multiple distinct regimes in the genome within each of which the relationships among multivariate features are distinct. Examples include differential gene regulation depending on gene functions and experimental conditions, and varying combinatorial patterns of multiple transcription factors. We developed a software package called MRHMMs (Multivariate Regression Hidden Markov Models and the variantS) that accommodates a variety of HMMs that can be flexibly applied to many biological studies and beyond. MRHMMs supplements existing HMM software packages in two aspects. First, MRHMMs provides a diverse set of emission probability structures, including mixture of multivariate normal distributions and (logistic) regression models. Second, MRHMMs is computationally efficient for analyzing large datasets generated in current genome-wide studies. Notably, the software is written in C for speed and is amenable to implementing alternative models to meet users' own purposes. Availability and implementation: http://sourceforge.net/projects/mrhmms/
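As an illustration of what a regression emission structure means in an HMM, the following minimal Python sketch decodes the most likely state path when each hidden state emits a response from its own linear regression on a covariate. It is not the MRHMMs code (which is written in C) and does not reproduce its interface; the function names, the single-covariate Gaussian model, and the toy data are all assumptions made for this example.

```python
import math


def gaussian_logpdf(y, mean, sd):
    """Log density of Normal(mean, sd) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (y - mean) ** 2 / (2 * sd * sd)


def viterbi_regression_hmm(xs, ys, betas, sds, log_init, log_trans):
    """Viterbi decoding for an HMM whose state k emits
    y ~ Normal(betas[k][0] + betas[k][1] * x, sds[k]).

    Hypothetical helper for illustration only; parameters are assumed known.
    """
    n_states = len(betas)

    def log_emit(k, x, y):
        intercept, slope = betas[k]
        return gaussian_logpdf(y, intercept + slope * x, sds[k])

    # Best log score of any path ending in each state at the first position.
    scores = [log_init[k] + log_emit(k, xs[0], ys[0]) for k in range(n_states)]
    backptrs = []
    for x, y in zip(xs[1:], ys[1:]):
        new_scores, ptrs = [], []
        for k in range(n_states):
            # Best predecessor state for a path that moves into state k.
            best_prev = max(range(n_states), key=lambda j: scores[j] + log_trans[j][k])
            ptrs.append(best_prev)
            new_scores.append(scores[best_prev] + log_trans[best_prev][k] + log_emit(k, x, y))
        scores = new_scores
        backptrs.append(ptrs)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: scores[k])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: the response rises with x in the first regime and falls in the second.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, -4.1, -4.8, -6.2]
betas = [(0.0, 1.0), (0.0, -1.0)]  # (intercept, slope) for each state
sds = [0.5, 0.5]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]

print(viterbi_regression_hmm(xs, ys, betas, sds, log_init, log_trans))
```

The two states here share a noise level and differ only in slope, so the decoded path separates the positions where the covariate and response are positively related from those where they are negatively related. Parameter estimation, multivariate covariates, and logistic or mixture emissions are outside the scope of this sketch.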
|
65
|
Zheng R, Rebolledo-Jaramillo B, Zong Y, Wang L, Russo P, Hancock W, Stanger BZ, Hardison RC, Blobel GA. Function of GATA factors in the adult mouse liver. PLoS One 2013; 8:e83723. [PMID: 24367609 PMCID: PMC3867416 DOI: 10.1371/journal.pone.0083723] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 08/19/2013] [Accepted: 11/06/2013] [Indexed: 11/24/2022]
Abstract
GATA transcription factors and their Friend of Gata (FOG) cofactors control the development of diverse tissues. GATA4 and GATA6 are essential for the expansion of the embryonic liver bud, but their expression patterns and functions in the adult liver are unclear. We characterized the expression of GATA and FOG factors in whole mouse liver and purified hepatocytes. GATA4, GATA6, and FOG1 are the most prominently expressed family members in whole liver and hepatocytes. GATA4 chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq) identified 4409 occupied sites, associated with genes enriched in ontologies related to liver function, including lipid and glucose metabolism. However, hepatocyte-specific excision of Gata4 had little impact on gross liver architecture and function, even under conditions of regenerative stress, and, despite the large number of GATA4 occupied genes, resulted in relatively few changes in gene expression. To address possible redundancy between GATA4 and GATA6, both factors were conditionally excised. Surprisingly, combined Gata4,6 loss did not exacerbate the phenotype resulting from Gata4 loss alone. This points to the presence of an unusually robust transcriptional network in adult hepatocytes that ensures the maintenance of liver function.
|
66
|
Giardine B, Borg J, Viennas E, Pavlidis C, Moradkhani K, Joly P, Bartsakoulia M, Riemer C, Miller W, Tzimas G, Wajcman H, Hardison RC, Patrinos GP. Updates of the HbVar database of human hemoglobin variants and thalassemia mutations. Nucleic Acids Res 2013; 42:D1063-9. [PMID: 24137000 PMCID: PMC3964999 DOI: 10.1093/nar/gkt911] [Citation(s) in RCA: 318] [Impact Index Per Article: 28.9] [Indexed: 11/17/2022]
Abstract
HbVar (http://globin.bx.psu.edu/hbvar) is one of the oldest and most appreciated locus-specific databases launched in 2001 by a multi-center academic effort to provide timely information on the genomic alterations leading to hemoglobin variants and all types of thalassemia and hemoglobinopathies. Database records include extensive phenotypic descriptions, biochemical and hematological effects, associated pathology and ethnic occurrence, accompanied by mutation frequencies and references. Here, we report updates to >600 HbVar entries, inclusion of population-specific data for 28 populations and 27 ethnic groups for α-, and β-thalassemias and additional querying options in the HbVar query page. HbVar content was also inter-connected with two other established genetic databases, namely FINDbase (http://www.findbase.org) and Leiden Open-Access Variation database (http://www.lovd.nl), which allows comparative data querying and analysis. HbVar data content has contributed to the realization of two collaborative projects to identify genomic variants that lie on different globin paralogs. Most importantly, HbVar data content has contributed to demonstrate the microattribution concept in practice. These updates significantly enriched the database content and querying potential, enhanced the database profile and data quality and broadened the inter-relation of HbVar with other databases, which should increase the already high impact of this resource to the globin and genetic database community.
|
67
|
Abstract
Genetic and epigenetic studies of gene variants reveal a potential genomic target for treating hemoglobin disorders.
[Also see Report by Bauer et al.]
|
68
|
Blobel GA, Hardison RC. A cluster to remember. Cell 2013; 154:718-20. [PMID: 23953105 PMCID: PMC3878159 DOI: 10.1016/j.cell.2013.07.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Based on a massive transcription factor location analysis within a single cell type, in this issue Yan et al. find that the great majority of occupancies occur within dense clusters of up to 100 factors that almost invariably contain cohesins. Retention of cohesins at cluster sites during mitosis raises the possibility that they contribute to transcriptional memory during the cell cycle.
|
69
|
Su MY, Steiner LA, Bogardus H, Mishra T, Schulz VP, Hardison RC, Gallagher PG. Identification of biologically relevant enhancers in human erythroid cells. J Biol Chem 2013; 288:8433-8444. [PMID: 23341446 DOI: 10.1074/jbc.m112.413260] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 01/15/2023]
Abstract
Identification of cell type-specific enhancers is important for understanding the regulation of programs controlling cellular development and differentiation. Enhancers are typically marked by the co-transcriptional activator protein p300 or by groups of cell-expressed transcription factors. We hypothesized that a unique set of enhancers regulates gene expression in human erythroid cells, a highly specialized cell type evolved to provide adequate amounts of oxygen throughout the body. Using chromatin immunoprecipitation followed by massively parallel sequencing, genome-wide maps of candidate enhancers were constructed for p300 and four transcription factors, GATA1, NF-E2, KLF1, and SCL, using primary human erythroid cells. These data were combined with gene expression analyses, and candidate enhancers were identified. Consistent with their predicted function as candidate enhancers, there was statistically significant enrichment of p300 and combinations of co-localizing erythroid transcription factors within 1-50 kb of the transcriptional start site (TSS) of genes highly expressed in erythroid cells. Candidate enhancers were also enriched near genes with known erythroid cell function or phenotype. Candidate enhancers exhibited moderate conservation with mouse and minimal conservation with nonplacental vertebrates. Candidate enhancers were mapped to a set of erythroid-associated, biologically relevant, SNPs from the genome-wide association studies (GWAS) catalogue of NHGRI, National Institutes of Health. Fourteen candidate enhancers, representing 10 genetic loci, mapped to sites associated with biologically relevant erythroid traits. Fragments from these loci directed statistically significant expression in reporter gene assays. Identification of enhancers in human erythroid cells will allow a better understanding of erythroid cell development, differentiation, structure, and function and provide insights into inherited and acquired hematologic disease.
|
70
|
Hoffman MM, Ernst J, Wilder SP, Kundaje A, Harris RS, Libbrecht M, Giardine B, Ellenbogen PM, Bilmes JA, Birney E, Hardison RC, Dunham I, Kellis M, Noble WS. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 2012; 41:827-41. [PMID: 23221638 PMCID: PMC3553955 DOI: 10.1093/nar/gks1284] [Citation(s) in RCA: 357] [Impact Index Per Article: 29.8] [Indexed: 02/01/2023] Open
Abstract
The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotations of non-coding regulatory elements correlate strongly with mammalian evolutionary constraint, and provide an unbiased approach for evaluating metrics of evolutionary constraint in human. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.
|
71
|
Abstract
Insights into the evolution of hemoglobins and their genes are an abundant source of ideas regarding hemoglobin function and regulation of globin gene expression. This article presents the multiple genes and gene families encoding human globins, summarizes major events in the evolution of the hemoglobin gene clusters, and discusses how these studies provide insights into regulation of globin genes. Although the genes in and around the α-like globin gene complex are relatively stable, the β-like globin gene clusters are more dynamic, showing evidence of transposition to a new locus and frequent lineage-specific expansions and deletions. The cis-regulatory modules controlling levels and timing of gene expression are a mix of conserved and lineage-specific DNA, perhaps reflecting evolutionary constraint on core regulatory functions shared broadly in mammals and adaptive fine-tuning in different orders of mammals.
|
72
|
Kadauke S, Udugama MI, Pawlicki JM, Achtman JC, Jain DP, Cheng Y, Hardison RC, Blobel GA. Tissue-specific mitotic bookmarking by hematopoietic transcription factor GATA1. Cell 2012; 150:725-37. [PMID: 22901805 DOI: 10.1016/j.cell.2012.06.038] [Citation(s) in RCA: 176] [Impact Index Per Article: 14.7] [Received: 09/26/2011] [Revised: 03/11/2012] [Accepted: 06/04/2012] [Indexed: 12/21/2022]
Abstract
Tissue-specific transcription patterns are preserved throughout cell divisions to maintain lineage fidelity. We investigated whether transcription factor GATA1 plays a role in transmitting hematopoietic gene expression programs through mitosis when transcription is transiently silenced. Live-cell imaging revealed that a fraction of GATA1 is retained focally within mitotic chromatin. ChIP-seq of highly purified mitotic cells uncovered that key hematopoietic regulatory genes are occupied by GATA1 in mitosis. The GATA1 coregulators FOG1 and TAL1 dissociate from mitotic chromatin, suggesting that GATA1 functions as a platform for their postmitotic recruitment. Mitotic GATA1 target genes tend to reactivate more rapidly upon entry into G1 than genes from which GATA1 dissociates. Mitosis-specific destruction of GATA1 selectively delays reactivation of genes that retain GATA1 during mitosis. These studies suggest a requirement of mitotic "bookmarking" by GATA1 for the faithful propagation of cell-type-specific transcription programs through cell division.
|
73
|
Hardison RC. Genome-wide epigenetic data facilitate understanding of disease susceptibility association studies. J Biol Chem 2012; 287:30932-40. [PMID: 22952232 PMCID: PMC3438926 DOI: 10.1074/jbc.r112.352427] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Indexed: 12/25/2022] Open
Abstract
Complex traits such as susceptibility to diseases are determined in part by variants at multiple genetic loci. Genome-wide association studies can identify these loci, but most phenotype-associated variants lie distal to protein-coding regions and are likely involved in regulating gene expression. Understanding how these genetic variants affect complex traits depends on the ability to predict and test the function of the genomic elements harboring them. Community efforts such as the ENCODE Project provide a wealth of data about epigenetic features associated with gene regulation. These data enable the prediction of testable functions for many phenotype-associated variants.
|
74
|
Song G, Riemer C, Dickins B, Kim HL, Zhang L, Zhang Y, Hsu CH, Hardison RC, NISC Comparative Sequencing Program, Green ED, Miller W. Revealing mammalian evolutionary relationships by comparative analysis of gene clusters. Genome Biol Evol 2012; 4:586-601. [PMID: 22454131 PMCID: PMC3342878 DOI: 10.1093/gbe/evs032] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Accepted: 03/19/2012] [Indexed: 12/13/2022] Open
Abstract
Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events. We developed a computational method for automatically mapping both types of orthology on a per-nucleotide basis in gene cluster regions studied by comparative sequencing, and we make this mapping accessible by visualizing the output. All of these steps are incorporated into our newly extended CHAP 2 package. We evaluate our method using both simulated data and real gene clusters (including the well-characterized α-globin and β-globin clusters). We also illustrate use of CHAP 2 by analyzing four more loci: CCL (chemokine ligand), IFN (interferon), CYP2abf (part of cytochrome P450 family 2), and KIR (killer cell immunoglobulin-like receptors). These new methods facilitate and extend our understanding of evolution at these and other loci by adding automated accurate evolutionary inference to the biologist's toolkit. The CHAP 2 package is freely available from http://www.bx.psu.edu/miller_lab.
|
75
|
Abstract
Many evolutionary studies over the past decade have estimated α(sel), the proportion of all nucleotides in the human genome that are subject to purifying selection because of their biological function. Most of these studies have estimated the nucleotide substitution rates from genome sequence alignments across many diverse mammals. Some α(sel) estimates will be affected by the heterogeneity of substitution rates in neutral sequence across the genome. Most will also be inaccurate if change in the functional sequence repertoire occurs rapidly relative to the separation of lineages that are being compared. Evidence gathered from both evolutionary and experimental analyses now indicates that rates of "turnover" of functional, predominantly noncoding, sequence are, indeed, high. They are sufficiently high that an estimated 50% of mouse constrained noncoding sequence is predicted not to be shared with rat, a closely related rodent. The rapidity of turnover results in, at least, a twofold underestimate of α(sel) by analyses that measure constraint across the eutherian phylogeny. Approaches that take account of turnover estimate that the steady-state value of α(sel) lies between 10% and 15%. Experimental studies corroborate the predicted rates of loss and gain of noncoding functional sites. These studies show the limitations inherent in the use of deep sequence conservation for identifying functional sequence. Experimental investigations focusing on lineage-specific, noncoding, and functional sequence are now essential if we are to appreciate the complete functional repertoire of the human genome.
|