1
Weinand K, Sakaue S, Nathan A, Jonsson AH, Zhang F, Watts GFM, Al Suqri M, Zhu Z, Rao DA, Anolik JH, Brenner MB, Donlin LT, Wei K, Raychaudhuri S. The chromatin landscape of pathogenic transcriptional cell states in rheumatoid arthritis. Nat Commun 2024; 15:4650. [PMID: 38821936 PMCID: PMC11143375 DOI: 10.1038/s41467-024-48620-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 05/02/2024] [Indexed: 06/02/2024] Open
Abstract
Synovial tissue inflammation is a hallmark of rheumatoid arthritis (RA). Recent work has identified prominent pathogenic cell states in inflamed RA synovial tissue, such as T peripheral helper cells; however, the epigenetic regulation of these states has yet to be defined. Here, we examine genome-wide open chromatin at single-cell resolution in 30 synovial tissue samples, including 12 samples with transcriptional data in multimodal experiments. We identify 24 chromatin classes and predict their associated transcription factors, including a CD8+ GZMK+ class associated with EOMES and a lining fibroblast class associated with AP-1. By integrating with an RA tissue transcriptional atlas, we propose that these chromatin classes represent 'superstates' corresponding to multiple transcriptional cell states. Finally, we demonstrate the utility of this RA tissue chromatin atlas through the associations between disease phenotypes and chromatin class abundance, as well as the nomination of classes mediating the effects of putatively causal RA genetic variants.
Affiliation(s)
- Kathryn Weinand
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Saori Sakaue
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Aparna Nathan
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anna Helena Jonsson
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Fan Zhang
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine Division of Rheumatology and Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO, USA
- Gerald F M Watts
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Majd Al Suqri
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zhu Zhu
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Deepak A Rao
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Jennifer H Anolik
- Division of Allergy, Immunology and Rheumatology, Department of Medicine, University of Rochester Medical Center, Rochester, NY, USA
- Michael B Brenner
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Laura T Donlin
- Hospital for Special Surgery, New York, NY, USA
- Weill Cornell Medicine, New York, NY, USA
- Kevin Wei
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Soumya Raychaudhuri
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Versus Arthritis Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK.
2
Dong J, Scott TG, Mukherjee R, Guertin MJ. ZNF143 binds DNA and stimulates transcription initiation to activate and repress direct target genes. bioRxiv 2024:2024.05.13.594008. [PMID: 38798607 PMCID: PMC11118474 DOI: 10.1101/2024.05.13.594008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Transcription factors bind to sequence motifs and act as activators or repressors. Transcription factors interface with a constellation of accessory cofactors to regulate distinct mechanistic steps to regulate transcription. We rapidly degraded the essential and ubiquitously expressed transcription factor ZNF143 to determine its function in the transcription cycle. ZNF143 facilitates RNA Polymerase initiation and activates gene expression. ZNF143 binds the promoter of nearly all its activated target genes. ZNF143 also binds near the site of genic transcription initiation to directly repress a subset of genes. Although ZNF143 stimulates initiation at ZNF143-repressed genes (i.e. those that increase expression upon ZNF143 depletion), the molecular context of binding leads to cis repression. ZNF143 competes with other more efficient activators for promoter access, physically occludes transcription initiation sites and promoter-proximal sequence elements, and acts as a molecular roadblock to RNA Polymerases during early elongation. The term context specific is often invoked to describe transcription factors that have both activation and repression functions. We define the context and molecular mechanisms of ZNF143-mediated cis activation and repression.
Affiliation(s)
- Jinhong Dong
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, Connecticut, United States of America
- Thomas G Scott
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America
- Rudradeep Mukherjee
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, Connecticut, United States of America
- Michael J Guertin
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, Connecticut, United States of America
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, Connecticut, United States of America
3
Sunitha Kumary VUN, Venters BJ, Raman K, Sen S, Estève PO, Cowles MW, Keogh MC, Pradhan S. Emerging Approaches to Profile Accessible Chromatin from Formalin-Fixed Paraffin-Embedded Sections. Epigenomes 2024; 8:20. [PMID: 38804369 PMCID: PMC11130958 DOI: 10.3390/epigenomes8020020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Accepted: 05/06/2024] [Indexed: 05/29/2024] Open
Abstract
Nucleosomes are non-uniformly distributed across eukaryotic genomes, with stretches of 'open' chromatin strongly associated with transcriptionally active promoters and enhancers. Understanding chromatin accessibility patterns in normal tissue and how they are altered in pathologies can provide critical insights to development and disease. With the advent of high-throughput sequencing, a variety of strategies have been devised to identify open regions across the genome, including DNase-seq, MNase-seq, FAIRE-seq, ATAC-seq, and NicE-seq. However, the broad application of such methods to FFPE (formalin-fixed paraffin-embedded) tissues has been curtailed by the major technical challenges imposed by highly fixed and often damaged genomic material. Here, we review the most common approaches for mapping open chromatin regions, recent optimizations to overcome the challenges of working with FFPE tissue, and a brief overview of a typical data pipeline with analysis considerations.
Affiliation(s)
- Bryan J. Venters
- EpiCypher Inc., Durham, NC 27709, USA; (V.U.N.S.K.); (B.J.V.); (M.W.C.)
- Karthikeyan Raman
- Genome Biology Division, New England Biolabs, Ipswich, MA 01983, USA; (K.R.); (S.S.); (P.-O.E.)
- Sagnik Sen
- Genome Biology Division, New England Biolabs, Ipswich, MA 01983, USA; (K.R.); (S.S.); (P.-O.E.)
- Pierre-Olivier Estève
- Genome Biology Division, New England Biolabs, Ipswich, MA 01983, USA; (K.R.); (S.S.); (P.-O.E.)
- Martis W. Cowles
- EpiCypher Inc., Durham, NC 27709, USA; (V.U.N.S.K.); (B.J.V.); (M.W.C.)
- Sriharsa Pradhan
- Genome Biology Division, New England Biolabs, Ipswich, MA 01983, USA; (K.R.); (S.S.); (P.-O.E.)
4
Scott TG, Sathyan KM, Gioeli D, Guertin MJ. TRPS1 modulates chromatin accessibility to regulate estrogen receptor alpha (ER) binding and ER target gene expression in luminal breast cancer cells. PLoS Genet 2024; 20:e1011159. [PMID: 38377146 PMCID: PMC10906895 DOI: 10.1371/journal.pgen.1011159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 03/01/2024] [Accepted: 01/30/2024] [Indexed: 02/22/2024] Open
Abstract
Common genetic variants in the repressive GATA-family transcription factor (TF) TRPS1 locus are associated with breast cancer risk, and luminal breast cancer cell lines are particularly sensitive to TRPS1 knockout. We introduced an inducible degron tag into the native TRPS1 locus within a luminal breast cancer cell line to identify the direct targets of TRPS1 and determine how TRPS1 mechanistically regulates gene expression. We acutely deplete over 80 percent of TRPS1 from chromatin within 30 minutes of inducing degradation. We find that TRPS1 regulates transcription of hundreds of genes, including those related to estrogen signaling. TRPS1 directly regulates chromatin structure, which causes estrogen receptor alpha (ER) to redistribute in the genome. ER redistribution leads to both repression and activation of dozens of ER target genes. Downstream from these primary effects, TRPS1 depletion represses cell cycle-related gene sets and reduces cell doubling rate. Finally, we show that high TRPS1 activity, calculated using a gene expression signature defined by primary TRPS1-regulated genes, is associated with worse breast cancer patient prognosis. Taken together, these data suggest a model in which TRPS1 modulates the genomic distribution of ER, both activating and repressing transcription of genes related to cancer cell fitness.
Affiliation(s)
- Thomas G. Scott
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America
- Kizhakke Mattada Sathyan
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, Connecticut, United States of America
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, Connecticut, United States of America
- Daniel Gioeli
- Department of Microbiology, Immunology, and Cancer, University of Virginia, Charlottesville, Virginia, United States of America
- Cancer Center Member, University of Virginia, Charlottesville, Virginia, United States of America
- Michael J. Guertin
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, Connecticut, United States of America
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, Connecticut, United States of America
5
Scott TG, Sathyan KM, Gioeli D, Guertin MJ. TRPS1 modulates chromatin accessibility to regulate estrogen receptor (ER) binding and ER target gene expression in luminal breast cancer cells. bioRxiv 2023:2023.07.03.547524. [PMID: 37461612 PMCID: PMC10349936 DOI: 10.1101/2023.07.03.547524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2023]
Abstract
Breast cancer is the most frequently diagnosed cancer in women. The most common subtype is luminal breast cancer, which is typically driven by the estrogen receptor α (ER), a transcription factor (TF) that activates many genes required for proliferation. Multiple effective therapies target this pathway, but individuals often develop resistance. Thus, there is a need to identify additional targets that regulate ER activity and contribute to breast tumor progression. TRPS1 is a repressive GATA-family TF that is overexpressed in breast tumors. Common genetic variants in the TRPS1 locus are associated with breast cancer risk, and luminal breast cancer cell lines are particularly sensitive to TRPS1 knockout. However, we do not know how TRPS1 regulates target genes to mediate these breast cancer patient and cellular outcomes. We introduced an inducible degron tag into the native TRPS1 locus within a luminal breast cancer cell line to identify the direct targets of TRPS1 and determine how TRPS1 mechanistically regulates gene expression. We acutely deplete over eighty percent of TRPS1 from chromatin within 30 minutes of inducing degradation. We find that TRPS1 regulates transcription of hundreds of genes, including those related to estrogen signaling. TRPS1 directly regulates chromatin structure, which causes ER to redistribute in the genome. ER redistribution leads to both repression and activation of dozens of ER target genes. Downstream from these primary effects, TRPS1 depletion represses cell cycle-related gene sets and reduces cell doubling rate. Finally, we show that high TRPS1 activity, calculated using a gene expression signature defined by primary TRPS1-regulated genes, is associated with worse breast cancer patient prognosis. Taken together, these data suggest a model in which TRPS1 modulates the activity of other TFs, both activating and repressing transcription of genes related to cancer cell fitness.
Affiliation(s)
- Thomas G Scott
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America
- Kizhakke Mattada Sathyan
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, Connecticut, United States of America
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, Connecticut, United States of America
- Daniel Gioeli
- Department of Microbiology, Immunology, and Cancer, University of Virginia, Charlottesville, Virginia, United States of America
- Michael J Guertin
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, Connecticut, United States of America
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, Connecticut, United States of America
6
Wolpe JB, Martins AL, Guertin MJ. Correction of transposase sequence bias in ATAC-seq data with rule ensemble modeling. NAR Genom Bioinform 2023; 5:lqad054. [PMID: 37274120 PMCID: PMC10236359 DOI: 10.1093/nargab/lqad054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 04/02/2023] [Accepted: 05/19/2023] [Indexed: 06/06/2023] Open
Abstract
Chromatin accessibility assays have revolutionized the field of transcription regulation by providing single-nucleotide resolution measurements of regulatory features such as promoters and transcription factor binding sites. ATAC-seq directly measures how well the Tn5 transposase accesses chromatinized DNA. Tn5 has a complex sequence bias that is not effectively scaled with traditional bias-correction methods. We model this complex bias using a rule ensemble machine learning approach that integrates information from many input k-mers proximal to the ATAC sequence reads. We effectively characterize and correct single-nucleotide sequence biases and regional sequence biases of the Tn5 enzyme. Correction of enzymatic sequence bias is an important step in interpreting chromatin accessibility assays that aim to infer transcription factor binding and regulatory activity of elements in the genome.
Affiliation(s)
- Jacob B Wolpe
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
- André L Martins
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT, USA
- Michael J Guertin
- Center for Cell Analysis and Modeling, University of Connecticut, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut, Farmington, CT, USA
7
Weinand K, Sakaue S, Nathan A, Jonsson AH, Zhang F, Watts GFM, Zhu Z, Rao DA, Anolik JH, Brenner MB, Donlin LT, Wei K, Raychaudhuri S. The Chromatin Landscape of Pathogenic Transcriptional Cell States in Rheumatoid Arthritis. bioRxiv 2023:2023.04.07.536026. [PMID: 37066336 PMCID: PMC10104143 DOI: 10.1101/2023.04.07.536026] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 04/18/2023]
Abstract
Synovial tissue inflammation is the hallmark of rheumatoid arthritis (RA). Recent work has identified prominent pathogenic cell states in inflamed RA synovial tissue, such as T peripheral helper cells; however, the epigenetic regulation of these states has yet to be defined. We measured genome-wide open chromatin at single cell resolution from 30 synovial tissue samples, including 12 samples with transcriptional data in multimodal experiments. We identified 24 chromatin classes and predicted their associated transcription factors, including a CD8+ GZMK+ class associated with EOMES and a lining fibroblast class associated with AP-1. By integrating an RA tissue transcriptional atlas, we found that the chromatin classes represented 'superstates' corresponding to multiple transcriptional cell states. Finally, we demonstrated the utility of this RA tissue chromatin atlas through the associations between disease phenotypes and chromatin class abundance as well as the nomination of classes mediating the effects of putatively causal RA genetic variants.
Affiliation(s)
- Kathryn Weinand
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Saori Sakaue
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Aparna Nathan
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Anna Helena Jonsson
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Fan Zhang
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Rheumatology and the Center for Health Artificial Intelligence, University of Colorado School of Medicine, Aurora, CO, USA
- Gerald F. M. Watts
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Zhu Zhu
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Deepak A. Rao
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Jennifer H. Anolik
- Division of Allergy, Immunology and Rheumatology; Department of Medicine, University of Rochester Medical Center, Rochester, NY, USA
- Michael B. Brenner
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Laura T. Donlin
- Hospital for Special Surgery, New York, NY, USA
- Weill Cornell Medicine, New York, NY, USA
- Kevin Wei
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Soumya Raychaudhuri
- Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Versus Arthritis Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
8
Wang L, Feng Y, Wang J, Jin X, Zhang Q, Ackah M, Wang Y, Xu D, Zhao W. ATAC-seq exposes differences in chromatin accessibility leading to distinct leaf shapes in mulberry. Plant Direct 2022; 6:e464. [PMID: 36540416 PMCID: PMC9755926 DOI: 10.1002/pld3.464] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/19/2022] [Accepted: 10/30/2022] [Indexed: 06/17/2023]
Abstract
Mulberry leaf shape is an important agronomic trait indicating yield, growth, development, and habitat variation. China was the earliest country in the world to grow mulberry for sericulture, and it is also one of the great contributions of the Chinese nation to human civilization. ATAC-seq (Assay for Transposase Accessible Chromatin using sequencing) is a recently developed technique for genome-wide analysis of chromatin accessibility. The samples used for ATAC sequencing in this study were divided into two groups of whole leaves (CK-1 and CK-2) and lobed leaves (HL-1 and HL-2), with two replicates in each group. The related motif analysis, differential expression motif screening, and functional annotation of mulberry leaf shape differences were performed by raw letter analysis to finally obtain the transcription factors (TFs) that lead to the production of heteromorphic leaves. These transcription factors are common in plants, especially the TCP family, shown to be associated with leaf development and growth in other woody plants and are a potential transcription factor responsible for leaf shape differences in mulberry. Dissecting the regulatory mechanisms of leaf shape of different forms of mulberry leaves by ATAC-seq is an important way to protect mulberry germplasm resources and improve mulberry yield. It is conducive to cultivating mulberry varieties with high resistance to adversity, promoting the sustainable development of sericulture, and protecting and improving the ecological environment.
Affiliation(s)
- Lei Wang
- School of Biology and Technology, Jiangsu University of Science and Technology, Zhenjiang, China
- Yuming Feng
- School of Biology and Technology, Jiangsu University of Science and Technology, Zhenjiang, China
- Jiangying Wang
- Leisure Agriculture Laboratory, Lianyungang Academy of Agricultural Sciences, Lianyungang, China
- Xin Jin
- School of Biology and Technology, Jiangsu University of Science and Technology, Zhenjiang, China
- Qiaonan Zhang
- School of Biology and Technology, Jiangsu University of Science and Technology, Zhenjiang, China
- Michael Ackah
- School of Biology and Technology, Jiangsu University of Science and Technology, Zhenjiang, China
- Yuhua Wang
- School of Biology and Technology, Jiangsu University of Science and Technology, Zhenjiang, China
- Dayong Xu
- Leisure Agriculture Laboratory, Lianyungang Academy of Agricultural Sciences, Lianyungang, China
- Weiguo Zhao
- School of Biology and Technology, Jiangsu University of Science and Technology, Zhenjiang, China
9
Intrinsic bias estimation for improved analysis of bulk and single-cell chromatin accessibility profiles using SELMA. Nat Commun 2022; 13:5533. [PMID: 36130957 PMCID: PMC9492688 DOI: 10.1038/s41467-022-33194-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/01/2021] [Accepted: 09/08/2022] [Indexed: 11/25/2022] Open
Abstract
Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound chromatin accessibility profiling data analysis. Existing computational tools are limited in their ability to account for such intrinsic biases and not designed for analyzing single-cell data. Here, we present Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimation from both bulk and single-cell DNase-seq and ATAC-seq data. SELMA can utilize internal mitochondrial DNA data to improve bias estimation. We show that transcription factor binding inference from DNase footprints can be improved by incorporating estimated biases using SELMA. Furthermore, we show strong effects of intrinsic biases in single-cell ATAC-seq data, and develop the first single-cell ATAC-seq intrinsic bias correction model to improve cell clustering. SELMA can enhance the performance of existing bioinformatics tools and improve the analysis of both bulk and single-cell chromatin accessibility sequencing data. Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. Here the authors develop a computational model, SELMA, to estimate and correct enzymatic cleavage biases in chromatin accessibility profiling data.
10
Luo K, Zhong J, Safi A, Hong LK, Tewari AK, Song L, Reddy TE, Ma L, Crawford GE, Hartemink AJ. Profiling the quantitative occupancy of myriad transcription factors across conditions by modeling chromatin accessibility data. Genome Res 2022; 32:1183-1198. [PMID: 35609992 PMCID: PMC9248881 DOI: 10.1101/gr.272203.120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2020] [Accepted: 05/06/2022] [Indexed: 11/24/2022]
Abstract
Over a thousand different transcription factors (TFs) bind with varying occupancy across the human genome. Chromatin immunoprecipitation (ChIP) can assay occupancy genome-wide, but only one TF at a time, limiting our ability to comprehensively observe the TF occupancy landscape, let alone quantify how it changes across conditions. We developed TF occupancy profiler (TOP), a Bayesian hierarchical regression framework, to profile genome-wide quantitative occupancy of numerous TFs using data from a single chromatin accessibility experiment (DNase- or ATAC-seq). TOP is supervised, and its hierarchical structure allows it to predict the occupancy of any sequence-specific TF, even those never assayed with ChIP. We used TOP to profile the quantitative occupancy of hundreds of sequence-specific TFs at sites throughout the genome and examined how their occupancies changed in multiple contexts: in approximately 200 human cell types, through 12 h of exposure to different hormones, and across the genetic backgrounds of 70 individuals. TOP enables cost-effective exploration of quantitative changes in the landscape of TF binding.
Affiliation(s)
- Kaixuan Luo
- Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
- Department of Human Genetics, The University of Chicago, Chicago, Illinois 60637, USA
| | - Jianling Zhong
- Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
| | - Alexias Safi
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
| | - Linda K Hong
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
| | - Alok K Tewari
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA
| | - Lingyun Song
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
| | - Timothy E Reddy
- Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Department of Biostatistics and Bioinformatics, Durham, North Carolina 27710, USA
- Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina 27710, USA
- Department of Biomedical Engineering, Duke University, Durham, North Carolina 27708, USA
| | - Li Ma
- Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
- Department of Statistical Science, Duke University, Durham, North Carolina 27708, USA
| | - Gregory E Crawford
- Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Department of Pediatrics, Duke University Medical Center, Durham, North Carolina 27710, USA
| | - Alexander J Hartemink
- Computational Biology & Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA
- Department of Computer Science, Duke University, Durham, North Carolina 27708, USA
- Department of Biology, Duke University, Durham, North Carolina 27708, USA
|
11
|
Grandi FC, Modi H, Kampman L, Corces MR. Chromatin accessibility profiling by ATAC-seq. Nat Protoc 2022; 17:1518-1552. [PMID: 35478247 DOI: 10.1038/s41596-022-00692-9] [Citation(s) in RCA: 97] [Impact Index Per Article: 48.5] [Received: 06/01/2021] [Accepted: 02/22/2022] [Indexed: 12/13/2022]
Abstract
The assay for transposase-accessible chromatin using sequencing (ATAC-seq) provides a simple and scalable way to detect the unique chromatin landscape associated with a cell type and how it may be altered by perturbation or disease. ATAC-seq requires a relatively small number of input cells and does not require a priori knowledge of the epigenetic marks or transcription factors governing the dynamics of the system. Here we describe an updated and optimized protocol for ATAC-seq, called Omni-ATAC, that is applicable across a broad range of cell and tissue types. The ATAC-seq workflow has five main steps: sample preparation, transposition, library preparation, sequencing and data analysis. This protocol details the steps to generate and sequence ATAC-seq libraries, with recommendations for sample preparation and downstream bioinformatic analysis. ATAC-seq libraries for roughly 12 samples can be generated in 10 h by someone familiar with basic molecular biology, and downstream sequencing analysis can be implemented using benchmarked pipelines by someone with basic bioinformatics skills and with access to a high-performance computing environment.
Affiliation(s)
- Fiorella C Grandi
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Hailey Modi
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Lucas Kampman
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - M Ryan Corces
- Gladstone Institute of Neurological Disease, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
|
12
|
Zhang H, Lu T, Liu S, Yang J, Sun G, Cheng T, Xu J, Chen F, Yen K. Comprehensive understanding of Tn5 insertion preference improves transcription regulatory element identification. NAR Genom Bioinform 2021; 3:lqab094. [PMID: 34729473 PMCID: PMC8557372 DOI: 10.1093/nargab/lqab094] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/28/2021] [Revised: 09/20/2021] [Accepted: 09/29/2021] [Indexed: 12/11/2022] Open
Abstract
Tn5 transposase, which can efficiently tagment the genome, has been widely adopted as a molecular tool in next-generation sequencing, from short-read sequencing to more complex methods such as assay for transposase-accessible chromatin using sequencing (ATAC-seq). Here, we systematically map Tn5 insertion characteristics across several model organisms, finding critical parameters that affect its insertion. On naked genomic DNA, we found that Tn5 insertion is not uniformly distributed or random. To uncover drivers of these biases, we used a machine learning framework, which revealed that DNA shape cooperatively works with DNA motif to affect Tn5 insertion preference. These intrinsic insertion preferences can be modeled using nucleotide dependence information from DNA sequences, and we developed a computational pipeline to correct for these biases in ATAC-seq data. Using our pipeline, we show that bias correction improves the overall performance of ATAC-seq peak detection, recovering many potential false-negative peaks. Furthermore, we found that these peaks are bound by transcription factors, underscoring the biological relevance of capturing this additional information. These findings highlight the benefits of an improved understanding and precise correction of Tn5 insertion preference.
Affiliation(s)
- Houyu Zhang
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
| | - Ting Lu
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
| | - Shan Liu
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
| | - Jianyu Yang
- Department of Developmental Biology, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
| | - Guohuan Sun
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
| | - Tao Cheng
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
| | - Jin Xu
- Division of Cell, Developmental and Integrative Biology, School of Medicine, South China University of Technology, Guangzhou 510006, China
| | - Fangyao Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, China
| | - Kuangyu Yen
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China
|
13
|
Cavalli M, Diamanti K, Dang Y, Xing P, Pan G, Chen X, Wadelius C. The Thioesterase ACOT1 as a Regulator of Lipid Metabolism in Type 2 Diabetes Detected in a Multi-Omics Study of Human Liver. OMICS 2021; 25:652-659. [PMID: 34520261 PMCID: PMC8812507 DOI: 10.1089/omi.2021.0093] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/03/2022]
Abstract
Type 2 diabetes (T2D) is characterized by pathophysiological alterations in lipid metabolism. One strategy to understand the molecular mechanisms behind these abnormalities is to identify cis-regulatory elements (CREs) located in chromatin-accessible regions of the genome that regulate key genes. In this study we integrated assay for transposase-accessible chromatin followed by sequencing (ATAC-seq) data, widely used to decode chromatin accessibility, with multi-omics data and publicly available CRE databases to identify candidate CREs associated with T2D for further experimental validation. We performed high-sensitivity ATAC-seq in nine human liver samples from normal and T2D donors, and identified a set of differentially accessible regions (DARs). We identified seven DARs including a candidate enhancer for the ACOT1 gene that regulates the balance of acyl-CoA and free fatty acids (FFAs) in the cytoplasm. The relevance of ACOT1 regulation in T2D was supported by the analysis of transcriptomics and proteomics data in liver tissue. Long-chain acyl-CoA thioesterases (ACOTs) are a group of enzymes that hydrolyze acyl-CoA esters to FFAs and coenzyme A. ACOTs have been associated with regulation of triglyceride levels, fatty acid oxidation, mitochondrial function, and insulin signaling, linking their regulation to the pathogenesis of T2D. Our strategy integrating chromatin accessibility with DNA binding and other types of omics provides novel insights into the role of genetic regulation in T2D and is extendable to other complex multifactorial diseases.
Affiliation(s)
- Marco Cavalli
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Klev Diamanti
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Yonglong Dang
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Pengwei Xing
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Gang Pan
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Xingqi Chen
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
| | - Claes Wadelius
- Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
|
14
|
Asma H, Halfon MS. Annotating the Insect Regulatory Genome. Insects 2021; 12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.
Affiliation(s)
- Hasiba Asma
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
| | - Marc S. Halfon
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203, USA
|
15
|
Li H, Guan Y. Fast decoding cell type-specific transcription factor binding landscape at single-nucleotide resolution. Genome Res 2021; 31:721-731. [PMID: 33741685 PMCID: PMC8015851 DOI: 10.1101/gr.269613.120] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/30/2020] [Accepted: 02/17/2021] [Indexed: 01/22/2023]
Abstract
Decoding the cell type-specific transcription factor (TF) binding landscape at single-nucleotide resolution is crucial for understanding the regulatory mechanisms underlying many fundamental biological processes and human diseases. However, limits on time and resources restrict the high-resolution experimental measurements of TF binding profiles of all possible TF-cell type combinations. Previous computational approaches either cannot distinguish the cell context-dependent TF binding profiles across diverse cell types or can only provide a relatively low-resolution prediction. Here we present a novel deep learning approach, Leopard, for predicting TF binding sites at single-nucleotide resolution, achieving the average area under receiver operating characteristic curve (AUROC) of 0.982 and the average area under precision recall curve (AUPRC) of 0.208. Our method substantially outperformed the state-of-the-art methods Anchor and FactorNet, improving the predictive AUPRC by 19% and 27%, respectively, when evaluated at 200-bp resolution. Meanwhile, by leveraging a many-to-many neural network architecture, Leopard features a hundredfold to thousandfold speedup compared with current many-to-one machine learning methods.
Affiliation(s)
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
|
16
|
Jenull S, Tscherner M, Mair T, Kuchler K. ATAC-Seq Identifies Chromatin Landscapes Linked to the Regulation of Oxidative Stress in the Human Fungal Pathogen Candida albicans. J Fungi (Basel) 2020; 6:jof6030182. [PMID: 32967096 PMCID: PMC7559329 DOI: 10.3390/jof6030182] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/28/2020] [Revised: 09/15/2020] [Accepted: 09/15/2020] [Indexed: 12/12/2022] Open
Abstract
Human fungal pathogens often encounter fungicidal stress upon host invasion, but they can swiftly adapt by transcriptional reprogramming that enables pathogen survival. Fungal immune evasion is tightly connected to chromatin regulation. Hence, fungal chromatin modifiers pose alternative treatment options to combat fungal infections. Here, we present an assay for transposase-accessible chromatin using sequencing (ATAC-seq) protocol adapted for the opportunistic pathogen Candida albicans to gain further insight into the interplay of chromatin accessibility and gene expression mounted during fungal adaptation to oxidative stress. The ATAC-seq workflow not only facilitates the robust detection of genomic regions with accessible chromatin but also allows for the precise modeling of nucleosome positions in C. albicans. Importantly, the data reveal genes with altered chromatin accessibility in upstream regulatory regions, which correlate with transcriptional regulation during oxidative stress. Interestingly, many genes show increased chromatin accessibility without change in gene expression upon stress exposure. Such chromatin signatures could predict yet unknown regulatory factors under highly dynamic transcriptional control. Additionally, de novo motif analysis in genomic regions with increased chromatin accessibility upon H2O2 treatment shows significant enrichment for binding sites of Cap1, a major factor in oxidative stress responses in C. albicans. Taken together, the ATAC-seq workflow enables the identification of chromatin signatures and highlights the dynamics of regulatory mechanisms mediating environmental adaptation of C. albicans.
|
17
|
Moudgil A, Wilkinson MN, Chen X, He J, Cammack AJ, Vasek MJ, Lagunas T, Qi Z, Lalli MA, Guo C, Morris SA, Dougherty JD, Mitra RD. Self-Reporting Transposons Enable Simultaneous Readout of Gene Expression and Transcription Factor Binding in Single Cells. Cell 2020; 182:992-1008.e21. [PMID: 32710817 PMCID: PMC7510185 DOI: 10.1016/j.cell.2020.06.037] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 02/21/2019] [Revised: 04/14/2020] [Accepted: 06/23/2020] [Indexed: 12/13/2022]
Abstract
Cellular heterogeneity confounds in situ assays of transcription factor (TF) binding. Single-cell RNA sequencing (scRNA-seq) deconvolves cell types from gene expression, but no technology links cell identity to TF binding sites (TFBS) in those cell types. We present self-reporting transposons (SRTs) and use them in single-cell calling cards (scCC), a novel assay for simultaneously measuring gene expression and mapping TFBS in single cells. The genomic locations of SRTs are recovered from mRNA, and SRTs deposited by exogenous, TF-transposase fusions can be used to map TFBS. We then present scCC, which map SRTs from scRNA-seq libraries, simultaneously identifying cell types and TFBS in those same cells. We benchmark multiple TFs with this technique. Next, we use scCC to discover BRD4-mediated cell-state transitions in K562 cells. Finally, we map BRD4 binding sites in the mouse cortex at single-cell resolution, establishing a new method for studying TF biology in situ.
Affiliation(s)
- Arnav Moudgil
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Medical Scientist Training Program, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Michael N Wilkinson
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Xuhua Chen
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - June He
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Alexander J Cammack
- Department of Neurology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Michael J Vasek
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Tomás Lagunas
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Zongtai Qi
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Matthew A Lalli
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Chuner Guo
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Medical Scientist Training Program, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Department of Developmental Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Samantha A Morris
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Department of Developmental Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Center of Regenerative Medicine, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Joseph D Dougherty
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Department of Psychiatry, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA
| | - Robi D Mitra
- Department of Genetics, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA; Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA.
|
18
|
Smith JP, Sheffield NC. Analytical Approaches for ATAC-seq Data Analysis. Curr Protoc Hum Genet 2020; 106:e101. [PMID: 32543102 PMCID: PMC8191135 DOI: 10.1002/cphg.101] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
Abstract
ATAC-seq, the assay for transposase-accessible chromatin using sequencing, is a quick and efficient approach to investigating the chromatin accessibility landscape. Investigating chromatin accessibility has broad utility for answering many biological questions, such as mapping nucleosomes, identifying transcription factor binding sites, and measuring differential activity of DNA regulatory elements. Because the ATAC-seq protocol is both simple and relatively inexpensive, there has been a rapid increase in the availability of chromatin accessibility data. Furthermore, advances in ATAC-seq protocols are rapidly extending its breadth to additional experimental conditions, cell types, and species. Accompanying the increase in data, there has also been an explosion of new tools and analytical approaches for analyzing it. Here, we explain the fundamentals of ATAC-seq data processing, summarize common analysis approaches, and review computational tools to provide recommendations for different research questions. This primer provides a starting point and a reference for analysis of ATAC-seq data. © 2020 Wiley Periodicals LLC.
Affiliation(s)
- Jason P. Smith
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia
| | - Nathan C. Sheffield
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia
|
19
|
Nordström KJV, Schmidt F, Gasparoni N, Salhab A, Gasparoni G, Kattler K, Müller F, Ebert P, Costa IG, Pfeifer N, Lengauer T, Schulz MH, Walter J. Unique and assay specific features of NOMe-, ATAC- and DNase I-seq data. Nucleic Acids Res 2020; 47:10580-10596. [PMID: 31584093 PMCID: PMC6847574 DOI: 10.1093/nar/gkz799] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 03/06/2019] [Revised: 08/31/2019] [Accepted: 09/11/2019] [Indexed: 01/01/2023] Open
Abstract
Chromatin accessibility maps are important for the functional interpretation of the genome. Here, we systematically analysed assay-specific differences between DNase I-seq, ATAC-seq and NOMe-seq in a side-by-side experimental and bioinformatic setup. We observe that the most prominent nucleosome-depleted regions (NDRs, e.g. in promoters) are robustly called by all three or at least two assays. However, we also find a high proportion of assay-specific NDRs that are often 'called' by only one of the assays. We show evidence that these assay-specific NDRs are indeed genuine open chromatin sites and contribute important information for accurate gene expression prediction. While ATAC-seq and DNase I-seq technically provide a high NDR calling rate at relatively low sequencing cost in comparison to NOMe-seq, NOMe-seq stands out for its genome-wide coverage, which allows detection not only of NDRs but also of endogenous DNA methylation and, as we show here, genome-wide segmentation into heterochromatic B domains and local phasing of nucleosomes outside of NDRs. In summary, our comparisons strongly suggest that assay-specific differences should be considered in experimental design and in generalized and comparative functional interpretations.
Affiliation(s)
| | - Florian Schmidt
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Excellence Cluster on Multimodal Computing and Interaction, Saarland University, 66123 Saarbrücken, Germany
| | - Nina Gasparoni
- Department of Genetics, Saarland University, 66123 Saarbrücken, Germany
| | | | - Gilles Gasparoni
- Department of Genetics, Saarland University, 66123 Saarbrücken, Germany
| | - Kathrin Kattler
- Department of Genetics, Saarland University, 66123 Saarbrücken, Germany
| | - Fabian Müller
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
| | - Peter Ebert
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
| | - Ivan G Costa
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, 52074 Aachen, Germany
| | | | - Nico Pfeifer
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
| | - Thomas Lengauer
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
| | - Marcel H Schulz
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, 66123 Saarbrücken, Germany; Excellence Cluster on Multimodal Computing and Interaction, Saarland University, 66123 Saarbrücken, Germany
| | - Jörn Walter
- Department of Genetics, Saarland University, 66123 Saarbrücken, Germany
|
20
|
Reske JJ, Wilson MR, Chandler RL. ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation. Epigenetics Chromatin 2020; 13:22. [PMID: 32321567 PMCID: PMC7178746 DOI: 10.1186/s13072-020-00342-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 12/31/2019] [Accepted: 04/11/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Chromatin dysregulation is associated with developmental disorders and cancer. Numerous methods for measuring genome-wide chromatin accessibility have been developed in the genomic era to interrogate the function of chromatin regulators. A recent technique which has gained widespread use due to speed and low input requirements with native chromatin is the Assay for Transposase-Accessible Chromatin, or ATAC-seq. Biologists have since used this method to compare chromatin accessibility between two cellular conditions. However, approaches for calculating differential accessibility can yield conflicting results, and little emphasis is placed on choice of normalization method during differential ATAC-seq analysis, especially when global chromatin alterations might be expected. RESULTS: Using an in vivo ATAC-seq data set generated in our recent report, we observed differences in chromatin accessibility patterns depending on the data normalization method used to calculate differential accessibility. This observation was further verified on published ATAC-seq data from yeast. We propose a generalized workflow for differential accessibility analysis using ATAC-seq data. We further show this workflow identifies sites of differential chromatin accessibility that correlate with gene expression and is sensitive to differential analysis using negative controls. CONCLUSIONS: We argue that researchers should systematically compare multiple normalization methods before continuing with differential accessibility analysis. ATAC-seq users should be aware of the interpretations of potential bias within experimental data and the assumptions of the normalization method implemented.
Collapse
Affiliation(s)
- Jake J Reske
- Department of Obstetrics, Gynecology and Reproductive Biology, College of Human Medicine, Michigan State University, Grand Rapids, MI, 49503, USA
- Mike R Wilson
- Department of Obstetrics, Gynecology and Reproductive Biology, College of Human Medicine, Michigan State University, Grand Rapids, MI, 49503, USA
- Ronald L Chandler
- Department of Obstetrics, Gynecology and Reproductive Biology, College of Human Medicine, Michigan State University, Grand Rapids, MI, 49503, USA; Center for Epigenetics, Van Andel Research Institute, Grand Rapids, MI, 49503, USA
|
21
|
Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker's guide to ATAC-seq data analysis. Genome Biol 2020; 21:22. [PMID: 32014034 PMCID: PMC6996192 DOI: 10.1186/s13059-020-1929-3] [Citation(s) in RCA: 196] [Impact Index Per Article: 49.0] [Received: 07/22/2019] [Accepted: 01/08/2020] [Indexed: 12/16/2022] Open
Abstract
Assay of Transposase Accessible Chromatin sequencing (ATAC-seq) is widely used in studying chromatin biology, but a comprehensive review of the analysis tools has not been completed yet. Here, we discuss the major steps in ATAC-seq data analysis, including pre-analysis (quality check and alignment), core analysis (peak calling), and advanced analysis (peak differential analysis and annotation, motif enrichment, footprinting, and nucleosome position analysis). We also review the reconstruction of transcriptional regulatory networks with multiomics data and highlight the current challenges of each step. Finally, we describe the potential of single-cell ATAC-seq and highlight the necessity of developing ATAC-seq specific analysis tools to obtain biologically meaningful insights.
Affiliation(s)
- Feng Yan
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia
- David R Powell
- Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia
- David J Curtis
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia; Department of Clinical Haematology, Alfred Health, Melbourne, VIC, Australia
- Nicholas C Wong
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia; Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia
|
22
|
Tarbell ED, Liu T. HMMRATAC: a Hidden Markov ModeleR for ATAC-seq. Nucleic Acids Res 2019; 47:e91. [PMID: 31199868 PMCID: PMC6895260 DOI: 10.1093/nar/gkz533] [Citation(s) in RCA: 49] [Impact Index Per Article: 9.8] [Received: 12/11/2018] [Revised: 05/23/2019] [Accepted: 06/04/2019] [Indexed: 12/12/2022] Open
Abstract
ATAC-seq has been widely adopted to identify accessible chromatin regions across the genome. However, current data analysis still utilizes approaches initially designed for ChIP-seq or DNase-seq, without considering the transposase digested DNA fragments that contain additional nucleosome positioning information. We present the first dedicated ATAC-seq analysis tool, a semi-supervised machine learning approach named HMMRATAC. HMMRATAC splits a single ATAC-seq dataset into nucleosome-free and nucleosome-enriched signals, learns the unique chromatin structure around accessible regions, and then predicts accessible regions across the entire genome. We show that HMMRATAC outperforms the popular peak-calling algorithms on published human ATAC-seq datasets. We find that single-end sequenced or size-selected ATAC-seq datasets result in a loss of sensitivity compared to paired-end datasets without size-selection.
Affiliation(s)
- Evan D Tarbell
- Department of Biochemistry, University at Buffalo, Buffalo, NY 14203, USA; Enhanced Pharmacodynamics LLC, Buffalo, NY 14203, USA
- Tao Liu
- Department of Biochemistry, University at Buffalo, Buffalo, NY 14203, USA; Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY 14263, USA
|
23
|
Owens N, Papadopoulou T, Festuccia N, Tachtsidi A, Gonzalez I, Dubois A, Vandormael-Pournin S, Nora EP, Bruneau BG, Cohen-Tannoudji M, Navarro P. CTCF confers local nucleosome resiliency after DNA replication and during mitosis. eLife 2019; 8:e47898. [PMID: 31599722 PMCID: PMC6844645 DOI: 10.7554/elife.47898] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 04/23/2019] [Accepted: 10/09/2019] [Indexed: 12/20/2022] Open
Abstract
The access of Transcription Factors (TFs) to their cognate DNA binding motifs requires a precise control over nucleosome positioning. This is especially important following DNA replication and during mitosis, both resulting in profound changes in nucleosome organization over TF binding regions. Using mouse Embryonic Stem (ES) cells, we show that the TF CTCF displaces nucleosomes from its binding site and locally organizes large and phased nucleosomal arrays, not only in interphase steady-state but also immediately after replication and during mitosis. Correlative analyses suggest this is associated with fast gene reactivation following replication and mitosis. While regions bound by other TFs (Oct4/Sox2), display major rearrangement, the post-replication and mitotic nucleosome positioning activity of CTCF is not unique: Esrrb binding regions are also characterized by persistent nucleosome positioning. Therefore, selected TFs such as CTCF and Esrrb act as resilient TFs governing the inheritance of nucleosome positioning at regulatory regions throughout the cell-cycle.
Affiliation(s)
- Nick Owens
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Equipe Labellisée LIGUE Contre le Cancer, Paris, France
- Thaleia Papadopoulou
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Equipe Labellisée LIGUE Contre le Cancer, Paris, France
- Nicola Festuccia
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Equipe Labellisée LIGUE Contre le Cancer, Paris, France
- Alexandra Tachtsidi
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Equipe Labellisée LIGUE Contre le Cancer, Paris, France
- Sorbonne Université, Collège Doctoral, Paris, France
- Inma Gonzalez
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Equipe Labellisée LIGUE Contre le Cancer, Paris, France
- Agnes Dubois
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Equipe Labellisée LIGUE Contre le Cancer, Paris, France
- Sandrine Vandormael-Pournin
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Early Mammalian Development and Stem Cell Biology, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR 3738, Paris, France
- Elphège P Nora
- Gladstone Institutes, San Francisco, United States
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, United States
- Benoit G Bruneau
- Gladstone Institutes, San Francisco, United States
- Cardiovascular Research Institute, University of California, San Francisco, San Francisco, United States
- Department of Pediatrics, University of California, San Francisco, San Francisco, United States
- Michel Cohen-Tannoudji
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Early Mammalian Development and Stem Cell Biology, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR 3738, Paris, France
- Pablo Navarro
- Epigenomics, Proliferation, and the Identity of Cells, Department of Developmental and Stem Cell Biology, Institut Pasteur, CNRS UMR3738, Paris, France
- Equipe Labellisée LIGUE Contre le Cancer, Paris, France
|
24
|
Sathyan KM, McKenna BD, Anderson WD, Duarte FM, Core L, Guertin MJ. An improved auxin-inducible degron system preserves native protein levels and enables rapid and specific protein depletion. Genes Dev 2019; 33:1441-1455. [PMID: 31467088 PMCID: PMC6771385 DOI: 10.1101/gad.328237.119] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 05/21/2019] [Accepted: 07/18/2019] [Indexed: 12/16/2022]
Abstract
Rapid perturbation of protein function permits the ability to define primary molecular responses while avoiding downstream cumulative effects of protein dysregulation. The auxin-inducible degron (AID) system was developed as a tool to achieve rapid and inducible protein degradation in nonplant systems. However, tagging proteins at their endogenous loci results in chronic auxin-independent degradation by the proteasome. To correct this deficiency, we expressed the auxin response transcription factor (ARF) in an improved inducible degron system. ARF is absent from previously engineered AID systems but is a critical component of native auxin signaling. In plants, ARF directly interacts with AID in the absence of auxin, and we found that expression of the ARF PB1 (Phox and Bem1) domain suppresses constitutive degradation of AID-tagged proteins. Moreover, the rate of auxin-induced AID degradation is substantially faster in the ARF-AID system. To test the ARF-AID system in a quantitative and sensitive manner, we measured genome-wide changes in nascent transcription after rapidly depleting the ZNF143 transcription factor. Transcriptional profiling indicates that ZNF143 activates transcription in cis and regulates promoter-proximal paused RNA polymerase density. Rapidly inducible degradation systems that preserve the target protein's native expression levels and patterns will revolutionize the study of biological systems by enabling specific and temporally defined protein dysregulation.
Affiliation(s)
- Kizhakke Mattada Sathyan
- Biochemistry and Molecular Genetics Department, University of Virginia, Charlottesville, Virginia 22908, USA
- Brian D McKenna
- Biochemistry and Molecular Genetics Department, University of Virginia, Charlottesville, Virginia 22908, USA
- Warren D Anderson
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia 22908, USA
- Fabiana M Duarte
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- Leighton Core
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, Connecticut 06269, USA
- Michael J Guertin
- Biochemistry and Molecular Genetics Department, University of Virginia, Charlottesville, Virginia 22908, USA; Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia 22908, USA; Cancer Center, University of Virginia, Charlottesville, Virginia 22908, USA
|
25
|
Li Z, Schulz MH, Look T, Begemann M, Zenke M, Costa IG. Identification of transcription factor binding sites using ATAC-seq. Genome Biol 2019; 20:45. [PMID: 30808370 PMCID: PMC6391789 DOI: 10.1186/s13059-019-1642-2] [Citation(s) in RCA: 233] [Impact Index Per Article: 46.6] [Received: 02/13/2018] [Accepted: 01/25/2019] [Indexed: 01/07/2023] Open
Abstract
Transposase-Accessible Chromatin followed by sequencing (ATAC-seq) is a simple protocol for detection of open chromatin. Computational footprinting, the search for regions with depletion of cleavage events due to transcription factor binding, is poorly understood for ATAC-seq. We propose the first footprinting method considering ATAC-seq protocol artifacts. HINT-ATAC uses a position dependency model to learn the cleavage preferences of the transposase. We observe strand-specific cleavage patterns around transcription factor binding sites, which are determined by local nucleosome architecture. By incorporating all these biases, HINT-ATAC is able to significantly outperform competing methods in the prediction of transcription factor binding sites with footprints.
Affiliation(s)
- Zhijian Li
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Marcel H. Schulz
- Cluster of Excellence for Multimodal Computing and Interaction, Saarland Informatics Campus, Saarland University, Saarbrücken, Germany
- Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany
- Institute for Cardiovascular Regeneration, Goethe University, Frankfurt am Main, Germany
- German Centre for Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt am Main, Germany
- Thomas Look
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, Aachen, Germany
- Matthias Begemann
- Institute of Human Genetics, RWTH Aachen University Medical School, Aachen, Germany
- Martin Zenke
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, Aachen, Germany
- Ivan G. Costa
- Institute for Computational Genomics, Joint Research Center for Computational Biomedicine, RWTH Aachen University Medical School, Aachen, 52074 Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, Aachen, Germany
|
26
|
Karabacak Calviello A, Hirsekorn A, Wurmus R, Yusuf D, Ohler U. Reproducible inference of transcription factor footprints in ATAC-seq and DNase-seq datasets using protocol-specific bias modeling. Genome Biol 2019; 20:42. [PMID: 30791920 PMCID: PMC6385462 DOI: 10.1186/s13059-019-1654-y] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 04/18/2018] [Accepted: 02/13/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: DNase-seq and ATAC-seq are broadly used methods to assay open chromatin regions genome-wide. The single nucleotide resolution of DNase-seq has been further exploited to infer transcription factor binding sites (TFBSs) in regulatory regions through footprinting. Recent studies have demonstrated the sequence bias of DNase I and its adverse effects on footprinting efficiency. However, footprinting and the impact of sequence bias have not been extensively studied for ATAC-seq.
RESULTS: Here, we undertake a systematic comparison of the two methods and show that a modification to the ATAC-seq protocol increases its yield and its agreement with DNase-seq data from the same cell line. We demonstrate that the two methods have distinct sequence biases and correct for these protocol-specific biases when performing footprinting. Despite the differences in footprint shapes, the locations of the inferred footprints in ATAC-seq and DNase-seq are largely concordant. However, the protocol-specific sequence biases in conjunction with the sequence content of TFBSs impact the discrimination of footprint from the background, which leads to one method outperforming the other for some TFs. Finally, we address the depth required for reproducible identification of open chromatin regions and TF footprints.
CONCLUSIONS: We demonstrate that the impact of bias correction on footprinting performance is greater for DNase-seq than for ATAC-seq and that DNase-seq footprinting leads to better performance. It is possible to infer concordant footprints by using replicates, highlighting the importance of reproducibility assessment. The results presented here provide an overview of the advantages and limitations of footprinting analyses using ATAC-seq and DNase-seq.
Affiliation(s)
- Aslıhan Karabacak Calviello
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Department of Biology, Humboldt University, Berlin, Germany
- Antje Hirsekorn
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Ricardo Wurmus
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Dilmurat Yusuf
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Uwe Ohler
- Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Berlin, Germany
- Department of Biology, Humboldt University, Berlin, Germany
- Department of Computer Science, Humboldt University, Berlin, Germany
|
27
|
Li H, Quang D, Guan Y. Anchor: trans-cell type prediction of transcription factor binding sites. Genome Res 2019; 29:281-292. [PMID: 30567711 PMCID: PMC6360811 DOI: 10.1101/gr.237156.118] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 03/15/2018] [Accepted: 12/13/2018] [Indexed: 12/16/2022]
Abstract
The ENCyclopedia of DNA Elements (ENCODE) consortium has generated transcription factor (TF) binding ChIP-seq data covering hundreds of TF proteins and cell types; however, due to limits on time and resources, only a small fraction of all possible TF-cell type pairs have been profiled. One solution is to build machine learning models trained on currently available epigenomic data sets that can be applied to the remaining missing pairs. A major challenge is that TF binding sites are cell-type-specific, which can be attributed to cellular contexts such as chromatin accessibility. Meanwhile, indirect TF-DNA binding and interactions between TFs complicate this regulatory process. Technical issues such as sequencing biases and batch effects render the prediction task even more challenging. Many pioneering efforts have been made to predict TF binding profiles based on DNA sequence and DNase-seq footprints, but to what extent a model can be generalized to completely untested cell conditions remains unknown. In this study, we describe our first place solution to the 2017 ENCODE-DREAM in vivo TF binding site prediction challenge. By carefully addressing multisource biases and information imbalance across cell types, we created a pipeline that significantly outperforms the current state-of-the-art methods. The proposed method is sufficiently complex enough to model nonlinear interactions between TF binding motifs and chromatin accessibility information up to 1500 bp from the genomic region of interest.
Affiliation(s)
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Daniel Quang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
|
28
|
Marom S, Blumberg A, Kundaje A, Mishmar D. mtDNA Chromatin-like Organization Is Gradually Established during Mammalian Embryogenesis. iScience 2019; 12:141-151. [PMID: 30684873 PMCID: PMC6352746 DOI: 10.1016/j.isci.2018.12.032] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/17/2018] [Revised: 11/08/2018] [Accepted: 12/27/2018] [Indexed: 02/06/2023] Open
Abstract
Unlike the nuclear genome, the mammalian mitochondrial genome (mtDNA) is thought to be coated solely by mitochondrial transcription factor A (TFAM), whose binding sequence preferences are debated. Therefore, higher-order mtDNA organization is considered much less regulated than both the bacterial nucleoid and the nuclear chromatin. However, our recently identified conserved DNase footprinting pattern in human mtDNA, which co-localizes with regulatory elements and responds to physiological conditions, likely reflects a structured higher-order mtDNA organization. We hypothesized that this pattern emerges during embryogenesis. To test this hypothesis, we analyzed assay for transposase-accessible chromatin sequencing (ATAC-seq) results collected during the course of mouse and human early embryogenesis. Our results reveal, for the first time, a gradual and dynamic emergence of the adult mtDNA footprinting pattern during embryogenesis of both mammals. Taken together, our findings suggest that the structured adult chromatin-like mtDNA organization is gradually formed during mammalian embryogenesis.
- Mouse and human mtDNA ATAC-seq footprinting patterns are formed during embryogenesis
- mtDNA footprinting sites were either occupied in preimplantation or appeared later
- mtDNA footprinting associates with regulatory elements and protein-binding sites
- The mtDNA footprinting sites tend to harbor secondary structures
Affiliation(s)
- Shani Marom
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
- Amit Blumberg
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
- Dan Mishmar
- Department of Life Sciences, Faculty of Natural Sciences, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
|
29
|
Abstract
ChIP-seq and ChIP-exo identify where proteins bind along any genome in vivo. Although ChIP-seq is widely adopted in academic research, it has inherently high noise. In contrast, ChIP-exo has relatively low noise and achieves near-base pair resolution. Consequently, and unlike other genomic assays, ChIP-exo provides structural information on genome-wide binding proteins. Construction of ChIP-exo libraries is technically difficult. Here we describe greatly simplified ChIP-exo methods, each with use-specific advantages. This is achieved through assay optimization and use of Tn5 tagmentation and/or single-stranded DNA ligation. Greater library yields, lower processing time, and lower costs are achieved. In comparing assays, we reveal substantial limitations in other ChIP-based assays. Importantly, the new ChIP-exo assays allow high-resolution detection of some protein-DNA interactions in organs and in as few as 27,000 cells. It is suitable for high-throughput parallelization. The simplicity of ChIP-exo now makes it a highly appropriate substitute for ChIP-seq, and for broader adoption. While ChIP-exo is low noise and highly informative regarding genome-wide binding proteins, libraries are difficult to construct. Here the authors present a simplified ChIP-exo method for high-resolution detection of interactions.
|
30
|
Liu Y, Walavalkar NM, Dozmorov MG, Rich SS, Civelek M, Guertin MJ. Identification of breast cancer associated variants that modulate transcription factor binding. PLoS Genet 2017; 13:e1006761. [PMID: 28957321 PMCID: PMC5619690 DOI: 10.1371/journal.pgen.1006761] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 11/16/2016] [Accepted: 04/12/2017] [Indexed: 01/11/2023] Open
Abstract
Genome-wide association studies (GWAS) have discovered thousands of loci associated with disease risk and quantitative traits, yet most of the variants responsible for risk remain uncharacterized. The majority of GWAS-identified loci are enriched for non-coding single-nucleotide polymorphisms (SNPs) and defining the molecular mechanism of risk is challenging. Many non-coding causal SNPs are hypothesized to alter transcription factor (TF) binding sites as the mechanism by which they affect organismal phenotypes. We employed an integrative genomics approach to identify candidate TF binding motifs that confer breast cancer-specific phenotypes identified by GWAS. We performed de novo motif analysis of regulatory elements, analyzed evolutionary conservation of identified motifs, and assayed TF footprinting data to identify sequence elements that recruit TFs and maintain chromatin landscape in breast cancer-relevant tissue and cell lines. We identified candidate causal SNPs that are predicted to alter TF binding within breast cancer-relevant regulatory regions that are in strong linkage disequilibrium with significantly associated GWAS SNPs. We confirm that the TFs bind with predicted allele-specific preferences using CTCF ChIP-seq data. We used The Cancer Genome Atlas breast cancer patient data to identify ANKLE1 and ZNF404 as the target genes of candidate TF binding site SNPs in the 19p13.11 and 19q13.31 GWAS-identified loci. These SNPs are associated with the expression of ZNF404 and ANKLE1 in breast tissue. This integrative analysis pipeline is a general framework to identify candidate causal variants within regulatory regions and TF binding sites that confer phenotypic variation and disease risk.
Affiliation(s)
- Yunxian Liu
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America
- Ninad M. Walavalkar
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America
- Mikhail G. Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America
- Mete Civelek
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
- Michael J. Guertin
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, Virginia, United States of America
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America
|