1
Li Y, Wang Y, Wang C, Ma A, Ma Q, Liu B. A weighted two-stage sequence alignment framework to identify motifs from ChIP-exo data. Patterns (N Y) 2024; 5:100927. [PMID: 38487805 PMCID: PMC10935504 DOI: 10.1016/j.patter.2024.100927] [Received: 06/14/2023] [Revised: 08/18/2023] [Accepted: 01/10/2024] [Indexed: 03/17/2024]
Abstract
In this study, we introduce TESA (weighted two-stage alignment), an innovative motif prediction tool that refines the identification of DNA-binding protein motifs, essential for deciphering transcriptional regulatory mechanisms. Unlike traditional algorithms that rely solely on sequence data, TESA integrates the high-resolution chromatin immunoprecipitation (ChIP) signal, specifically from ChIP-exonuclease (ChIP-exo), by assigning weights to sequence positions, thereby enhancing motif discovery. TESA employs a nuanced approach combining a binomial distribution model with a graph model, further supported by a "bookend" model, to improve the accuracy of predicting motifs of varying lengths. Our evaluation, utilizing an extensive compilation of 90 prokaryotic ChIP-exo datasets from proChIPdb and 167 H. sapiens datasets, compared TESA's performance against seven established tools. The results indicate TESA's improved precision in motif identification, suggesting its valuable contribution to the field of genomic research.
Affiliation(s)
- Yang Li
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Yizhong Wang
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
- Cankun Wang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
  - Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Bingqiang Liu
  - School of Mathematics, Shandong University, Jinan, Shandong 250100, China
2
Lee SM, Le HT, Taizhanova A, Nong LK, Park JY, Lee EJ, Palsson BO, Kim D. Experimental promoter identification of a foodborne pathogen Salmonella enterica subsp. enterica serovar Typhimurium with near single base-pair resolution. Front Microbiol 2024; 14:1271121. [PMID: 38239730 PMCID: PMC10794520 DOI: 10.3389/fmicb.2023.1271121] [Received: 08/01/2023] [Accepted: 12/01/2023] [Indexed: 01/22/2024]
Abstract
Salmonella enterica serovar Typhimurium (S. Typhimurium) is a common foodborne pathogen that is frequently used as the reference strain for Salmonella. Investigating the sigma factor network and promoters is crucial to understanding the genomic and transcriptomic properties of the bacterium. Its promoters have been identified using various methods such as dRNA-seq, ChIP-chip, or ChIP-Seq. However, validation using ChIP-exo, which offers higher resolution than conventional ChIP, has not been conducted to date. In this study, using the representative strain S. Typhimurium LT2 (LT2), ChIP-exo experiments were conducted to accurately determine the binding sites of the catalytic RNA polymerase subunit RpoB and the major sigma factors (RpoD, RpoN, RpoS, and RpoE) during exponential phase. By integrating these data with RNA-Seq results, promoters and sigmulons for the sigma factors and their association with RpoB were identified. Notably, overlapping regions among the binding sites of the alternative sigma factors were found. Furthermore, comparative analysis with Escherichia coli str. K-12 substr. MG1655 (MG1655) revealed conserved binding sites of RpoD and RpoN across different species. In addition, expression of 50 small RNAs (sRNAs) was observed during the exponential growth of LT2. Collectively, the integration of ChIP-exo and RNA-Seq enables genome-scale promoter mapping with high resolution and facilitates the characterization of binding events of alternative sigma factors, enabling a comprehensive understanding of the bacterial sigma factor network and condition-specific active promoters.
Affiliation(s)
- Sang-Mok Lee
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
- Hoa Thi Le
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
- Assiya Taizhanova
  - Department of Genetic Engineering and Graduate School of Biotechnology, College of Life Sciences, Kyung Hee University, Yongin, Republic of Korea
- Linh Khanh Nong
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
- Joon Young Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
- Eun-Jin Lee
  - Department of Life Sciences, College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
- Bernhard O. Palsson
  - Department of Bioengineering, University of California San Diego, La Jolla, CA, United States
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
3
Bang I, Lee SM, Park S, Park JY, Nong LK, Gao Y, Palsson BO, Kim D. Deep-learning optimized DEOCSU suite provides an iterable pipeline for accurate ChIP-exo peak calling. Brief Bioinform 2023; 24:7005164. [PMID: 36702751 DOI: 10.1093/bib/bbad024] [Received: 08/23/2022] [Revised: 01/02/2023] [Accepted: 01/08/2023] [Indexed: 01/28/2023]
Abstract
Recognizing binding sites of DNA-binding proteins is key to elucidating transcriptional regulation in organisms. ChIP-exo enables researchers to delineate genome-wide binding landscapes of DNA-binding proteins with near single base-pair resolution. However, the peak calling step hinders ChIP-exo application because the published algorithms tend to generate false-positive and false-negative predictions. Here, we report the development of DEOCSU (DEep-learning Optimized ChIP-exo peak calling SUite), a novel machine learning-based ChIP-exo peak calling suite. DEOCSU uses a deep convolutional neural network model that was trained with curated ChIP-exo peak data to distinguish the visualized data of bona fide peaks from false ones. Performance validation of the trained deep-learning model indicated accuracy, precision and recall each above 95%. Applying the new suite to both in-house and publicly available ChIP-exo datasets obtained from bacteria, eukaryotes and archaea revealed an accurate prediction of peaks containing canonical motifs, highlighting the versatility and efficiency of DEOCSU. Furthermore, DEOCSU can be executed on a cloud computing platform or in a local environment. With visualization software included in the suite, adjustable options such as the threshold of peak probability, and iterable updating of the pre-trained model, DEOCSU can be optimized for users' specific needs.
Affiliation(s)
- Ina Bang
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Sang-Mok Lee
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Seojoung Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Joon Young Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Linh Khanh Nong
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Ye Gao
  - Department of Bioengineering, University of California San Diego, La Jolla CA 92093, USA
- Bernhard O Palsson
  - Department of Bioengineering, University of California San Diego, La Jolla CA 92093, USA
  - Department of Pediatrics, University of California San Diego, La Jolla CA 92093, USA
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby, Denmark
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
4
Yeh SY, Rhee HS. The ChIP-Exo Method to Identify Genomic Locations of DNA-Binding Proteins at Near Single Base-Pair Resolution. Methods Mol Biol 2023; 2599:33-48. [PMID: 36427141 DOI: 10.1007/978-1-0716-2847-8_4] [Indexed: 06/16/2023]
Abstract
Chromatin immunoprecipitation (ChIP) is a technique to determine whether a protein interacts with a specific DNA sequence. ChIP-sequencing (ChIP-seq) is one of the most widely used methods to identify genome-wide DNA-binding sites of nuclear proteins. Here, we describe the ChIP-exo method, which is a refined version of ChIP-seq combined with lambda exonuclease digestion. ChIP-exo can identify genomic locations of DNA-binding proteins at a near single base-pair (bp) resolution. It removes most of the background DNA signals. ChIP-exo has emerged as a powerful technique to study the genome-wide organization of DNA-binding proteins.
Affiliation(s)
- Ssu-Yu Yeh
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Department of Biology, University of Toronto, Mississauga, ON, Canada
- Ho Sung Rhee
  - Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada
  - Department of Biology, University of Toronto, Mississauga, ON, Canada
5
Bang I, Khanh Nong L, Young Park J, Thi Le H, Mok Lee S, Kim D. ChEAP: ChIP-exo analysis pipeline and the investigation of Escherichia coli RpoN protein-DNA interactions. Comput Struct Biotechnol J 2022; 21:99-104. [PMID: 36544470 PMCID: PMC9735260 DOI: 10.1016/j.csbj.2022.11.053] [Received: 09/19/2022] [Revised: 11/25/2022] [Accepted: 11/25/2022] [Indexed: 12/03/2022]
Abstract
Genome-scale studies of the bacterial regulatory network have benefited from declining sequencing costs and advances in ChIP (chromatin immunoprecipitation) methods. Among these, ChIP-exo has proven effective owing to its near-single base-pair resolution. While several algorithms and programs have been developed for different analytical steps in ChIP-exo data processing, little effort has gone into incorporating them into a convenient bioinformatics pipeline that is intuitive and publicly available. In this paper, we developed the ChIP-exo Analysis Pipeline (ChEAP), which runs a one-step process, starting from trimming and aligning raw sequencing reads and ending with visualization of ChIP-exo results. The pipeline was implemented in Jupyter Notebook, an interactive web-based Python development environment, which is compatible with the Google Colab cloud platform to facilitate the sharing of code and collaboration among researchers. Additionally, users can exploit the free GPU and CPU resources allocated by Colab to carry out computing tasks regardless of the performance of their local machines. The utility of ChEAP was demonstrated with the ChIP-exo datasets of the RpoN sigma factor in E. coli K-12 MG1655. ChEAP analyzed two raw data files in 2 min and 25 s. Subsequent analyses identified 113 RpoN binding sites showing a conserved RpoN binding pattern in the motif search. ChEAP application in ChIP-exo data analysis is extensive and flexible for the parallel processing of data from various organisms.
Affiliation(s)
- Ina Bang
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Linh Khanh Nong
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Joon Young Park
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Hoa Thi Le
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Sang-Mok Lee
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
- Donghyuk Kim
  - School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Schools of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
  - Corresponding author at: School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Republic of Korea
6
Tierrafría VH, Rioualen C, Salgado H, Lara P, Gama-Castro S, Lally P, Gómez-Romero L, Peña-Loredo P, López-Almazo AG, Alarcón-Carranza G, Betancourt-Figueroa F, Alquicira-Hernández S, Polanco-Morelos JE, García-Sotelo J, Gaytan-Nuñez E, Méndez-Cruz CF, Muñiz LJ, Bonavides-Martínez C, Moreno-Hagelsieb G, Galagan JE, Wade JT, Collado-Vides J. RegulonDB 11.0: Comprehensive high-throughput datasets on transcriptional regulation in Escherichia coli K-12. Microb Genom 2022; 8. [PMID: 35584008 PMCID: PMC9465075 DOI: 10.1099/mgen.0.000833] [Indexed: 01/23/2023]
Abstract
Genomics has set the basis for a variety of methodologies that produce high-throughput datasets identifying the different players that define gene regulation, particularly regulation of transcription initiation and operon organization. These datasets are available in public repositories, such as the Gene Expression Omnibus or ArrayExpress. However, accessing and navigating such a wealth of data is not straightforward. No resource currently exists that offers all available high- and low-throughput data on transcriptional regulation in Escherichia coli K-12 for easy use both as whole datasets and as individual interactions and regulatory elements. RegulonDB (https://regulondb.ccg.unam.mx) began gathering high-throughput dataset collections in 2009, starting with transcription start sites, then adding ChIP-seq and gSELEX in 2012, with up to 99 different experimental high-throughput datasets available in 2019. In this paper we present a radical upgrade to more than 2000 high-throughput datasets, processed to facilitate their comparison, introducing up-to-date collections of transcription termination sites, transcription units, as well as transcription factor binding interactions derived from ChIP-seq, ChIP-exo, gSELEX and DAP-seq experiments, besides expression profiles derived from RNA-seq experiments. For ChIP-seq experiments we offer both the data as presented by the authors and data uniformly processed in-house, enhancing their comparability, as well as the traceability of the methods and reproducibility of the results. Furthermore, we have expanded the tools available for browsing and visualization across and within datasets. We include comparisons against previously existing knowledge in RegulonDB from classic experiments, a nucleotide-resolution genome viewer, and an interface that enables users to browse datasets by querying their metadata.
A particular effort was made to automatically extract detailed experimental growth conditions by implementing an assisted curation strategy applying natural language processing and machine learning. We provide summaries with the total number of interactions found in each experiment, as well as tools to identify common results among different experiments. This is a long-awaited resource to make use of such a wealth of knowledge and advance our understanding of the biology of the model bacterium E. coli K-12.
Affiliation(s)
- Víctor H Tierrafría
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215, USA
- Claire Rioualen
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Heladia Salgado
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Paloma Lara
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Socorro Gama-Castro
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Patrick Lally
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215, USA
- Laura Gómez-Romero
  - Instituto Nacional de Medicina Genómica, INMEGEN, Periférico Sur 4809, Arenal Tepepan, Tlalpan 14610, CDMX, Mexico
- Pablo Peña-Loredo
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Andrés G López-Almazo
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Gabriel Alarcón-Carranza
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Felipe Betancourt-Figueroa
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Shirley Alquicira-Hernández
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- J Enrique Polanco-Morelos
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Jair García-Sotelo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Querétaro 76230, Querétaro, Mexico
- Estefani Gaytan-Nuñez
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Carlos-Francisco Méndez-Cruz
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Luis J Muñiz
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- César Bonavides-Martínez
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
- Gabriel Moreno-Hagelsieb
  - Department of Biology, Wilfrid Laurier University, 75 University Ave W, Waterloo, ON N2L 3C5, Canada
- James E Galagan
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215, USA
- Joseph T Wade
  - Wadsworth Center, New York State Department of Health, Albany, NY, USA
  - Department of Biomedical Sciences, University at Albany, SUNY, Albany, NY, USA
- Julio Collado-Vides
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca 62210, Morelos, Mexico
  - Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA 02215, USA
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Universitat Pompeu Fabra (UPF), Barcelona, Spain
7
Rodriguez-Martinez A, Vuorinen EM, Shcherban A, Uusi-Mäkelä J, Rajala NKM, Nykter M, Kallioniemi A. Novel ZNF414 activity characterized by integrative analysis of ChIP-exo, ATAC-seq and RNA-seq data. Biochim Biophys Acta Gene Regul Mech 2022; 1865:194811. [PMID: 35318951 DOI: 10.1016/j.bbagrm.2022.194811] [Received: 11/16/2021] [Revised: 03/05/2022] [Accepted: 03/08/2022] [Indexed: 06/14/2023]
Abstract
Transcription factor binding to DNA is a central mechanism regulating gene expression. Thus, thorough characterization of this process is essential for understanding cellular biology in both health and disease. We combined data from three sequencing-based methods to unravel the DNA binding function of the novel ZNF414 protein in cells representing two tumor types. ChIP-exo served to map protein binding sites, ATAC-seq allowed identification of open chromatin, and RNA-seq examined the transcriptome. We show that ZNF414 is a DNA-binding protein that both induces and represses gene expression. This transcriptional response has an impact on cellular processes related to proliferation and other malignancy-associated functions, such as cell migration and DNA repair. Approximately 20% of the differentially expressed genes harbored ZNF414 binding sites in their promoters in accessible chromatin, likely representing direct targets of ZNF414. De novo motif discovery revealed several putative ZNF414 binding sequences, one of which was validated using EMSA. In conclusion, this study illustrates a highly efficient integrative approach for the characterization of the DNA binding and transcriptional activity of transcription factors.
Affiliation(s)
- Alejandra Rodriguez-Martinez
  - Prostate Cancer Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Tays Cancer Center, Tampere University Hospital, Tampere, Finland
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Elisa M Vuorinen
  - Prostate Cancer Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Tays Cancer Center, Tampere University Hospital, Tampere, Finland
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Anastasia Shcherban
  - Tays Cancer Center, Tampere University Hospital, Tampere, Finland
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Joonas Uusi-Mäkelä
  - Prostate Cancer Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Tays Cancer Center, Tampere University Hospital, Tampere, Finland
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Nina K M Rajala
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Matti Nykter
  - Prostate Cancer Research Center, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Tays Cancer Center, Tampere University Hospital, Tampere, Finland
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- Anne Kallioniemi
  - Tays Cancer Center, Tampere University Hospital, Tampere, Finland
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Fimlab Laboratories, Tampere, Finland
8
Blombach F, Smollett KL, Werner F. ChIP-Seq Occupancy Mapping of the Archaeal Transcription Machinery. Methods Mol Biol 2022; 2522:209-222. [PMID: 36125752 DOI: 10.1007/978-1-0716-2445-6_13] [Indexed: 06/15/2023]
Abstract
Genome-wide occupancy studies for RNA polymerases and their basal transcription factors deliver information about transcription dynamics and the recruitment of transcription elongation and termination factors in eukaryotes and prokaryotes. The primary method to determine genome-wide occupancies is chromatin immunoprecipitation combined with deep sequencing (ChIP-seq). Archaea possess a transcription machinery that is evolutionarily more closely related to its eukaryotic counterpart, but it operates in a prokaryotic cellular context. Studies on archaeal transcription have brought insight into the evolution of transcription machineries and the universality of transcription mechanisms. Because of the limited resolution of ChIP-seq, the close spacing of promoters and transcription units found in archaeal genomes poses a challenge for ChIP-seq and the ensuing data analysis. The extreme growth temperature of many established archaeal model organisms necessitates further adaptations. This chapter describes a version of ChIP-seq adapted for the basal transcription machinery of thermophilic archaea and some modifications to the data analysis.
Affiliation(s)
- Fabian Blombach
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
- Kathy L Smollett
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
- Finn Werner
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
9
Kim GB, Gao Y, Palsson BO, Lee SY. DeepTFactor: A deep learning-based tool for the prediction of transcription factors. Proc Natl Acad Sci U S A 2021; 118:e2021171118. [PMID: 33372147 DOI: 10.1073/pnas.2021171118] [Indexed: 12/16/2022]
Abstract
A transcription factor (TF) is a sequence-specific DNA-binding protein that modulates the transcription of a set of particular genes, and thus regulates gene expression in the cell. TFs have commonly been predicted by analyzing sequence homology with the DNA-binding domains of TFs already characterized. Thus, TFs that do not show homologies with the reported ones are difficult to predict. Here we report the development of a deep learning-based tool, DeepTFactor, that predicts whether a protein in question is a TF. DeepTFactor uses a convolutional neural network to extract features of a protein. It showed high performance in predicting TFs of both eukaryotic and prokaryotic origins, resulting in F1 scores of 0.8154 and 0.8000, respectively. Analysis of the gradients of prediction score with respect to input suggested that DeepTFactor detects DNA-binding domains and other latent features for TF prediction. DeepTFactor predicted 332 candidate TFs in Escherichia coli K-12 MG1655. Among them, 84 candidate TFs belong to the y-ome, which is a collection of genes that lack experimental evidence of function. We experimentally validated the results of DeepTFactor prediction by further characterizing genome-wide binding sites of three predicted TFs, YqhC, YiaU, and YahB. Furthermore, we made available the list of 4,674,808 TFs predicted from 73,873,012 protein sequences in 48,346 genomes. DeepTFactor will serve as a useful tool for predicting TFs, which is necessary for understanding the regulatory systems of organisms of interest. We provide DeepTFactor as a stand-alone program, available at https://bitbucket.org/kaistsystemsbiology/deeptfactor.
10
Vera Alvarez R, Pongor L, Mariño-Ramírez L, Landsman D. PM4NGS, a project management framework for next-generation sequencing data analysis. Gigascience 2021; 10:giaa141. [PMID: 33410471 PMCID: PMC7788391 DOI: 10.1093/gigascience/giaa141] [Received: 05/06/2020] [Revised: 10/14/2020] [Accepted: 11/16/2020] [Indexed: 11/14/2022]
Abstract
BACKGROUND: FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of data analysis pipeline execution across multiple platforms in scalable ways. FINDINGS: Here, we present a project management framework for NGS data analysis called PM4NGS. This framework comprises automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in CWL format. Pre-configured Jupyter notebooks with minimal Python code are included in PM4NGS to produce a project report and publication-ready figures. We present 3 pipelines for demonstration purposes, covering the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets. CONCLUSIONS: PM4NGS is an open source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy for non-bioinformaticians to install, configure, and use on personal computers and laptops. It permits execution of the NGS data analysis on Windows 10 with the Windows Subsystem for Linux feature activated. The framework aims to reduce the gap between researchers in experimental laboratories producing NGS data and workflows for data analysis. PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/.
Affiliation(s)
- Roberto Vera Alvarez
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, 8900 Rockville Pike, NIH, Bethesda, MD 20894, USA
- Lorinc Pongor
  - Developmental Therapeutics Branch and Laboratory of Molecular Pharmacology, Center for Cancer Research, National Cancer Institute, 8900 Rockville Pike, NIH, Bethesda, MD 20894, USA
- Leonardo Mariño-Ramírez
  - Division of Intramural Research, National Institute on Minority Health and Health Disparities, 8900 Rockville Pike, NIH, Bethesda, MD 20894, USA
- David Landsman
  - Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, 8900 Rockville Pike, NIH, Bethesda, MD 20894, USA
11
|
Choudhary KS, Kleinmanns JA, Decker K, Sastry AV, Gao Y, Szubin R, Seif Y, Palsson BO. Elucidation of Regulatory Modes for Five Two-Component Systems in Escherichia coli Reveals Novel Relationships. mSystems 2020; 5:e00980-20. [PMID: 33172971 PMCID: PMC7657598 DOI: 10.1128/msystems.00980-20] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2020] [Accepted: 10/20/2020] [Indexed: 11/27/2022] Open
Abstract
Escherichia coli uses two-component systems (TCSs) to respond to environmental signals. TCSs affect gene expression and are parts of E. coli's global transcriptional regulatory network (TRN). Here, we identified the regulons of five TCSs in E. coli MG1655: BaeSR and CpxAR, which were stimulated by ethanol stress; KdpDE and PhoRB, induced by limiting potassium and phosphate, respectively; and ZraSR, stimulated by zinc. We analyzed RNA-seq data using independent component analysis (ICA). ChIP-exo data were used to validate condition-specific target gene binding sites. Based on these data, we do the following: (i) identify the target genes for each TCS; (ii) show how the target genes are transcribed in response to stimulus; and (iii) reveal novel relationships between TCSs, which indicate noncognate inducers for various response regulators, such as BaeR to iron starvation, CpxR to phosphate limitation, and PhoB and ZraR to cell envelope stress. Our understanding of the TRN in E. coli is thus notably expanded. IMPORTANCE E. coli is a common commensal microbe found in the human gut microenvironment; however, some strains cause diseases like diarrhea, urinary tract infections, and meningitis. E. coli's two-component systems (TCSs) modulate target gene expression, especially related to virulence, pathogenesis, and antimicrobial peptides, in response to environmental stimuli. Thus, it is of utmost importance to understand the transcriptional regulation of TCSs to infer bacterial environmental adaptation and disease pathogenicity. Utilizing a combinatorial approach integrating RNA sequencing (RNA-seq), independent component analysis, chromatin immunoprecipitation coupled with exonuclease treatment (ChIP-exo), and data mining, we suggest five different modes of TCS transcriptional regulation. Our data further highlight noncognate inducers of TCSs, which emphasizes the cross-regulatory nature of TCSs in E. coli and suggests that TCSs may have a role beyond their cognate functionalities. In summary, these results can lead to an understanding of the metabolic capabilities of bacteria and correctly predict complex phenotype under diverse conditions, especially when further incorporated with genome-scale metabolic models.
Affiliation(s)
- Kumari Sonal Choudhary
- Department of Bioengineering, University of California, San Diego, San Diego, California, USA
- Julia A Kleinmanns
- Department of Bioengineering, University of California, San Diego, San Diego, California, USA
- Katherine Decker
- Department of Bioengineering, University of California, San Diego, San Diego, California, USA
- Anand V Sastry
- Department of Bioengineering, University of California, San Diego, San Diego, California, USA
- Ye Gao
- Department of Bioengineering, University of California, San Diego, San Diego, California, USA
- Richard Szubin
- Department of Bioengineering, University of California, San Diego, San Diego, California, USA
- Yara Seif
- Department of Bioengineering, University of California, San Diego, San Diego, California, USA
- Bernhard O Palsson
- Department of Bioengineering, University of California, San Diego, San Diego, California, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
12
Sharma V, Majumdar S. Comparative analysis of ChIP-exo peak-callers: impact of data quality, read duplication and binding subtypes. BMC Bioinformatics 2020; 21:65. [PMID: 32085702 PMCID: PMC7035708 DOI: 10.1186/s12859-020-3403-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/30/2019] [Accepted: 02/10/2020] [Indexed: 01/26/2023] Open
Abstract
Background ChIP (Chromatin immunoprecipitation)-exo has emerged as an important and versatile improvement over conventional ChIP-seq as it reduces the level of noise, maps the transcription factor (TF) binding location very precisely, up to single base-pair resolution, and enables binding mode prediction. The availability of numerous peak-callers for analyzing ChIP-exo reads has motivated the need to assess their performance and report which tool executes reasonably well for the task. Results This study has focussed on comparing peak-callers that report direct binding events with those that report indirect binding events. The effect of strandedness of reads and duplication of data on the performance of peak-callers has been investigated. The number of peaks reported by each peak-caller is compared, followed by a comparison of the annotated motifs present in the reported peaks. The significance of peaks is assessed based on the presence of a motif in top peaks. Indirect binding tools have been compared on the basis of their ability to identify annotated motifs and predict the mode of protein-DNA interaction. Conclusion By studying the output of the peak-callers investigated in this study, it is concluded that the tools that use self-learning algorithms, i.e. the tools that estimate all the essential parameters from the aligned reads, perform better than the algorithms which require formation of peak-pairs. The latest tools that account for indirect binding of TFs appear to be an upgrade over the available tools, as they are able to reveal valuable information about the mode of binding in addition to direct binding. Furthermore, the quality of ChIP-exo reads has important consequences for the output of data analysis.
Affiliation(s)
- Vasudha Sharma
- Discipline of Biological Engineering, Indian Institute of Technology Gandhinagar, Palaj, Gujarat, 382355, India
- Sharmistha Majumdar
- Discipline of Biological Engineering, Indian Institute of Technology Gandhinagar, Palaj, Gujarat, 382355, India
13
Yamada N, Kuntala PK, Pugh BF, Mahony S. ChExMix: A Method for Identifying and Classifying Protein-DNA Interaction Subtypes. J Comput Biol 2020; 27:429-435. [PMID: 32023130 DOI: 10.1089/cmb.2019.0466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Regulatory proteins can employ multiple direct and indirect modes of interaction with the genome. The ChIP-exo mixture model (ChExMix) provides a principled approach to detecting multiple protein-DNA interaction modes in a single ChIP-exo experiment. ChExMix discovers and characterizes binding event subtypes in ChIP-exo data by leveraging both protein-DNA cross-linking signatures and DNA motifs. In this study, we present a summary of the major features and applications of ChExMix. We demonstrate that ChExMix does not require high-resolution protein-DNA binding assay data to detect binding event subtypes. Specifically, we apply ChExMix to analyze 393 ChIP-seq data profiles in K562 cells. Similar binding event subtypes are discovered across multiple proteins, suggesting the existence of colocalized regulatory protein modules that are recruited to DNA through a particular sequence-specific transcription factor. Our results thus suggest that ChExMix can characterize protein-DNA binding interaction modes using data from multiple types of protein-DNA interaction assays.
Affiliation(s)
- Naomi Yamada
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania
- Prashant Kumar Kuntala
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania
- B Franklin Pugh
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania
- Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania
14
Ma T, Ye Z, Wang L. Genome Wide Approaches to Identify Protein-DNA Interactions. Curr Med Chem 2020; 26:7641-7654. [PMID: 29848263 DOI: 10.2174/0929867325666180530115711] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/30/2017] [Revised: 02/27/2018] [Accepted: 05/11/2018] [Indexed: 12/15/2022]
Abstract
BACKGROUND Transcription factors are DNA-binding proteins that play key roles in many fundamental biological processes. Unraveling their interactions with DNA is essential to identify their target genes and understand the regulatory network. Genome-wide identification of their binding sites became feasible thanks to recent progress in experimental and computational approaches. ChIP-chip, ChIP-seq, and ChIP-exo are three widely used techniques to demarcate genome-wide transcription factor binding sites. OBJECTIVE This review aims to provide an overview of these three techniques, including their experimental procedures, computational approaches, and popular analytic tools. CONCLUSION ChIP-chip, ChIP-seq, and ChIP-exo have been the major techniques to study genome-wide in vivo protein-DNA interaction. Due to the rapid development of next-generation sequencing technology, array-based ChIP-chip is deprecated and ChIP-seq has become the most widely used technique to identify transcription factor binding sites genome-wide. The newly developed ChIP-exo further improves the spatial resolution to a single nucleotide. Numerous tools have been developed to analyze ChIP-chip, ChIP-seq and ChIP-exo data. However, different programs may employ different mechanisms or underlying algorithms, so each will inherently include its own set of statistical assumptions and biases. Choosing the most appropriate analytic program for a given experiment therefore requires careful consideration. Moreover, most programs only have a command line interface, so their installation and usage require basic computational expertise in Unix/Linux.
Affiliation(s)
- Tao Ma
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN 55905, United States
- Zhenqing Ye
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN 55905, United States
- Liguo Wang
- Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN 55905, United States
15
Börlin CS, Bergenholm D, Holland P, Nielsen J. A bioinformatic pipeline to analyze ChIP-exo datasets. Biol Methods Protoc 2019; 4:bpz011. [PMID: 32395628 PMCID: PMC7200897 DOI: 10.1093/biomethods/bpz011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/17/2019] [Revised: 07/17/2019] [Accepted: 07/24/2019] [Indexed: 11/16/2022] Open
Abstract
The decrease in sequencing cost in recent years has made genome-wide studies of transcription factor (TF) binding through chromatin immunoprecipitation methods like ChIP-seq and chromatin immunoprecipitation with lambda exonuclease (ChIP-exo) more accessible to a broader group of users. Especially with ChIP-exo, it is now possible to map TF binding sites in more detail and with less noise than previously possible. These improvements came at the cost of making the analysis of the data more challenging, which is further complicated by the fact that to date no complete pipeline is publicly available. Here we present a workflow developed specifically for ChIP-exo data and demonstrate its capabilities for data analysis. The pipeline, which is completely publicly available on GitHub, includes all necessary analytical steps to obtain a high-confidence list of TF targets starting from raw sequencing reads. During the pipeline development, we emphasized the inclusion of different quality control measurements, and we show how to use these so users can have confidence in their obtained results.
Affiliation(s)
- Christoph S Börlin
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE-41296, Sweden
- David Bergenholm
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE-41296, Sweden
- Petter Holland
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE-41296, Sweden
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, SE-41296, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Chalmers University of Technology, Gothenburg, SE-41296, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, DK-2800, Denmark
- BioInnovation Institute, Ole Maaløes Vej 3, DK2200, Copenhagen N, Denmark
16
Chow KT, Driscoll C, Loo YM, Knoll M, Gale M. IRF5 regulates unique subset of genes in dendritic cells during West Nile virus infection. J Leukoc Biol 2018; 105:411-425. [PMID: 30457675 DOI: 10.1002/jlb.ma0318-136rrr] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/15/2018] [Revised: 10/14/2018] [Accepted: 10/17/2018] [Indexed: 01/08/2023] Open
Abstract
Pathogen recognition receptor (PRR) signaling is critical for triggering innate immune activation and the expression of immune response genes, including genes that impart restriction against virus replication. RIG-I-like receptors and TLRs are PRRs that signal immune activation and drive the expression of antiviral genes and the production of type I IFN leading to induction of IFN-stimulated genes, in part through the interferon regulatory factor (IRF) family of transcription factors. Previous studies with West Nile virus (WNV) showed that IRF3 and IRF7 regulate IFN expression in fibroblasts and neurons, whereas macrophages and dendritic cells (DCs) retained the ability to induce IFN-β in the absence of IRF3 and IRF7 in a manner implicating IRF5 in PRR signaling actions. Here we assessed the contribution of IRF5 to immune gene induction in response to WNV infection in DCs. We examined IRF5-dependent gene expression and found that loss of IRF5 in mice resulted in modest and subtle changes in the expression of WNV-regulated genes. Anti-IRF5 chromatin immunoprecipitation with next-generation sequencing of genomic DNA coupled with mRNA analysis revealed unique IRF5 binding motifs within the mouse genome that are distinct from the canonical IRF binding motif and that link with IRF5-target gene expression. Using integrative bioinformatics analyses, we identified new IRF5 primary target genes in DCs in response to virus infection. This study provides novel insights into the distinct and unique innate immune and immune gene regulatory program directed by IRF5.
Affiliation(s)
- Kwan T Chow
- Department of Immunology, Center for Innate Immunity and Immune Disease, University of Washington, Seattle, Washington, USA; Department of Biomedical Sciences, City University of Hong Kong, Hong Kong SAR, China
- Connor Driscoll
- Department of Immunology, Center for Innate Immunity and Immune Disease, University of Washington, Seattle, Washington, USA
- Yueh-Ming Loo
- Department of Immunology, Center for Innate Immunity and Immune Disease, University of Washington, Seattle, Washington, USA
- Megan Knoll
- Department of Immunology, Center for Innate Immunity and Immune Disease, University of Washington, Seattle, Washington, USA
- Michael Gale
- Department of Immunology, Center for Innate Immunity and Immune Disease, University of Washington, Seattle, Washington, USA
17
Bergenholm D, Liu G, Holland P, Nielsen J. Reconstruction of a Global Transcriptional Regulatory Network for Control of Lipid Metabolism in Yeast by Using Chromatin Immunoprecipitation with Lambda Exonuclease Digestion. mSystems 2018; 3:e00215-17. [PMID: 30073202 DOI: 10.1128/mSystems.00215-17] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/15/2017] [Accepted: 07/04/2018] [Indexed: 11/20/2022] Open
Abstract
To build transcription regulatory networks, transcription factor binding must be analyzed in cells grown under different conditions because their responses and targets differ depending on environmental conditions. We performed whole-genome analysis of the DNA binding of five Saccharomyces cerevisiae transcription factors involved in lipid metabolism, Ino2, Ino4, Hap1, Oaf1, and Pip2, in response to four different environmental conditions in chemostat cultures, which allowed us to keep the specific growth rate constant. Chromatin immunoprecipitation with lambda exonuclease digestion (ChIP-exo) enabled the detection of binding events at a high resolution. We discovered a large number of unidentified targets and thus expanded functions for each transcription factor (e.g., glutamate biosynthesis as a target of Oaf1 and Pip2). Moreover, condition-dependent binding of transcription factors in response to cell metabolic state (e.g., differential binding of Ino2 between fermentative and respiratory metabolic conditions) was clearly suggested. Combining the new binding data with previously published data from transcription factor deletion studies revealed the high complexity of the transcriptional regulatory network for lipid metabolism in yeast, which involves the combinatorial and complementary regulation by multiple transcription factors. We anticipate that our work will provide insights into transcription factor binding dynamics that will prove useful for the understanding of transcription regulatory networks. IMPORTANCE Transcription factors play a crucial role in the regulation of gene expression and adaptation to different environments. To better understand the underlying roles of these adaptations, we performed experiments that give us high-resolution binding of transcription factors to their targets. We investigated five transcription factors involved in lipid metabolism in yeast, and we discovered multiple novel targets and condition-specific responses that allow us to draw a better regulatory map of the lipid metabolism.
18
García-Molinero V, García-Martínez J, Reja R, Furió-Tarí P, Antúnez O, Vinayachandran V, Conesa A, Pugh BF, Pérez-Ortín JE, Rodríguez-Navarro S. The SAGA/TREX-2 subunit Sus1 binds widely to transcribed genes and affects mRNA turnover globally. Epigenetics Chromatin 2018; 11:13. [PMID: 29598828 PMCID: PMC5875001 DOI: 10.1186/s13072-018-0184-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/18/2018] [Accepted: 03/23/2018] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Eukaryotic transcription is regulated through two complexes, the general transcription factor IID (TFIID) and the coactivator Spt-Ada-Gcn5 acetyltransferase (SAGA). Recent findings confirm that both TFIID and SAGA contribute to the synthesis of nearly all transcripts and are recruited genome-wide in yeast. However, how this broad recruitment confers selectivity under specific conditions remains an open question. RESULTS Here we find that the SAGA/TREX-2 subunit Sus1 associates with upstream regulatory regions of many yeast genes and that heat shock drastically changes Sus1 binding. While Sus1 binding to TFIID-dominated genes is not affected by temperature, its recruitment to SAGA-dominated genes and RP genes is significantly disturbed under heat shock, with Sus1 relocated to environmental stress-responsive genes in these conditions. Moreover, in contrast to recent results showing that SAGA deubiquitinating enzyme Ubp8 is dispensable for RNA synthesis, genomic run-on experiments demonstrate that Sus1 contributes to synthesis and stability of a wide range of transcripts. CONCLUSIONS Our study provides support for a model in which SAGA/TREX-2 factor Sus1 acts as a global transcriptional regulator in yeast but has differential activity at yeast genes as a function of their transcription rate or during stress conditions.
Affiliation(s)
- Varinia García-Molinero
- Gene Expression and RNA Metabolism Laboratory, Centro de Investigación Príncipe Felipe (CIPF), Eduardo Primo Yúfera 3, 46012, Valencia, Spain; Inserm Avenir: 'Biology of Repetitive Sequences'-Institute of Human Genetics, CNRS UPR1142, Montpellier, France
- José García-Martínez
- Departamento de Genética and E.R.I. Biotecmed, Facultad de Biología, Universitat de València, C/Dr. Moliner 50, 46100, Burjassot, Spain
- Rohit Reja
- Department of Biochemistry and Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, Pennsylvania, PA, 16802, USA; Genentech Inc., South San Francisco, CA, USA
- Pedro Furió-Tarí
- Genomics of Gene Expression Laboratory, Centro de Investigación Príncipe Felipe (CIPF), Eduardo Primo Yúfera 3, 46012, Valencia, Spain
- Oreto Antúnez
- Departamento de Bioquímica y Biología Molecular and E.R.I. Biotecmed, Facultad de Biología, Universitat de València, C/Dr. Moliner 50, 46100, Burjassot, Spain
- Vinesh Vinayachandran
- Department of Biochemistry and Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, Pennsylvania, PA, 16802, USA
- Ana Conesa
- Genomics of Gene Expression Laboratory, Centro de Investigación Príncipe Felipe (CIPF), Eduardo Primo Yúfera 3, 46012, Valencia, Spain; Microbiology and Cell Science Department, Institute for Food and Agricultural Sciences, University of Florida, P.O. Box 110700, Gainesville, FL, 32611-0700, USA; Genetics Institute, University of Florida, 2033 Mowry Road, Gainesville, FL, 32610, USA
- B Franklin Pugh
- Department of Biochemistry and Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, Pennsylvania, PA, 16802, USA
- José E Pérez-Ortín
- Departamento de Bioquímica y Biología Molecular and E.R.I. Biotecmed, Facultad de Biología, Universitat de València, C/Dr. Moliner 50, 46100, Burjassot, Spain
- Susana Rodríguez-Navarro
- Gene Expression and RNA Metabolism Laboratory, Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas (CSIC), Jaime Roig 11, 46010, Valencia, Spain; Gene Expression and RNA Metabolism Laboratory, Centro de Investigación Príncipe Felipe (CIPF), Eduardo Primo Yúfera 3, 46012, Valencia, Spain
19
Zhou X, Yan Q, Wang N. Deciphering the regulon of a GntR family regulator via transcriptome and ChIP-exo analyses and its contribution to virulence in Xanthomonas citri. Mol Plant Pathol 2017; 18:249-262. [PMID: 26972728 PMCID: PMC6638223 DOI: 10.1111/mpp.12397] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 12/11/2015] [Revised: 02/08/2016] [Accepted: 03/07/2016] [Indexed: 05/14/2023]
Abstract
Xanthomonas contains a large group of plant-associated species, many of which cause severe diseases on important crops worldwide. Six gluconate-operon repressor (GntR) family transcriptional regulators are predicted in Xanthomonas, one of which, belonging to the YtrA subfamily, plays a prominent role in bacterial virulence. However, the direct targets and comprehensive regulatory profile of YtrA remain unknown. Here, we performed microarray and high-resolution chromatin immunoprecipitation-exonuclease (ChIP-exo) experiments to identify YtrA direct targets and its DNA binding motif in X. citri ssp. citri (Xac), the causal agent of citrus canker. Integrative microarray and ChIP-exo data analysis revealed that YtrA directly regulates three operons by binding to a palindromic motif GGTG-N16-CACC at the promoter region. A similar palindromic motif and YtrA homologues were also identified in many other bacteria, including Stenotrophomonas, Pseudoxanthomonas and Frateuria, indicating a widespread phenomenon. Deletion of ytrA in Xac abolishes bacterial virulence and induction of the hypersensitive response (HR). We found that YtrA regulates the expression of hrp/hrc genes encoding the bacterial type III secretion system (T3SS) and controls multiple biological processes, including motility and adhesion, oxidative stress, extracellular enzyme production and iron uptake. YtrA represses the expression of its direct targets in artificial medium or in planta. Importantly, over-expression of yro3, one of the YtrA directly regulated operons, which contains trmL and XAC0231, induced weaker canker symptoms and down-regulation of hrp/hrc gene expression, suggesting a negative regulation in Xac virulence and T3SS. Our study has significantly advanced the mechanistic understanding of YtrA regulation and its contribution to bacterial virulence.
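To make the motif notation in this abstract concrete, the following is a minimal Python sketch of scanning a DNA string for the GGTG-N16-CACC pattern, where N16 is read as exactly 16 unspecified bases and CACC is the reverse complement of GGTG. It is an illustration only and not the analysis used in the cited study; the function names and the example sequence are hypothetical.

```python
import re

# GGTG, then exactly 16 bases of any kind, then CACC.
# The lookahead lets overlapping sites be reported as well.
YTRA_MOTIF = re.compile(r"(?=(GGTG[ACGT]{16}CACC))")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif_sites(sequence):
    """Return (0-based start, matched site) pairs for every motif hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in YTRA_MOTIF.finditer(seq)]


# Hypothetical promoter fragment, made up for illustration only.
promoter = "TTA" + "GGTG" + "ACGTACGTACGTACGT" + "CACC" + "GA"
sites = find_motif_sites(promoter)
```

Because the two half-sites are reverse complements of each other, a site found on one strand also matches the same pattern on the opposite strand, so scanning a single strand is sufficient for this pattern.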
Affiliation(s)
- Xiaofeng Zhou
- Citrus Research and Education Center, Department of Microbiology and Cell Science, IFAS, University of Florida, 700 Experiment Station Road, Lake Alfred, FL 33850, USA
- Qing Yan
- Citrus Research and Education Center, Department of Microbiology and Cell Science, IFAS, University of Florida, 700 Experiment Station Road, Lake Alfred, FL 33850, USA
- Nian Wang
- Citrus Research and Education Center, Department of Microbiology and Cell Science, IFAS, University of Florida, 700 Experiment Station Road, Lake Alfred, FL 33850, USA
20
Rhee HS, Closser M, Guo Y, Bashkirova EV, Tan GC, Gifford DK, Wichterle H. Expression of Terminal Effector Genes in Mammalian Neurons Is Maintained by a Dynamic Relay of Transient Enhancers. Neuron 2016; 92:1252-1265. [PMID: 27939581 PMCID: PMC5193225 DOI: 10.1016/j.neuron.2016.11.037] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Received: 03/22/2016] [Revised: 10/31/2016] [Accepted: 11/17/2016] [Indexed: 11/22/2022]
Abstract
Generic spinal motor neuron identity is established by cooperative binding of programming transcription factors (TFs), Isl1 and Lhx3, to motor-neuron-specific enhancers. How expression of effector genes is maintained following downregulation of programming TFs in maturing neurons remains unknown. High-resolution exonuclease (ChIP-exo) mapping revealed that the majority of enhancers established by programming TFs are rapidly deactivated following Lhx3 downregulation in stem-cell-derived hypaxial motor neurons. Isl1 is released from nascent motor neuron enhancers and recruited to new enhancers bound by clusters of Onecut1 in maturing neurons. Synthetic enhancer reporter assays revealed that Isl1 operates as an integrator factor, translating the density of Lhx3 or Onecut1 binding sites into transient enhancer activity. Importantly, independent Isl1/Lhx3- and Isl1/Onecut1-bound enhancers contribute to sustained expression of motor neuron effector genes, demonstrating that outwardly stable expression of terminal effector genes in postmitotic neurons is controlled by a dynamic relay of stage-specific enhancers.
Affiliation(s)
- Ho Sung Rhee
- Departments of Pathology and Cell Biology, Neuroscience, and Neurology, Center for Motor Neuron Biology and Disease, Columbia Stem Cell Initiative, Columbia University Medical Center, New York, NY 10032, USA
- Michael Closser
- Departments of Pathology and Cell Biology, Neuroscience, and Neurology, Center for Motor Neuron Biology and Disease, Columbia Stem Cell Initiative, Columbia University Medical Center, New York, NY 10032, USA
- Yuchun Guo
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Elizaveta V Bashkirova
- Departments of Pathology and Cell Biology, Neuroscience, and Neurology, Center for Motor Neuron Biology and Disease, Columbia Stem Cell Initiative, Columbia University Medical Center, New York, NY 10032, USA
- G Christopher Tan
- Departments of Pathology and Cell Biology, Neuroscience, and Neurology, Center for Motor Neuron Biology and Disease, Columbia Stem Cell Initiative, Columbia University Medical Center, New York, NY 10032, USA
- David K Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Hynek Wichterle
- Departments of Pathology and Cell Biology, Neuroscience, and Neurology, Center for Motor Neuron Biology and Disease, Columbia Stem Cell Initiative, Columbia University Medical Center, New York, NY 10032, USA
21
Hansen P, Hecht J, Ibn-Salem J, Menkuec BS, Roskosch S, Truss M, Robinson PN. Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus. BMC Genomics 2016; 17:873. [PMID: 27814676 PMCID: PMC5097360 DOI: 10.1186/s12864-016-3164-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/26/2016] [Accepted: 10/12/2016] [Indexed: 12/22/2022] Open
Abstract
Background ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, which have not yet been systematically tested and compared on ChIP-nexus data. Results Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal of PCR duplicates and for quality control. Furthermore, we developed bespoke methods to estimate the width of the protected region resulting from protein-DNA binding and to infer binding positions from ChIP-nexus data. Finally, we applied our peak calling method as well as the two other methods MACE and MACS2 to the available ChIP-nexus data. Conclusions The Q-nexus software is efficient and easy to use. Novel statistics about duplication rates in consideration of random barcodes are calculated. Our method for the estimation of the width of the protected region yields unbiased signatures that are highly reproducible for biological replicates and at the same time very specific for the respective factors analyzed. As judged by the irreproducible discovery rate (IDR), our peak calling algorithm shows a substantially better reproducibility. An implementation of Q-nexus is available at http://charite.github.io/Q/.
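The idea of barcode-aware duplicate removal described in this abstract can be illustrated with a short Python sketch. It assumes that two reads count as PCR duplicates only when they share chromosome, strand, 5' end position, and random barcode, so reads at the same position with different barcodes are kept. This is a simplified illustration under that assumption, not the Q-nexus implementation; the `Read` record and both function names are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal read record: mapping coordinates plus the random barcode.
Read = namedtuple("Read", ["chrom", "strand", "five_prime", "barcode"])


def remove_pcr_duplicates(reads):
    """Keep the first read for each (chrom, strand, 5' end, barcode) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.strand, read.five_prime, read.barcode)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def duplication_rate(reads):
    """Fraction of reads discarded as barcode-identical duplicates."""
    if not reads:
        return 0.0
    return 1.0 - len(remove_pcr_duplicates(reads)) / len(reads)


reads = [
    Read("chr1", "+", 100, "ACGTA"),
    Read("chr1", "+", 100, "ACGTA"),  # same position and barcode: PCR duplicate
    Read("chr1", "+", 100, "TTGCA"),  # same position, different barcode: kept
    Read("chr1", "-", 100, "ACGTA"),  # opposite strand: kept
]
unique_reads = remove_pcr_duplicates(reads)
```

Deduplicating on position alone would collapse the first three reads into one, which is why the barcode is part of the key: it preserves independent fragments that happen to end at the same exonuclease stop.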
Affiliation(s)
- Peter Hansen
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Jochen Hecht
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jonas Ibn-Salem
- Faculty of Biology, Johannes Gutenberg University Mainz, Ackermannweg 4, Mainz, 55128, Germany
- Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128, Germany
- Benjamin S Menkuec
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Sebastian Roskosch
- Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany
- Matthias Truss
- Labor für Pädiatrische Molekularbiologie, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany
- Max Planck Institute for Molecular Genetics, Inhestr. 63-73, Berlin, 14195, Germany
- Current address: The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, 06032, CT, USA

22
Uusküla-Reimand L, Hou H, Samavarchi-Tehrani P, Rudan MV, Liang M, Medina-Rivera A, Mohammed H, Schmidt D, Schwalie P, Young EJ, Reimand J, Hadjur S, Gingras AC, Wilson MD. Topoisomerase II beta interacts with cohesin and CTCF at topological domain borders. Genome Biol 2016; 17:182. [PMID: 27582050 PMCID: PMC5006368 DOI: 10.1186/s13059-016-1043-8] [Received: 09/09/2015] [Accepted: 08/10/2016] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Type II DNA topoisomerases (TOP2) regulate DNA topology by generating transient double-stranded breaks during replication and transcription. Topoisomerase II beta (TOP2B) facilitates rapid gene expression and functions at the later stages of development and differentiation. To gain new insight into the genome biology of TOP2B, we used proteomics (BioID), chromatin immunoprecipitation, and high-throughput chromosome conformation capture (Hi-C) to identify novel proximal TOP2B protein interactions and characterize the genomic landscape of TOP2B binding at base pair resolution. RESULTS: Our human TOP2B proximal protein interaction network included members of the cohesin complex and nucleolar proteins associated with rDNA biology. TOP2B associates with DNase I hypersensitivity sites, allele-specific transcription factor (TF) binding, and evolutionarily conserved TF binding sites on the mouse genome. Approximately half of all CTCF/cohesin-bound regions coincided with TOP2B binding. Base pair resolution ChIP-exo mapping of TOP2B, CTCF, and cohesin sites revealed a striking structural ordering of these proteins along the genome relative to the CTCF motif. These ordered TOP2B-CTCF-cohesin sites flank the boundaries of topologically associating domains (TADs), with TOP2B positioned externally and cohesin internally to the domain loop. CONCLUSIONS: TOP2B is positioned to solve topological problems at diverse cis-regulatory elements, and its occupancy is a highly ordered and prevalent feature of CTCF/cohesin binding sites that flank TADs.
Affiliation(s)
- Liis Uusküla-Reimand
- Genetics and Genome Biology Program, SickKids Research Institute, Toronto, ON, Canada
- Department of Gene Technology, Tallinn University of Technology, Tallinn, Estonia
- Huayun Hou
- Genetics and Genome Biology Program, SickKids Research Institute, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Matteo Vietri Rudan
- Research Department of Cancer Biology, Cancer Institute, University College London, London, UK
- Minggao Liang
- Genetics and Genome Biology Program, SickKids Research Institute, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Alejandra Medina-Rivera
- Genetics and Genome Biology Program, SickKids Research Institute, Toronto, ON, Canada
- Present address: International Laboratory for Research in Human Genomics, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, Mexico
- Hisham Mohammed
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, UK
- Present address: The Babraham Institute, Cambridge, UK
- Dominic Schmidt
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, UK
- Present address: Syncona Partners LLP, London, UK
- Petra Schwalie
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Present address: Laboratory of Systems Biology and Genetics, Lausanne, Switzerland
- Edwin J. Young
- Genetics and Genome Biology Program, SickKids Research Institute, Toronto, ON, Canada
- Jüri Reimand
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Suzana Hadjur
- Research Department of Cancer Biology, Cancer Institute, University College London, London, UK
- Anne-Claude Gingras
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON, Canada
- Michael D. Wilson
- Genetics and Genome Biology Program, SickKids Research Institute, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada

23
Madrigal P. On Accounting for Sequence-Specific Bias in Genome-Wide Chromatin Accessibility Experiments: Recent Advances and Contradictions. Front Bioeng Biotechnol 2015; 3:144. [PMID: 26442258 PMCID: PMC4585268 DOI: 10.3389/fbioe.2015.00144] [Received: 06/14/2015] [Accepted: 09/07/2015] [Indexed: 11/13/2022]
Affiliation(s)
- Pedro Madrigal
- Wellcome Trust Sanger Institute, Cambridge, UK
- Department of Surgery, University of Cambridge, Cambridge, UK

24
Abstract
Recent advances in experimental and computational methodologies are enabling ultra-high resolution genome-wide profiles of protein-DNA binding events. For example, the ChIP-exo protocol precisely characterizes protein-DNA cross-linking patterns by combining chromatin immunoprecipitation (ChIP) with 5' → 3' exonuclease digestion. Similarly, deeply sequenced chromatin accessibility assays (e.g. DNase-seq and ATAC-seq) enable the detection of protected footprints at protein-DNA binding sites. With these techniques and others, we have the potential to characterize the individual nucleotides that interact with transcription factors, nucleosomes, RNA polymerases and other regulatory proteins in a particular cellular context. In this review, we explain the experimental assays and computational analysis methods that enable high-resolution profiling of protein-DNA binding events. We discuss the challenges and opportunities associated with such approaches.
Affiliation(s)
- Shaun Mahony
- Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
- B Franklin Pugh
- Department of Biochemistry & Molecular Biology, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA

25
Matteau D, Rodrigue S. Precise Identification of DNA-Binding Proteins Genomic Location by Exonuclease Coupled Chromatin Immunoprecipitation (ChIP-exo). Methods Mol Biol 2015; 1334:173-193. [PMID: 26404150 DOI: 10.1007/978-1-4939-2877-4_11] [Indexed: 06/05/2023]
Abstract
DNA-binding proteins play a crucial role in all living organisms by interacting with various DNA sequences across the genome. While several methods have been used to study the interaction between DNA and proteins in vitro, chromatin immunoprecipitation followed by sequencing (ChIP-seq) has become the standard technique for identifying the genome-wide location of DNA-binding proteins in vivo. However, the resolution of standard ChIP-seq methodology is limited by the DNA fragmentation process and the presence of contaminating DNA. A significant improvement of the ChIP-seq technique results from the addition of an exonuclease treatment during the immunoprecipitation step (ChIP-exo), which lowers background noise and, more importantly, sharpens the identification of binding sites to near single-base resolution by effectively footprinting DNA-bound proteins. By doing so, ChIP-exo offers new opportunities for a better characterization of the complex and fascinating architecture of DNA-protein interactions and provides new insights into important molecular mechanisms.
Affiliation(s)
- Dominick Matteau
- Département de biologie, Faculté des sciences, Université de Sherbrooke, 2500 boulevard de l'Université, Sherbrooke, QC, Canada, J1K 2R1
- Sébastien Rodrigue
- Département de biologie, Faculté des sciences, Université de Sherbrooke, 2500 boulevard de l'Université, Sherbrooke, QC, Canada, J1K 2R1

26
Carroll TS, Liang Z, Salama R, Stark R, de Santiago I. Impact of artifact removal on ChIP quality metrics in ChIP-seq and ChIP-exo data. Front Genet 2014; 5:75. [PMID: 24782889 PMCID: PMC3989762 DOI: 10.3389/fgene.2014.00075] [Received: 01/15/2014] [Accepted: 03/24/2014] [Indexed: 01/22/2023]
Abstract
With the advent of ChIP-seq multiplexing technologies and the subsequent increase in ChIP-seq throughput, the development of working standards for the quality assessment of ChIP-seq studies has received significant attention. The ENCODE consortium's large scale analysis of transcription factor binding and epigenetic marks as well as concordant work on ChIP-seq by other laboratories has established a new generation of ChIP-seq quality control measures. The use of these metrics alongside common processing steps has however not been evaluated. In this study, we investigate the effects of blacklisting and removal of duplicated reads on established metrics of ChIP-seq quality and show that the interpretation of these metrics is highly dependent on the ChIP-seq preprocessing steps applied. Further to this we perform the first investigation of the use of these metrics for ChIP-exo data and make recommendations for the adaptation of the NSC statistic to allow for the assessment of ChIP-exo efficiency.
Affiliation(s)
- Ziwei Liang
- Lymphocyte Development, MRC Clinical Sciences Centre, Imperial College London, UK
- Rafik Salama
- Cambridge Institute CRUK, University of Cambridge, Cambridge, UK
- Rory Stark
- Cambridge Institute CRUK, University of Cambridge, Cambridge, UK