1
Luo X, Zou Q. Identifying the "stripe" transcription factors and cooperative binding related to DNA methylation. Commun Biol 2024; 7:1265. [PMID: 39367138 PMCID: PMC11452537 DOI: 10.1038/s42003-024-06992-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 09/30/2024] [Indexed: 10/06/2024]
Abstract
DNA methylation plays a critical role in gene regulation by modulating the DNA binding of transcription factors (TFs). This study integrates TF ChIP-seq profiles with whole-genome bisulfite sequencing (WGBS) profiles to investigate how DNA methylation affects protein interactions. Statistical methods and a 5-letter DNA motif-calling model have been developed to characterize DNA sequences bound by proteins while accounting for DNA modifications. Using these methods, 79 significant universal "stripe" TFs and cofactors (USFs), 2360 co-binding protein pairs, and distinct protein modules associated with various DNA methylation states were identified. The USFs hint at a regulatory hierarchy within these protein interactions. Proteins preferentially bind to non-CpG sites in methylated regions, indicating that binding affinity is not solely CpG-dependent. Proteins among the methylation-specific USFs and co-binding pairs play essential roles in promoting and sustaining DNA methylation by interacting with DNMTs or inhibiting TET binding. These findings underscore the interplay between protein binding and methylation, offering insights into epigenetic regulation in cellular biology.
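A 5-letter motif-calling model treats a methylated cytosine as its own letter, so that motif search can distinguish methylated from unmethylated sites. The sketch below is hypothetical and is not the authors' model: it assumes forward-strand methylation levels (for example, WGBS beta values) supplied as a position-to-level dict, an illustrative cutoff of 0.5, and 'M' as the fifth letter. It also labels each methylated cytosine as CpG or non-CpG, the distinction the abstract draws.

```python
# Hypothetical sketch of a 5-letter (A, C, G, T, M) DNA alphabet.
# Assumptions (not taken from the paper): forward-strand methylation levels
# (fraction methylated, e.g. from WGBS) keyed by 0-based position, a 0.5
# cutoff, and 'M' as the letter for a methylated cytosine. Methylation on
# the reverse strand is ignored.

def to_five_letter(seq, methylation, threshold=0.5):
    """Return seq with each methylated cytosine rewritten as 'M'."""
    letters = []
    for pos, base in enumerate(seq.upper()):
        level = methylation.get(pos)
        if base == "C" and level is not None and level >= threshold:
            letters.append("M")
        else:
            letters.append(base)
    return "".join(letters)


def methylated_contexts(encoded):
    """Label each 'M' as CpG (next base is G) or non-CpG."""
    contexts = {}
    for pos, base in enumerate(encoded):
        if base == "M":
            next_base = encoded[pos + 1] if pos + 1 < len(encoded) else ""
            contexts[pos] = "CpG" if next_base == "G" else "non-CpG"
    return contexts


if __name__ == "__main__":
    encoded = to_five_letter("ACGTTCAG", {1: 0.9, 5: 0.7})
    print(encoded)                       # AMGTTMAG
    print(methylated_contexts(encoded))  # {1: 'CpG', 5: 'non-CpG'}
```

Encoding methylation into the sequence itself lets any standard motif scanner or counter operate on the 5-letter strings unchanged.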
Affiliation(s)
- Ximei Luo
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
2
Jhamat N, Guo Y, Han J, Humblot P, Bongcam-Rudloff E, Andersson G, Niazi A. Enrichment of Cis-Acting Regulatory Elements in Differentially Methylated Regions Following Lipopolysaccharide Treatment of Bovine Endometrial Epithelial Cells. Int J Mol Sci 2024; 25:9832. [PMID: 39337320 PMCID: PMC11432661 DOI: 10.3390/ijms25189832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2024] [Revised: 09/07/2024] [Accepted: 09/09/2024] [Indexed: 09/30/2024]
Abstract
Endometritis is an inflammatory disease that negatively influences fertility and is common in dairy cows. An in vitro model of bovine endometrial inflammation was used to identify enrichment of cis-acting regulatory elements in differentially methylated regions (DMRs) in the genome of in vitro-cultured primary bovine endometrial epithelial cells (bEECs) before and after treatment with lipopolysaccharide (LPS) from E. coli, a key player in the development of endometritis. The enriched regulatory elements contain binding sites for transcription factors with established roles in inflammation and hypoxia, including NFKB and Hif-1α. We further showed co-localization of certain enriched cis-acting regulatory motifs, including ARNT, Hif-1α, and NRF1. Our results show an intriguing interplay: in LPS-treated bEECs, the levels of mRNAs encoding key transcription factors such as AHR, EGR2, and STAT1 increased, and the binding sites of these factors were enriched in the DMRs. Our results demonstrate an extraordinary cis-regulatory complexity in these DMRs, which harbor binding sites for both inflammatory and hypoxia-dependent transcription factors. Data obtained with this in vitro model of bacterially induced endometrial inflammation provide valuable information on key transcription factors relevant to clinical endometritis in both cattle and humans.
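Enrichment of a motif in a set of regions such as DMRs is often scored with a one-sided hypergeometric test against a background set of regions. The function below is a minimal sketch of that test; its name and the assumption that the DMRs are a subset of the background are illustrative, and the abstract does not state which tool or statistic the authors used.

```python
from math import comb


def motif_enrichment_pvalue(n_background, n_background_with_motif,
                            n_dmrs, n_dmrs_with_motif):
    """One-sided hypergeometric P(X >= k) for motif enrichment.

    Hypothetical helper. Treats the DMRs as n_dmrs regions drawn without
    replacement from n_background regions, of which
    n_background_with_motif carry the motif (DMRs assumed to be part of
    the background set).
    """
    N, K = n_background, n_background_with_motif
    n, k = n_dmrs, n_dmrs_with_motif
    total = comb(N, n)
    # Sum the probabilities of observing k or more motif-bearing DMRs.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / total
```

For example, `motif_enrichment_pvalue(1000, 100, 50, 15)` scores 15 motif-bearing DMRs out of 50 when only about 5 would be expected from a background in which 10% of regions carry the motif. Exact integer arithmetic via `math.comb` avoids the rounding issues of multiplying many small probabilities.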
Affiliation(s)
- Naveed Jhamat
- Department of Animal Biosciences, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
- Yongzhi Guo
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
- Jilong Han
- Department of Animal Biosciences, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
- Patrice Humblot
- Department of Clinical Sciences, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
- Erik Bongcam-Rudloff
- Department of Animal Biosciences, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
- SLU-Global Bioinformatics Centre, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
- Göran Andersson
- Department of Animal Biosciences, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
- Adnan Niazi
- Department of Animal Biosciences, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
- SLU-Global Bioinformatics Centre, Swedish University of Agricultural Sciences, P.O. Box 7023, SE-75007 Uppsala, Sweden
3
Yu X, Zhou J, Ye W, Xu J, Li R, Huang L, Chai Y, Wen M, Xu S, Zhou Y. Time-course swRNA-seq uncovers a hierarchical gene regulatory network in controlling the response-repair-remodeling after wounding. Commun Biol 2024; 7:694. [PMID: 38844830 PMCID: PMC11156874 DOI: 10.1038/s42003-024-06352-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 05/17/2024] [Indexed: 06/09/2024]
Abstract
Wounding initiates intricate responses crucial for tissue repair and regeneration, yet the gene regulatory networks governing wound healing remain poorly understood. Here, using single-worm RNA sequencing (swRNA-seq) across 12 time points, we delineated a three-stage wound repair process in C. elegans: response, repair, and remodeling. By integrating diverse datasets, we constructed a dynamic regulatory network comprising 241 transcription regulators and their inferred targets. We identified seven potentially autoregulatory TFs and five cross-autoregulatory loops involving pqm-1 and jun-1. We found that TFs might interact with chromatin factors and form TF-TF combinatorial modules via intrinsically disordered regions to enhance response robustness. We experimentally validated six regulators functioning in transcriptional and translocation-dependent manners. Notably, nhr-76, daf-16, nhr-84, and oef-1 are potentially required for efficient repair, while elt-2 may act as an inhibitor. These findings elucidate transcriptional responses and hierarchical regulatory networks during C. elegans wound repair, shedding light on the mechanisms underlying tissue repair and regeneration.
Affiliation(s)
- Xinghai Yu
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Hubei Key Laboratory of Cell Homeostasis, Wuhan University, Wuhan, 430072, China
- Jinghua Zhou
- Center for Stem Cell and Regenerative Medicine and Department of Cardiology of the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Wenkai Ye
- Center for Stem Cell and Regenerative Medicine and Department of Cardiology of the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Jingxiu Xu
- Center for Stem Cell and Regenerative Medicine and Department of Cardiology of the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Rui Li
- Institute of Hydrobiology, Chinese Academy of Science, Wuhan, 430072, China
- Li Huang
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Hubei Key Laboratory of Cell Homeostasis, Wuhan University, Wuhan, 430072, China
- Yi Chai
- The Zhejiang University-University of Edinburgh Institute, 718 East Haizhou Rd., Haining, Zhejiang, 314400, China
- Miaomiao Wen
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Hubei Key Laboratory of Cell Homeostasis, Wuhan University, Wuhan, 430072, China
- Suhong Xu
- Center for Stem Cell and Regenerative Medicine and Department of Cardiology of the Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- The Zhejiang University-University of Edinburgh Institute, 718 East Haizhou Rd., Haining, Zhejiang, 314400, China
- Yu Zhou
- College of Life Sciences, TaiKang Center for Life and Medical Sciences, RNA Institute, Hubei Key Laboratory of Cell Homeostasis, Wuhan University, Wuhan, 430072, China
- Frontier Science Center for Immunology and Metabolism, Wuhan University, Wuhan, 430072, China
- State Key Laboratory of Virology, Wuhan University, Wuhan, 430072, China
- Institute for Advanced Studies, Wuhan University, Wuhan, 430072, China
4
Khullar S, Huang X, Ramesh R, Svaren J, Wang D. NetREm: Network Regression Embeddings reveal cell-type transcription factor coordination for gene regulation. bioRxiv: The Preprint Server for Biology 2024:2023.10.25.563769. [PMID: 37961577 PMCID: PMC10634989 DOI: 10.1101/2023.10.25.563769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Transcription factor (TF) coordination plays a key role in target gene (TG) regulation via protein-protein interactions (PPIs) and DNA co-binding to regulatory elements. Single-cell technologies facilitate gene expression measurement in individual cells and cell-type identification, yet the connection between TF coordination and TG regulation across cell types remains unclear. To address this, we developed a novel computational approach, Network Regression Embeddings (NetREm), to reveal cell-type TF-TF coordination activities for TG regulation. NetREm applies network-constrained regularization, using prior knowledge of direct and/or indirect PPIs among TFs, to analyze single-cell gene expression data. We test NetREm on simulated data and benchmark its performance in 4 real-world applications with gold-standard TF-TG networks available: mouse (mESCs) and simulated human (hESCs) embryonic stem cells (ESCs), human hematopoietic stem cells (HSCs), and mouse dendritic cells (mDCs). Further, we use NetREm to prioritize valid novel TF-TF coordination links in human peripheral blood mononuclear cell (PBMC) subtypes. We apply NetREm to analyze various cell types in both the central (CNS) and peripheral (PNS) nervous system (NS) (e.g. neuronal, glial, and Schwann cells (SCs)) as well as in Alzheimer's disease (AD). Our findings uncover cell-type coordinating TFs and identify new TF-TG candidate links. We validate our top predictions using CUT&RUN and knockout loss-of-function expression data in rat/mouse models and compare the results with additional functional genomic data, including expression quantitative trait loci (eQTL) and genome-wide association studies (GWAS), to link genetic variants (single nucleotide polymorphisms (SNPs)) to TF coordination.
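Network-constrained regularization can be illustrated with a graph-Laplacian penalty that pulls the regression coefficients of interacting TFs toward each other. The sketch below is not NetREm's implementation (its exact penalty, embeddings, and solver are not reproduced); it assumes a ridge-style closed form, beta = (XᵀX + lam·L + ridge·I)⁻¹ Xᵀy, where X holds TF expression (cells × TFs), y holds one target gene's expression, and L is the Laplacian of a hypothetical TF-TF PPI graph. All function names are illustrative.

```python
# Hypothetical sketch of Laplacian-penalized (network-constrained) regression.
# Minimizes ||y - X b||^2 + lam * b^T L b + ridge * ||b||^2, where
# b^T L b = sum over PPI edges (i, j) of (b_i - b_j)^2.

def laplacian(n_tfs, edges):
    """Graph Laplacian L = D - A for an undirected, unweighted TF-TF graph."""
    L = [[0.0] * n_tfs for _ in range(n_tfs)]
    for i, j in edges:
        L[i][j] -= 1.0
        L[j][i] -= 1.0
        L[i][i] += 1.0
        L[j][j] += 1.0
    return L


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; try ridge > 0")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def network_regression(X, y, edges, lam=1.0, ridge=0.0):
    """Coefficients of each TF (columns of X) for one target gene y."""
    n_tfs = len(X[0])
    L = laplacian(n_tfs, edges)
    # Normal equations: (X^T X + lam * L + ridge * I) b = X^T y
    A = [[sum(row[i] * row[j] for row in X) + lam * L[i][j]
          + (ridge if i == j else 0.0) for j in range(n_tfs)]
         for i in range(n_tfs)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n_tfs)]
    return solve(A, rhs)
```

With `X = [[1, 0], [0, 1], [1, 1]]`, `y = [2, 0, 2]` and one edge `(0, 1)`, `lam=0` gives ordinary least squares (about `[2, 0]`), while `lam=3` pulls the two linked TFs together (about `[1.14, 0.86]`). The closed form keeps the idea visible; a lasso-style penalty would need an iterative solver instead.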
5
Trimbour R, Deutschmann IM, Cantini L. Molecular mechanisms reconstruction from single-cell multi-omics data with HuMMuS. Bioinformatics 2024; 40:btae143. [PMID: 38460192 PMCID: PMC11065476 DOI: 10.1093/bioinformatics/btae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 12/20/2023] [Accepted: 03/07/2024] [Indexed: 03/11/2024]
Abstract
MOTIVATION: The molecular identity of a cell results from a complex interplay between heterogeneous molecular layers. Recent advances in single-cell sequencing technologies have made it possible to measure such molecular layers of regulation. RESULTS: Here, we present HuMMuS, a new method for inferring regulatory mechanisms from single-cell multi-omics data. Unlike state-of-the-art methods, HuMMuS captures cooperation between biological macromolecules and can easily include additional layers of molecular regulation. We benchmarked HuMMuS against state-of-the-art methods on both paired and unpaired multi-omics datasets. Our results show that HuMMuS improves the prediction of transcription factor (TF) targets, TF binding motifs, and regulatory regions. Finally, when applied to snmC-seq, scATAC-seq, and scRNA-seq data from the mouse brain cortex, HuMMuS accurately clustered scRNA profiles and identified potential driver TFs. AVAILABILITY AND IMPLEMENTATION: HuMMuS is available at https://github.com/cantinilab/HuMMuS.
Affiliation(s)
- Remi Trimbour
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015 Paris, France
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
- Ina Maria Deutschmann
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
- Laura Cantini
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, F-75015 Paris, France
- Institut de Biologie de l’Ecole Normale Supérieure, CNRS, INSERM, Ecole Normale Supérieure, Université PSL, 75005 Paris, France
6
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023]
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
7
Kshirsagar M, Yuan H, Ferres JL, Leslie C. BindVAE: Dirichlet variational autoencoders for de novo motif discovery from accessible chromatin. Genome Biol 2022; 23:174. [PMID: 35971180 PMCID: PMC9380350 DOI: 10.1186/s13059-022-02723-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Accepted: 06/28/2022] [Indexed: 11/10/2022]
Abstract
We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. BindVAE can disentangle an input DNA sequence into distinct latent factors that encode cell-type specific in vivo binding signals for individual TFs, composite patterns for TFs involved in cooperative binding, and genomic context surrounding the binding sites. On the task of retrieving the motifs of expressed TFs in a given cell type, BindVAE is competitive with existing motif discovery approaches.
Affiliation(s)
- Han Yuan
- Calico Life Sciences, South San Francisco, CA, USA
8
Bhogale S, Sinha S. Thermodynamics-based modeling reveals regulatory effects of indirect transcription factor-DNA binding. iScience 2022; 25:104152. [PMID: 35465052 PMCID: PMC9018382 DOI: 10.1016/j.isci.2022.104152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 12/28/2021] [Accepted: 03/21/2022] [Indexed: 11/30/2022]
Abstract
Transcription factors (TFs) influence gene expression by binding to DNA, yet experimental data suggest that they also frequently bind regulatory DNA indirectly by interacting with other DNA-bound proteins. Here, we used a data modeling approach to test whether such indirect binding by TFs plays a significant role in gene regulation. We first incorporated the regulatory function of indirectly bound TFs into a thermodynamics-based model that predicts enhancer-driven expression from sequence. We then fit the new model to a rich data set comprising hundreds of enhancers and their regulatory activities during mesoderm specification in Drosophila embryogenesis and showed that the newly incorporated mechanism results in significantly better agreement with the data. In the process, we derived the first sequence-level model of this extensively characterized regulatory program. We further showed that allowing indirect binding of a TF explains its localization at enhancers more accurately than allowing direct binding only. Our model also provides a simple explanation of how a TF may switch between activating and repressive roles depending on context.
Highlights
- Inclusion of indirect DNA binding of TFs improves enhancer function prediction
- Context-specific activating or repressive roles of TFs
- Indirect binding improves fits to experimental TF-DNA binding data
- Role of Tinman depends on its DNA-binding mode (direct or indirect)
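The effect of indirect binding in a thermodynamic model can be seen in a toy partition function for one enhancer with a site for a partner protein P and a site for the TF T. The states and weights below are hypothetical and much simpler than the authors' model: `q_p` and `q_t` are statistical weights for direct binding, `q_ind` is the weight of T tethered to DNA-bound P, and `w` is a cooperativity factor when both bind directly.

```python
# Toy thermodynamic (statistical-mechanics) model; hypothetical states and
# weights for illustration only. Each state's probability is its
# statistical weight divided by the partition function Z.

def tf_presence_probability(q_p, q_t, q_ind, w=1.0):
    """Probability that T is present at the enhancer, directly or via P."""
    states = {
        "empty": 1.0,
        "P": q_p,                          # P bound directly, T absent
        "T_direct": q_t,                   # T bound to its own site
        "P_and_T_direct": q_p * q_t * w,   # both bound directly
        "T_indirect_via_P": q_p * q_ind,   # T tethered to DNA-bound P
    }
    Z = sum(states.values())
    t_present = (states["T_direct"] + states["P_and_T_direct"]
                 + states["T_indirect_via_P"])
    return t_present / Z
```

With all weights equal to 1, adding the indirect state raises T's presence from 0.5 to 0.6, and setting `q_t=0` (a mutated T motif) still leaves T present with probability 1/3 through P, the kind of localization that a direct-binding-only model cannot account for.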
9
Yi X, Zheng Z, Xu H, Zhou Y, Huang D, Wang J, Feng X, Zhao K, Fan X, Zhang S, Dong X, Wang Z, Shen Y, Cheng H, Shi L, Li MJ. Interrogating cell type-specific cooperation of transcriptional regulators in 3D chromatin. iScience 2021; 24:103468. [PMID: 34888502 PMCID: PMC8634045 DOI: 10.1016/j.isci.2021.103468] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/02/2021] [Revised: 09/23/2021] [Accepted: 11/12/2021] [Indexed: 12/14/2022]
Abstract
Context-specific activities of transcription regulators (TRs) in the nucleus precisely modulate spatiotemporal gene expression. Using the largest ChIP-seq data set and chromatin loops in the human K562 cell line, we initially interrogated TR cooperation in 3D chromatin via a graphical model and revealed many known and novel TRs regulating context-specific pathways. To explore TR cooperation across a broad range of tissue/cell types, we systematically combined large-scale open chromatin profiles, computational footprinting, and high-resolution chromatin interactions to investigate tissue/cell type-specific TR cooperation. We first delineated a landscape of TR cooperation across 40 human tissue/cell types. Network modularity analyses uncovered both the commonality and the specificity of TR cooperation in different conditions. We also demonstrated that TR cooperation information can better interpret the disease-causal variants identified by genome-wide association studies (GWAS) and recapitulate cell states during neural development. Our study characterizes shared and unique patterns of TR cooperation associated with the cell type specificity of gene regulation in 3D chromatin.
Highlights
- Computational inference of transcriptional regulator (TR) cooperation in 3D chromatin
- A landscape of 3D TR cooperation across 40 human tissue/cell types
- TR cooperation can better interpret the disease-causal variants identified by GWAS
- Cooperation of certain TRs shapes context-specific gene regulation in cell development
Affiliation(s)
- Xianfu Yi
- School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin 300070, China
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Zhanye Zheng
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Hang Xu
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Yao Zhou
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Dandan Huang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Jianhua Wang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xiangling Feng
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Ke Zhao
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xutong Fan
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Shijie Zhang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xiaobao Dong
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Zhao Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Yujun Shen
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Hui Cheng
- State Key Laboratory of Experimental Hematology, Chinese Academy of Medical Sciences, Tianjin 300070, China
- Lei Shi
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Mulin Jun Li
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
10
Bernardini A, Lorenzo M, Chaves-Sanjuan A, Swuec P, Pigni M, Saad D, Konarev PV, Graewert MA, Valentini E, Svergun DI, Nardini M, Mantovani R, Gnesutta N. The USR domain of USF1 mediates NF-Y interactions and cooperative DNA binding. Int J Biol Macromol 2021; 193:401-413. [PMID: 34673109 DOI: 10.1016/j.ijbiomac.2021.10.056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Revised: 10/07/2021] [Accepted: 10/08/2021] [Indexed: 10/20/2022]
Abstract
The trimeric CCAAT-binding NF-Y is a "pioneer" transcription factor (TF) known to cooperate with neighboring TFs to regulate gene expression. Genome-wide analyses detected a precise stereo-alignment (10/12 bp) of CCAAT with E-box elements and a corresponding colocalization of NF-Y with basic helix-loop-helix (bHLH) TFs. Here, we dissected NF-Y interactions with USF1 and MAX. USF1, but not MAX, cooperates with NF-Y in DNA binding. NF-Y and USF1 synergize to activate target promoters. Structural reconstruction of the complexes shows independent DNA binding of MAX, whereas USF1 makes extended contacts with NF-Y involving the USR, a USF-specific amino acid stretch required for trans-activation. The USR is an intrinsically disordered domain that adopts different conformations depending on the E-box-CCAAT distance. Deletion of the USR abolishes cooperative DNA binding with NF-Y. Our data indicate that the functionality of certain unstructured domains involves adapting to small variations in the stereo-alignment of multimeric TF sites.
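The CCAAT/E-box stereo-alignment can be searched for directly in sequence. The scan below is a hypothetical sketch, not the genome-wide analysis cited in the abstract: it assumes the canonical E-box CACGTG, exact CCAAT boxes on either strand (CCAAT or its reverse complement ATTGG), and spacing measured between motif start positions, which may differ from the convention used in the original analyses.

```python
import re

# Assumptions for illustration: canonical E-box CACGTG (palindromic, so one
# strand suffices), exact CCAAT boxes on either strand, and spacing measured
# between motif start positions.
CCAAT_FORMS = ("CCAAT", "ATTGG")  # CCAAT and its reverse complement
EBOX = "CACGTG"


def find_sites(seq, motif):
    """0-based start positions of exact (possibly overlapping) matches."""
    pattern = "(?=" + re.escape(motif) + ")"  # lookahead allows overlaps
    return [m.start() for m in re.finditer(pattern, seq)]


def ccaat_ebox_pairs(seq, spacings=(10, 12)):
    """(ccaat_start, ebox_start, spacing) for pairs at an allowed spacing."""
    seq = seq.upper()
    ccaat_sites = sorted(p for form in CCAAT_FORMS for p in find_sites(seq, form))
    ebox_sites = find_sites(seq, EBOX)
    return [(c, e, abs(e - c)) for c in ccaat_sites for e in ebox_sites
            if abs(e - c) in spacings]
```

For instance, `ccaat_ebox_pairs("CCAATGGGGGCACGTG")` reports the pair `(0, 10, 10)`, whereas inserting one extra G (spacing 11) yields no pair. Exact string matching keeps the example short; a position weight matrix scan would be needed to capture degenerate sites.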
Affiliation(s)
- Andrea Bernardini
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Mariangela Lorenzo
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Paolo Swuec
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Matteo Pigni
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Dana Saad
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Petr V Konarev
- A.V. Shubnikov Institute of Crystallography, Federal Scientific Research Centre "Crystallography and Photonics" of Russian Academy of Science, Moscow 119333, Russian Federation
- Erica Valentini
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Dmitri I Svergun
- European Molecular Biology Laboratory, Hamburg Unit, Hamburg 22607, Germany
- Marco Nardini
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Roberto Mantovani
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
- Nerina Gnesutta
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano 20133, Italy
11
Hasegawa Y, Struhl K. Different SP1 binding dynamics at individual genomic loci in human cells. Proc Natl Acad Sci U S A 2021; 118:e2113579118. [PMID: 34764224 PMCID: PMC8609546 DOI: 10.1073/pnas.2113579118] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 10/06/2021] [Indexed: 11/18/2022]
Abstract
Using a tamoxifen-inducible time-course ChIP-sequencing (ChIP-seq) approach, we show that the ubiquitous transcription factor SP1 has different binding dynamics at its target sites in the human genome. SP1 very rapidly reaches maximal binding levels at some sites, but binding kinetics at other sites are biphasic, with rapid half-maximal binding followed by a considerably slower increase to maximal binding. While ∼70% of SP1 binding sites are located in promoter regions, loci with slow SP1 binding kinetics are enriched in enhancer and Polycomb-repressed regions. Unexpectedly, SP1 sites with fast binding kinetics tend to have higher-quality SP1 sequence motifs and more copies of them. Different cobinding factors associate near SP1 binding sites depending on their binding kinetics and on their location at promoters or enhancers. For example, NFY and FOS are preferentially associated near promoter-bound SP1 sites with fast binding kinetics, whereas DNA motifs of ETS and homeodomain proteins are preferentially observed at sites with slow binding kinetics. At promoters but not enhancers, proteins involved in sumoylation and PML bodies associate more strongly with slow SP1 binding sites than with fast ones. The speed of SP1 binding is not associated with nucleosome occupancy, and it is not necessarily coupled to higher transcriptional activity. These results with SP1 contrast with those for human TBP, indicating that there is no common mechanism governing transcription factor binding kinetics. The biphasic kinetics at some SP1 target sites suggest the existence of distinct chromatin states at these loci in different cells within the overall population.
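Monophasic and biphasic binding can be contrasted with a simple two-exponential curve in which a share of binding occurs quickly and the rest slowly. This is an assumed model for illustration, not the fitting procedure used in the paper, and the function and parameter names are hypothetical. The ratio of the time to reach 90% of maximal binding to the time to reach 50% is a convenient summary: it equals ln 10 / ln 2 ≈ 3.3 for a single exponential and becomes much larger when a slow phase follows rapid half-maximal binding.

```python
import math

# Assumed two-exponential model (illustrative, not from the paper):
#   B(t)/Bmax = f * (1 - exp(-k_fast t)) + (1 - f) * (1 - exp(-k_slow t))


def bound_fraction(t, fast_share, k_fast, k_slow):
    """Fraction of maximal binding reached at time t."""
    return (fast_share * (1.0 - math.exp(-k_fast * t))
            + (1.0 - fast_share) * (1.0 - math.exp(-k_slow * t)))


def time_to_fraction(target, fast_share, k_fast, k_slow, t_max=1e6):
    """Time at which bound_fraction first reaches target (0 < target < 1)."""
    lo, hi = 0.0, t_max
    for _ in range(200):  # bisection; bound_fraction increases with t
        mid = 0.5 * (lo + hi)
        if bound_fraction(mid, fast_share, k_fast, k_slow) < target:
            lo = mid
        else:
            hi = mid
    return hi


def kinetic_ratio(fast_share, k_fast, k_slow):
    """t90 / t50: about 3.3 for a single exponential, larger if biphasic."""
    return (time_to_fraction(0.9, fast_share, k_fast, k_slow)
            / time_to_fraction(0.5, fast_share, k_fast, k_slow))
```

For example, `kinetic_ratio(1.0, 1.0, 0.1)` describes a monophasic site (only the fast phase), while `kinetic_ratio(0.5, 10.0, 0.1)`, with half of the binding 100 times slower, gives a ratio well above 10.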
Affiliation(s)
- Yuko Hasegawa
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115
- Kevin Struhl
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115
12
ChIP-GSM: Inferring active transcription factor modules to predict functional regulatory elements. PLoS Comput Biol 2021; 17:e1009203. [PMID: 34292930 PMCID: PMC8330942 DOI: 10.1371/journal.pcbi.1009203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2020] [Revised: 08/03/2021] [Accepted: 06/20/2021] [Indexed: 11/19/2022]
Abstract
Transcription factors (TFs) often function as a module, including both master factors and mediators, binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when TF modules are inferred from the co-localization of ChIP-seq peaks, many weak binding events are often missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we developed a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM iteratively samples the read counts of module TFs to estimate the binding potential of a module at each region and, across all regions, estimates module abundance. Using the inferred module-region probabilistic bindings as features, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM-predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules to predict regulatory activity. In a case study of K562 cells, we demonstrate that the modules inferred by ChIP-GSM form groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the accuracy of predicting regulatory region activity.
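The Gibbs-sampling idea of iteratively resampling region-level binding states from read counts can be shown on a much smaller problem. The sketch below is hypothetical and far simpler than ChIP-GSM, which models modules of multiple TFs and then applies logistic regression: it treats a single TF's per-region read counts as a two-state Poisson mixture (background vs bound) with conjugate Beta and Gamma priors, and returns each region's posterior probability of being bound. All names and prior values are illustrative assumptions.

```python
import math
import random

# Toy Gibbs sampler (hypothetical, much simpler than ChIP-GSM): one TF,
# two hidden states per region (0 = background, 1 = bound), and
#   count_i ~ Poisson(rate[z_i]),  z_i ~ Bernoulli(p_bound),
#   p_bound ~ Beta(1, 1),  rate[k] ~ Gamma(shape, rate=prior_rate).


def _logistic(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gibbs_bound_probability(counts, n_iter=3000, burn_in=1000, seed=0,
                            shape=1.0, prior_rate=1.0):
    """Posterior probability that each region is in the bound state."""
    rng = random.Random(seed)
    n = len(counts)
    rates = [0.5 + min(counts), 0.5 + max(counts)]  # [background, bound]
    p_bound = 0.5
    z = [0] * n
    tally = [0] * n
    for it in range(n_iter):
        # 1) z_i | rates, p_bound (the log(count_i!) term cancels).
        for i, c in enumerate(counts):
            log1 = math.log(p_bound) + c * math.log(rates[1]) - rates[1]
            log0 = math.log(1.0 - p_bound) + c * math.log(rates[0]) - rates[0]
            z[i] = 1 if rng.random() < _logistic(log1 - log0) else 0
        # 2) p_bound | z ~ Beta(1 + n_bound, 1 + n_background).
        n_bound = sum(z)
        p_bound = rng.betavariate(1 + n_bound, 1 + n - n_bound)
        p_bound = min(max(p_bound, 1e-12), 1.0 - 1e-12)
        # 3) rate_k | z, counts ~ Gamma(shape + sum, rate = prior_rate + n_k);
        #    random.gammavariate takes a scale, i.e. 1 / rate.
        for k in (0, 1):
            ck = [c for c, zi in zip(counts, z) if zi == k]
            draw = rng.gammavariate(shape + sum(ck), 1.0 / (prior_rate + len(ck)))
            rates[k] = max(draw, 1e-12)
        # Keep state 1 as the higher-rate state to avoid label switching.
        if rates[0] > rates[1]:
            rates.reverse()
            z = [1 - zi for zi in z]
            p_bound = 1.0 - p_bound
        if it >= burn_in:
            tally = [t + zi for t, zi in zip(tally, z)]
    kept = n_iter - burn_in
    return [t / kept for t in tally]
```

Run on counts such as `[0, 1, 2, 1, 0, 30, 28, 35, 1, 2]`, the high-count regions receive bound probabilities near 1 and the rest near 0. Because every draw is conditioned on the current values of the other variables, weakly supported regions are scored probabilistically rather than being dropped by a hard peak-calling threshold.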

13
Biswas A, Narlikar L. A universal framework for detecting cis-regulatory diversity in DNA regulatory regions. Genome Res 2021; 31:1646-1662. [PMID: 34285090 PMCID: PMC8415372 DOI: 10.1101/gr.274563.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/22/2020] [Accepted: 07/09/2021] [Indexed: 12/02/2022]
Abstract
High-throughput sequencing-based assays measure different biochemical activities pertaining to gene regulation, genome-wide. These activities include transcription factor (TF)–DNA binding, enhancer activity, open chromatin, and more. A major goal is to understand underlying sequence components, or motifs, that can explain the measured activity. It is usually not one motif but a combination of motifs bound by cooperatively acting proteins that confers activity to such regions. Furthermore, regions can be diverse, governed by different combinations of TFs/motifs. Current approaches do not take into account this issue of combinatorial diversity. We present a new statistical framework, cisDIVERSITY, which models regions as diverse modules characterized by combinations of motifs while simultaneously learning the motifs themselves. Because cisDIVERSITY does not rely on knowledge of motifs, modules, cell type, or organism, it is general enough to be applied to regions reported by most high-throughput assays. For example, in enhancer predictions resulting from different assays—GRO-cap, STARR-seq, and those measuring chromatin structure—cisDIVERSITY discovers distinct modules and combinations of TF binding sites, some specific to the assay. From protein–DNA binding data, cisDIVERSITY identifies potential cofactors of the profiled TF, whereas from ATAC-seq data, it identifies tissue-specific regulatory modules. Finally, analysis of single-cell ATAC-seq data suggests that regions open in one cell-state encode information about future states, with certain modules staying open and others closing down in the next time point.
Affiliation(s)
- Anushua Biswas
- CSIR-National Chemical Laboratory, Academy of Scientific and Innovative Research
- Leelavati Narlikar
- CSIR-National Chemical Laboratory, Academy of Scientific and Innovative Research

14
Jeong D, Lim S, Lee S, Oh M, Cho C, Seong H, Jung W, Kim S. Construction of Condition-Specific Gene Regulatory Network Using Kernel Canonical Correlation Analysis. Front Genet 2021; 12:652623. [PMID: 34093651 PMCID: PMC8172963 DOI: 10.3389/fgene.2021.652623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/12/2021] [Accepted: 03/26/2021] [Indexed: 01/01/2023]
Abstract
Gene expression profiles, or transcriptomes, can represent cellular states; thus, understanding gene regulation mechanisms can help explain how cells respond to external stress. Interaction between a transcription factor (TF) and its target gene (TG) is one of the representative regulatory mechanisms in cells. In this paper, we present a novel computational method to construct condition-specific transcriptional networks from transcriptome data. Regulatory interactions between TFs and TGs are very complex, often involving multiple-to-multiple relations. Experimental data from TF chromatin immunoprecipitation sequencing are useful but produce one-to-multiple relations between a TF and its TGs. On the other hand, co-expression networks of genes can be useful for constructing condition-specific transcriptional networks, but co-expression networks contain many false-positive relations. In this paper, we propose a novel method to construct a condition-specific and combinatorial transcriptional network, applying kernel canonical correlation analysis (kernel CCA) to identify multiple-to-multiple TF-TG relations in a given biological condition. Kernel CCA is a well-established statistical method for computing the correlation of one group of features with another group of features. We therefore employed kernel CCA to embed TFs and TGs into a new space in which the correlation between TFs and TGs is reflected. To demonstrate the usefulness of our network construction method, we used human blood transcriptome data to investigate the response to a high-fat diet, and an Arabidopsis data set to investigate the response to cold/heat stress. Our method detected not only important regulatory interactions reported in previous studies but also novel TF-TG relations in which a module of TFs regulates a module of TGs upon specific stress.
Affiliation(s)
- Dabin Jeong
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Sangsoo Lim
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Sangseon Lee
- BK21 FOUR Intelligence Computing, Seoul National University, Seoul, South Korea
- Minsik Oh
- Department of Computer Science and Engineering, Seoul National University, Seoul, South Korea
- Changyun Cho
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Hyeju Seong
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Woosuk Jung
- Department of Crop Science, Konkuk University, Seoul, South Korea
- Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Bioinformatics Institute, Seoul National University, Seoul, South Korea
- Department of Computer Science and Engineering, Institute of Engineering Research, Seoul National University, Seoul, South Korea

15
Lou S, Li T, Kong X, Zhang J, Liu J, Lee D, Gerstein M. TopicNet: a framework for measuring transcriptional regulatory network change. Bioinformatics 2020; 36:i474-i481. [PMID: 32657410 PMCID: PMC7355251 DOI: 10.1093/bioinformatics/btaa403] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 01/18/2023]
Abstract
Motivation: Recently, many chromatin immunoprecipitation sequencing experiments have been carried out for a diverse group of transcription factors (TFs) in many different types of human cells. These experiments manifest large-scale and dynamic changes in regulatory network connectivity (i.e. network ‘rewiring’), highlighting the different regulatory programs operating in disparate cellular states. However, due to the dense and noisy nature of current regulatory networks, directly comparing the gains and losses of targets of key TFs across cell states is often not informative. Thus, here, we seek an abstracted, low-dimensional representation to understand the main features of network change. Results: We propose a method called TopicNet that applies latent Dirichlet allocation to extract functional topics for a collection of genes regulated by a given TF. We then define a rewiring score to quantify regulatory-network changes in terms of the topic changes for this TF. Using this framework, we can pinpoint particular TFs that change greatly in network connectivity between different cellular states (such as those observed in oncogenesis). Also, incorporating gene expression data, we define a topic activity score that measures the degree to which a given topic is active in a particular cellular state. We also show how activity differences can indicate differential survival in various cancers. Availability and Implementation: The TopicNet framework and related analysis were implemented using R and all code is available at https://github.com/gersteinlab/topicnet. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shaoke Lou
- Department of Molecular Biophysics and Biochemistry
- Tianxiao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Jing Zhang
- Department of Molecular Biophysics and Biochemistry
- Jason Liu
- Department of Molecular Biophysics and Biochemistry
- Donghoon Lee
- Department of Molecular Biophysics and Biochemistry
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry
- To whom correspondence should be addressed. E-mail:

16
Yamada N, Rossi MJ, Farrell N, Pugh BF, Mahony S. Alignment and quantification of ChIP-exo crosslinking patterns reveal the spatial organization of protein-DNA complexes. Nucleic Acids Res 2020; 48:11215-11226. [PMID: 32747934 PMCID: PMC7672471 DOI: 10.1093/nar/gkaa618] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/09/2019] [Revised: 06/25/2020] [Accepted: 07/13/2020] [Indexed: 12/12/2022]
Abstract
The ChIP-exo assay precisely delineates protein-DNA crosslinking patterns by combining chromatin immunoprecipitation with 5' to 3' exonuclease digestion. Within a regulatory complex, the physical distance of a regulatory protein to DNA affects crosslinking efficiencies. Therefore, the spatial organization of a protein-DNA complex could potentially be inferred by analyzing how crosslinking signatures vary between its subunits. Here, we present a computational framework that aligns ChIP-exo crosslinking patterns from multiple proteins across a set of coordinately bound regulatory regions, and which detects and quantifies protein-DNA crosslinking events within the aligned profiles. By producing consistent measurements of protein-DNA crosslinking strengths across multiple proteins, our approach enables characterization of relative spatial organization within a regulatory complex. Applying our approach to collections of ChIP-exo data, we demonstrate that it can recover aspects of regulatory complex spatial organization at yeast ribosomal protein genes and yeast tRNA genes. We also demonstrate the ability to quantify changes in protein-DNA complex organization across conditions by applying our approach to analyze Drosophila Pol II transcriptional components. Our results suggest that principled analyses of ChIP-exo crosslinking patterns enable inference of spatial organization within protein-DNA complexes.
Affiliation(s)
- Naomi Yamada
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Matthew J Rossi
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Nina Farrell
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- B Franklin Pugh
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA

17
Martínez-Zamudio RI, Roux PF, de Freitas JANLF, Robinson L, Doré G, Sun B, Belenki D, Milanovic M, Herbig U, Schmitt CA, Gil J, Bischof O. AP-1 imprints a reversible transcriptional programme of senescent cells. Nat Cell Biol 2020; 22:842-855. [PMID: 32514071 PMCID: PMC7899185 DOI: 10.1038/s41556-020-0529-5] [Citation(s) in RCA: 110] [Impact Index Per Article: 27.5] [Received: 06/07/2019] [Accepted: 04/27/2020] [Indexed: 12/17/2022]
Abstract
Senescent cells affect many physiological and pathophysiological processes. While select genetic and epigenetic elements for senescence induction have been identified, the dynamics, epigenetic mechanisms and regulatory networks defining senescence competence, induction and maintenance remain poorly understood, precluding the deliberate therapeutic targeting of senescence for health benefits. Here, we examined the possibility that the epigenetic state of enhancers determines senescent cell fate. We explored this by generating time-resolved transcriptomes and epigenome profiles during oncogenic RAS-induced senescence and validating central findings in different cell biology and disease models of senescence. Through integrative analysis and functional validation, we reveal links between enhancer chromatin, transcription factor recruitment and senescence competence. We demonstrate that activator protein 1 (AP-1) 'pioneers' the senescence enhancer landscape and defines the organizational principles of the transcription factor network that drives the transcriptional programme of senescent cells. Together, our findings enabled us to manipulate the senescence phenotype with potential therapeutic implications.
Affiliation(s)
- Ricardo Iván Martínez-Zamudio
- Institut Pasteur, Paris, France
- INSERM U993, Paris, France
- Center for Cell Signaling, Department of Microbiology, Biochemistry and Molecular Genetics, New Jersey Medical School of Rutgers Biomedical and Health Sciences, Rutgers University, Newark, NJ, USA
- Pierre-François Roux
- Institut Pasteur, Paris, France
- INSERM U993, Paris, France
- Johnson & Johnson, Upstream Skin Research, Issy-les-Moulineaux, France
- Lucas Robinson
- Institut Pasteur, Paris, France
- INSERM U993, Paris, France
- Université de Paris, Sorbonne Paris Cité, Paris, France
- Gregory Doré
- Institut Pasteur, Paris, France
- INSERM U993, Paris, France
- Bin Sun
- MRC London Institute of Medical Sciences (LMS), London, UK
- Institute of Clinical Sciences (ICS), Faculty of Medicine, Imperial College London, London, UK
- Dimitri Belenki
- Department of Hematology, Oncology and Tumor Immunology, Virchow Campus, and Molekulares Krebsforschungszentrum, Charité-University Medical Center, Berlin, Germany
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Maja Milanovic
- Department of Hematology, Oncology and Tumor Immunology, Virchow Campus, and Molekulares Krebsforschungszentrum, Charité-University Medical Center, Berlin, Germany
- Deutsches Konsortium für Translationale Krebsforschung (German Cancer Consortium), Berlin, Germany
- Utz Herbig
- Center for Cell Signaling, Department of Microbiology, Biochemistry and Molecular Genetics, New Jersey Medical School of Rutgers Biomedical and Health Sciences, Rutgers University, Newark, NJ, USA
- Clemens A Schmitt
- Department of Hematology, Oncology and Tumor Immunology, Virchow Campus, and Molekulares Krebsforschungszentrum, Charité-University Medical Center, Berlin, Germany
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Deutsches Konsortium für Translationale Krebsforschung (German Cancer Consortium), Berlin, Germany
- Department of Hematology and Oncology, Kepler University Hospital, Johannes Kepler University, Linz, Austria
- Jesús Gil
- MRC London Institute of Medical Sciences (LMS), London, UK
- Institute of Clinical Sciences (ICS), Faculty of Medicine, Imperial College London, London, UK
- Oliver Bischof
- Institut Pasteur, Paris, France
- INSERM U993, Paris, France

18
Spakowicz D, Lou S, Barron B, Gomez JL, Li T, Liu Q, Grant N, Yan X, Hoyd R, Weinstock G, Chupp GL, Gerstein M. Approaches for integrating heterogeneous RNA-seq data reveal cross-talk between microbes and genes in asthmatic patients. Genome Biol 2020; 21:150. [PMID: 32571363 PMCID: PMC7310008 DOI: 10.1186/s13059-020-02033-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2019] [Accepted: 04/30/2020] [Indexed: 11/16/2022]
Abstract
Sputum induction is a non-invasive method to evaluate the airway environment, particularly for asthma. RNA sequencing (RNA-seq) of sputum samples can be challenging to interpret due to the complex and heterogeneous mixtures of human cells and exogenous (microbial) material. In this study, we develop a pipeline that integrates dimensionality reduction and statistical modeling to grapple with the heterogeneity. LDA-link, based on latent Dirichlet allocation (LDA), connects microbes to genes using reduced-dimensionality LDA topics. We validate our method with single-cell RNA-seq and microscopy and then apply it to the sputum of asthmatic patients to find known and novel relationships between microbes and genes.
Affiliation(s)
- Daniel Spakowicz
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Division of Medical Oncology, Ohio State University College of Medicine, Columbus, OH, USA
- Department of Biomedical Informatics, Ohio State University College of Medicine, Columbus, OH, USA
- Shaoke Lou
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Brian Barron
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Jose L Gomez
- Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Tianxiao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Qing Liu
- Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Nicole Grant
- Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Xiting Yan
- Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Rebecca Hoyd
- Division of Medical Oncology, Ohio State University College of Medicine, Columbus, OH, USA
- George Weinstock
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Geoffrey L Chupp
- Section of Pulmonary, Critical Care, and Sleep Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA
- Department of Computer Science, Yale University, New Haven, CT, USA
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA

19
Caspar SM, Dubacher N, Kopps AM, Meienberg J, Henggeler C, Matyas G. Clinical sequencing: From raw data to diagnosis with lifetime value. Clin Genet 2018; 93:508-519. [PMID: 29206278 DOI: 10.1111/cge.13190] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 10/06/2017] [Revised: 11/28/2017] [Accepted: 11/30/2017] [Indexed: 12/22/2022]
Abstract
High-throughput sequencing (HTS) has revolutionized genetics by enabling the detection of sequence variants at an unprecedented scale. Despite these advances, however, challenges remain in the complete coverage of targeted regions (genes, exome or genome) as well as in HTS data analysis and interpretation. Moreover, it is easy to get overwhelmed by the plethora of available methods and tools for HTS. Here, we review the step-by-step process from the generation of sequence data to molecular diagnosis of Mendelian diseases. Highlighting advantages and limitations, this review addresses the current state of (1) HTS technologies, considering targeted, whole-exome, and whole-genome sequencing on short- and long-read platforms; (2) read alignment, variant calling and interpretation; as well as (3) regulatory issues related to genetic counseling, reimbursement, and data storage.
Affiliation(s)
- S M Caspar
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, Schlieren-Zurich, Switzerland
- N Dubacher
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, Schlieren-Zurich, Switzerland
- A M Kopps
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, Schlieren-Zurich, Switzerland
- J Meienberg
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, Schlieren-Zurich, Switzerland
- C Henggeler
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, Schlieren-Zurich, Switzerland
- G Matyas
- Center for Cardiovascular Genetics and Gene Diagnostics, Foundation for People with Rare Diseases, Schlieren-Zurich, Switzerland
- Zurich Center for Integrative Human Physiology, University of Zurich, Zurich, Switzerland

20
Guo Y, Tian K, Zeng H, Guo X, Gifford DK. A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction. Genome Res 2018; 28:891-900. [PMID: 29654070 PMCID: PMC5991515 DOI: 10.1101/gr.226852.117] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 06/29/2017] [Accepted: 04/04/2018] [Indexed: 12/15/2022]
Abstract
The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations.
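The KSM representation described above rests on finding k-mers that are overrepresented at TF binding sites. As a rough illustration of only that enrichment step, the Python sketch below ranks k-mers by how much more often they occur in bound sequences than in background sequences. It is a hypothetical example, not KMAC: the function names, the pseudocount smoothing, and the `min_ratio` cutoff are assumptions, and the sketch neither handles reverse complements nor aligns the enriched k-mers into a set, as a KSM requires.

```python
from collections import Counter


def kmer_presence_counts(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once.

    Only the given strand is scanned; reverse complements are ignored
    (a simplifying assumption of this sketch).
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def overrepresented_kmers(foreground, background, k, min_ratio=2.0, pseudo=1.0):
    """Rank k-mers by enrichment in foreground (bound) vs background sequences.

    Enrichment is the ratio of pseudocount-smoothed per-sequence hit fractions;
    k-mers below the hypothetical `min_ratio` cutoff are dropped.
    Returns a dict ordered by descending enrichment ratio.
    """
    fg_counts = kmer_presence_counts(foreground, k)
    bg_counts = kmer_presence_counts(background, k)
    n_fg, n_bg = len(foreground), len(background)
    enriched = {}
    for kmer, hits in fg_counts.items():
        fg_frac = (hits + pseudo) / (n_fg + 2 * pseudo)
        bg_frac = (bg_counts.get(kmer, 0) + pseudo) / (n_bg + 2 * pseudo)
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            enriched[kmer] = ratio
    return dict(sorted(enriched.items(), key=lambda item: -item[1]))


if __name__ == "__main__":
    # Toy data: every "bound" sequence carries CACGTG (an E-box); background does not.
    bound = ["AACACGTGTT", "GGCACGTGCC", "TTCACGTGAA"]
    unbound = ["AAAAAAAAAA", "GGGGGGGGGG", "TTTTTTTTTT", "ACGTACGTAC"]
    ranked = overrepresented_kmers(bound, unbound, k=6)
    # CACGTG ranks first because it is the only 6-mer shared by all bound sequences.
    for kmer, ratio in list(ranked.items())[:3]:
        print(kmer, round(ratio, 2))
```

Counting each k-mer at most once per sequence (presence rather than raw frequency) keeps one repetitive sequence from dominating the ranking; a practical motif finder would additionally need a significance test and a way to merge overlapping k-mers.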
Affiliation(s)
- Yuchun Guo
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Kevin Tian
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Haoyang Zeng
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Xiaoyun Guo
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- David Kenneth Gifford
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

21
Wang X, Lin P, Ho JWK. Discovery of cell-type specific DNA motif grammar in cis-regulatory elements using random Forest. BMC Genomics 2018; 19:929. [PMID: 29363433 PMCID: PMC5780765 DOI: 10.1186/s12864-017-4340-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 02/05/2023]
Abstract
Background: It has been observed that many transcription factors (TFs) can bind to different genomic loci depending on the cell type in which the TF is expressed, even though the individual TF usually binds to the same core motif in different cell types. How a TF can bind to the genome in such a highly cell-type specific manner is a critical research question. One hypothesis is that a TF requires co-binding of different TFs in different cell types. If this is the case, it may be possible to observe different combinations of TF motifs (a motif grammar) located at the TF binding sites in different cell types. In this study, we develop a bioinformatics method to systematically identify DNA motifs in TF binding sites across multiple cell types based on published ChIP-seq data, and address two questions: (1) can we build a machine learning classifier to predict cell-type specificity based on motif combinations alone, and (2) can we extract meaningful cell-type specific motif grammars from this classifier model. Results: We present a Random Forest (RF) based approach to build a multi-class classifier to predict the cell-type specificity of a TF binding site given its motif content. We applied this RF classifier to published ChIP-seq datasets of two TFs (TCF7L2 and MAX) across multiple cell types. Using cross-validation, we show that motif combinations alone are indeed predictive of cell types. Furthermore, we present a rule mining approach to extract the most discriminatory rules in the RF classifier, thus allowing us to discover the underlying cell-type specific motif grammar. Conclusions: Our bioinformatics analysis supports the hypothesis that combinatorial TF motif patterns are cell-type specific. Electronic supplementary material: The online version of this article (10.1186/s12864-017-4340-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xin Wang
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW, 2010, Australia
- St. Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW, 2010, Australia
- Peijie Lin
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW, 2010, Australia
- St. Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW, 2010, Australia
- Joshua W K Ho
- Victor Chang Cardiac Research Institute, Darlinghurst, NSW, 2010, Australia
- St. Vincent's Clinical School, University of New South Wales, Darlinghurst, NSW, 2010, Australia

22
Dubois-Chevalier J, Mazrooei P, Lupien M, Staels B, Lefebvre P, Eeckhoute J. Organizing combinatorial transcription factor recruitment at cis-regulatory modules. Transcription 2017; 9:233-239. [PMID: 29105538 DOI: 10.1080/21541264.2017.1394424] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/14/2023]
Abstract
Gene transcriptional regulation relies on cis-regulatory DNA modules (CRMs), which serve as nexus sites for integration of multiple transcription factor (TF) activities. Here, we provide evidence and discuss recent literature indicating that TF recruitment to CRMs is organized into combinations of trans-regulatory protein modules (TRMs). We propose that TRMs are functional entities composed of TFs displaying the most highly interdependent chromatin binding which are, in addition, able to modulate their recruitment to CRMs through inter-TRM effects. These findings shed light on the architectural organization of TF recruitment encoded by their recognition motifs within CRMs.
Affiliation(s)
- Julie Dubois-Chevalier
- Université de Lille, Inserm, CHRU de Lille, Institut Pasteur de Lille, U1011-EGID, F-59000 Lille, France
- Parisa Mazrooei
- The Princess Margaret Cancer Centre, University Health Network, Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada
- Mathieu Lupien
- The Princess Margaret Cancer Centre, University Health Network, Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada
- Bart Staels
- Université de Lille, Inserm, CHRU de Lille, Institut Pasteur de Lille, U1011-EGID, F-59000 Lille, France
- Philippe Lefebvre
- Université de Lille, Inserm, CHRU de Lille, Institut Pasteur de Lille, U1011-EGID, F-59000 Lille, France
- Jérôme Eeckhoute
- Université de Lille, Inserm, CHRU de Lille, Institut Pasteur de Lille, U1011-EGID, F-59000 Lille, France

23
Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, Marjanovic ND, Dionne D, Burks T, Raychowdhury R, Adamson B, Norman TM, Lander ES, Weissman JS, Friedman N, Regev A. Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens. Cell 2016; 167:1853-1866.e17. [PMID: 27984732 PMCID: PMC5181115 DOI: 10.1016/j.cell.2016.11.038] [Citation(s) in RCA: 933] [Impact Index Per Article: 116.6] [Received: 10/27/2016] [Revised: 11/14/2016] [Accepted: 11/19/2016] [Indexed: 01/12/2023]
Abstract
Genetic screens help infer gene function in mammalian cells, but it has remained difficult to assay complex phenotypes, such as transcriptional profiles, at scale. Here, we develop Perturb-seq, combining single-cell RNA sequencing (RNA-seq) and clustered regularly interspaced short palindromic repeats (CRISPR)-based perturbations to perform many such assays in a pool. We demonstrate Perturb-seq by analyzing 200,000 cells in immune cells and cell lines, focusing on transcription factors regulating the response of dendritic cells to lipopolysaccharide (LPS). Perturb-seq accurately identifies individual gene targets, gene signatures, and cell states affected by individual perturbations and their genetic interactions. We posit new functions for regulators of differentiation, the anti-viral response, and mitochondrial function during immune activation. By decomposing many high content measurements into the effects of perturbations, their interactions, and diverse cell metadata, Perturb-seq dramatically increases the scope of pooled genomic assays.
Affiliation(s)
- Atray Dixit
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, USA
- Oren Parnas
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Biyu Li
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Jenny Chen
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, USA
- Charles P Fulco
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Nemanja D Marjanovic
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, MA 02140, USA
- Danielle Dionne
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Tyler Burks
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Britt Adamson
- Department of Cellular and Molecular Pharmacology, California Institute of Quantitative Biosciences, Center for RNA Systems Biology, University of California, San Francisco, San Francisco, CA 94158, USA
- Thomas M Norman
- Department of Cellular and Molecular Pharmacology, California Institute of Quantitative Biosciences, Center for RNA Systems Biology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02140, USA
- Jonathan S Weissman
- Department of Cellular and Molecular Pharmacology, California Institute of Quantitative Biosciences, Center for RNA Systems Biology, University of California, San Francisco, San Francisco, CA 94158, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Nir Friedman
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- School of Engineering and Computer Science and Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Aviv Regev
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02140, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA