1
Fu X, Mo S, Buendia A, Laurent A, Shao A, del Mar Alvarez-Torres M, Yu T, Tan J, Su J, Sagatelian R, Ferrando AA, Ciccia A, Lan Y, Owens DM, Palomero T, Xing EP, Rabadan R. GET: a foundation model of transcription across human cell types. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.24.559168. [PMID: 39005360 PMCID: PMC11244937 DOI: 10.1101/2023.09.24.559168] [Indexed: 07/16/2024]
Abstract
Transcriptional regulation, involving the complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate in unseen cell types and conditions. Here, we introduce GET, an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types. GET showcases remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovering universal and cell type specific transcription factor interaction networks. We evaluated its performance on prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors. Specifically, we show GET outperforms current models in predicting lentivirus-based massive parallel reporter assay readout with reduced input data. In fetal erythroblasts, we identify distal (>1Mbp) regulatory regions that were missed by previous models. In B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukemia-risk predisposing germline mutation. In sum, we provide a generalizable and accurate model for transcription together with catalogs of gene regulation and transcription factor interactions, all with cell type specificity.
Affiliation(s)
- Xi Fu
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Shentong Mo
  - Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
- Alejandro Buendia
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Anouchka Laurent
  - Institute for Cancer Genetics, Columbia University, New York, NY, USA
- Anqi Shao
  - Department of Dermatology, Columbia University, New York, NY, USA
- Tianji Yu
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Jimin Tan
  - Regeneron Genetics Center, Regeneron, Tarrytown, NY, USA
- Jiayu Su
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Adolfo A. Ferrando
  - Department of Dermatology, Columbia University, New York, NY, USA
  - Regeneron Genetics Center, Regeneron, Tarrytown, NY, USA
- Alberto Ciccia
  - Department of Genetics and Development, Columbia University, New York, NY, USA
- Yanyan Lan
  - Institute for AI Industry Research, Tsinghua University, Beijing, China
- David M. Owens
  - Institute for Cancer Genetics, Columbia University, New York, NY, USA
  - Department of Pathology & Cell Biology, Columbia University, New York, NY, USA
- Teresa Palomero
  - Institute for Cancer Genetics, Columbia University, New York, NY, USA
  - Department of Pathology & Cell Biology, Columbia University, New York, NY, USA
- Eric P. Xing
  - Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, USA
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
- Raul Rabadan
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
2
Inge MM, Miller R, Hook H, Bray D, Keenan JL, Zhao R, Gilmore TD, Siggers T. Rapid profiling of transcription factor-cofactor interaction networks reveals principles of epigenetic regulation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.05.588333. [PMID: 38617258 PMCID: PMC11014505 DOI: 10.1101/2024.04.05.588333] [Indexed: 04/16/2024]
Abstract
Transcription factor (TF)-cofactor (COF) interactions define dynamic, cell-specific networks that govern gene expression; however, these networks are understudied due to a lack of methods for high-throughput profiling of DNA-bound TF-COF complexes. Here we describe the Cofactor Recruitment (CoRec) method for rapid profiling of cell-specific TF-COF complexes. We define a lysine acetyltransferase (KAT)-TF network in resting and stimulated T cells. We find promiscuous recruitment of KATs for many TFs and that 35% of KAT-TF interactions are condition specific. KAT-TF interactions identify NF-κB as a primary regulator of acutely induced H3K27ac. Finally, we find that heterotypic clustering of CBP/P300-recruiting TFs is a strong predictor of total promoter H3K27ac. Our data supports clustering of TF sites that broadly recruit KATs as a mechanism for widespread co-occurring histone acetylation marks. CoRec can be readily applied to different cell systems and provides a powerful approach to define TF-COF networks impacting chromatin state and gene regulation.
Affiliation(s)
- MM Inge
  - Department of Biology, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
  - These authors contributed equally
- R Miller
  - Department of Biology, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
  - These authors contributed equally
- H Hook
  - Department of Biology, Boston University, Boston, MA, USA
- D Bray
  - Department of Biology, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
- JL Keenan
  - Department of Biology, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
- R Zhao
  - Department of Biology, Boston University, Boston, MA, USA
- TD Gilmore
  - Department of Biology, Boston University, Boston, MA, USA
- T Siggers
  - Department of Biology, Boston University, Boston, MA, USA
  - Bioinformatics Program, Boston University, Boston, MA, USA
  - Biological Design Center, Boston University, Boston, MA, USA
3
Doughty BR, Hinks MM, Schaepe JM, Marinov GK, Thurm AR, Rios-Martinez C, Parks BE, Tan Y, Marklund E, Dubocanin D, Bintu L, Greenleaf WJ. Single-molecule chromatin configurations link transcription factor binding to expression in human cells. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.02.578660. [PMID: 38352517 PMCID: PMC10862896 DOI: 10.1101/2024.02.02.578660] [Indexed: 02/23/2024]
Abstract
The binding of multiple transcription factors (TFs) to genomic enhancers activates gene expression in mammalian cells. However, the molecular details that link enhancer sequence to TF binding, promoter state, and gene expression levels remain opaque. We applied single-molecule footprinting (SMF) to measure the simultaneous occupancy of TFs, nucleosomes, and components of the transcription machinery on engineered enhancer/promoter constructs with variable numbers of TF binding sites for both a synthetic and an endogenous TF. We find that activation domains enhance a TF's capacity to compete with nucleosomes for binding to DNA in a BAF-dependent manner, TF binding on nucleosome-free DNA is consistent with independent binding between TFs, and average TF occupancy linearly contributes to promoter activation rates. We also decompose TF strength into separable binding and activation terms, which can be tuned and perturbed independently. Finally, we develop thermodynamic and kinetic models that quantitatively predict both the binding microstates observed at the enhancer and subsequent time-dependent gene expression. This work provides a template for quantitative dissection of distinct contributors to gene activation, including the activity of chromatin remodelers, TF activation domains, chromatin acetylation, TF concentration, TF binding affinity, and TF binding site configuration.
Affiliation(s)
- Michaela M Hinks
  - Bioengineering Department, Stanford University, Stanford, CA 94305, USA
- Julia M Schaepe
  - Bioengineering Department, Stanford University, Stanford, CA 94305, USA
- Georgi K Marinov
  - Genetics Department, Stanford University, Stanford, CA 94305, USA
- Abby R Thurm
  - Biophysics Graduate Program, Stanford University, Stanford, CA 94305, USA
- Benjamin E Parks
  - Computer Science Department, Stanford University, Stanford, CA 94305, USA
- Yingxuan Tan
  - Computer Science Department, Stanford University, Stanford, CA 94305, USA
- Emil Marklund
  - Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Danilo Dubocanin
  - Genetics Department, Stanford University, Stanford, CA 94305, USA
- Lacramioara Bintu
  - Bioengineering Department, Stanford University, Stanford, CA 94305, USA
- William J Greenleaf
  - Genetics Department, Stanford University, Stanford, CA 94305, USA
  - Department of Applied Physics, Stanford University, Stanford, CA 94305, USA