1
|
Tahara S, Tsuchiya T, Matsumoto H, Ozaki H. Transcription factor-binding k-mer analysis clarifies the cell type dependency of binding specificities and cis-regulatory SNPs in humans. BMC Genomics 2023; 24:597. [PMID: 37805453 PMCID: PMC10560430 DOI: 10.1186/s12864-023-09692-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2023] [Accepted: 09/21/2023] [Indexed: 10/09/2023] Open
Abstract
BACKGROUND Transcription factors (TFs) exhibit heterogeneous DNA-binding specificities in individual cells and whole organisms under natural conditions, and de novo motif discovery usually provides multiple motifs, even from a single chromatin immunoprecipitation-sequencing (ChIP-seq) sample. Despite the accumulation of ChIP-seq data and ChIP-seq-derived motifs, the diversity of DNA-binding specificities across different TFs and cell types remains largely unexplored. RESULTS Here, we applied MOCCS2, our k-mer-based motif discovery method, to a collection of human TF ChIP-seq samples across diverse TFs and cell types, and systematically computed profiles of TF-binding specificity scores for all k-mers. After quality control, we compiled a set of TF-binding specificity score profiles for 2,976 high-quality ChIP-seq samples, comprising 473 TFs and 398 cell types. Using these high-quality samples, we confirmed that the k-mer-based TF-binding specificity profiles reflected TF- or TF-family dependent DNA-binding specificities. We then compared the binding specificity scores of ChIP-seq samples with the same TFs but with different cell type classes and found that half of the analyzed TFs exhibited differences in DNA-binding specificities across cell type classes. Additionally, we devised a method to detect differentially bound k-mers between two ChIP-seq samples and detected k-mers exhibiting statistically significant differences in binding specificity scores. Moreover, we demonstrated that differences in the binding specificity scores between k-mers on the reference and alternative alleles could be used to predict the effect of variants on TF binding, as validated by in vitro and in vivo assay datasets. Finally, we demonstrated that binding specificity score differences can be used to interpret disease-associated non-coding single-nucleotide polymorphisms (SNPs) as TF-affecting SNPs and provide candidates responsible for TFs and cell types. CONCLUSIONS Our study provides a basis for investigating the regulation of gene expression in a TF-, TF family-, or cell-type-dependent manner. Furthermore, our differential analysis of binding-specificity scores highlights noncoding disease-associated variants in humans.
Collapse
Affiliation(s)
- Saeko Tahara
- Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
- School of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
| | - Takaho Tsuchiya
- Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
- Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan
| | - Hirotaka Matsumoto
- School of Information and Data Sciences, Nagasaki University, 1-14, Bunkyo-Machi, Nagasaki City, Nagasaki, 852-8521, Japan
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics, Wako, Saitama, 351-0198, Japan
| | - Haruka Ozaki
- Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan.
- Center for Artificial Intelligence Research, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki, 305-8577, Japan.
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics, Wako, Saitama, 351-0198, Japan.
| |
Collapse
|
2
|
Rube HT, Rastogi C, Feng S, Kribelbauer JF, Li A, Becerra B, Melo LAN, Do BV, Li X, Adam HH, Shah NH, Mann RS, Bussemaker HJ. Prediction of protein-ligand binding affinity from sequencing data with interpretable machine learning. Nat Biotechnol 2022; 40:1520-1527. [PMID: 35606422 PMCID: PMC9546773 DOI: 10.1038/s41587-022-01307-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2021] [Accepted: 04/04/2022] [Indexed: 01/02/2023]
Abstract
Protein-ligand interactions are increasingly profiled at high throughput using affinity selection and massively parallel sequencing. However, these assays do not provide the biophysical parameters that most rigorously quantify molecular interactions. Here we describe a flexible machine learning method, called ProBound, that accurately defines sequence recognition in terms of equilibrium binding constants or kinetic rates. This is achieved using a multi-layered maximum-likelihood framework that models both the molecular interactions and the data generation process. We show that ProBound quantifies transcription factor (TF) behavior with models that predict binding affinity over a range exceeding that of previous resources; captures the impact of DNA modifications and conformational flexibility of multi-TF complexes; and infers specificity directly from in vivo data such as ChIP-seq without peak calling. When coupled with an assay called KD-seq, it determines the absolute affinity of protein-ligand interactions. We also apply ProBound to profile the kinetics of kinase-substrate interactions. ProBound opens new avenues for decoding biological networks and rationally engineering protein-ligand interactions.
Collapse
Affiliation(s)
- H Tomas Rube
- Department of Bioengineering, University of California, Merced, Merced, CA, USA
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Chaitanya Rastogi
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Siqian Feng
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
| | | | - Allyson Li
- Department of Chemistry, Columbia University, New York, NY, USA
| | - Basheer Becerra
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Lucas A N Melo
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Bach Viet Do
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Xiaoting Li
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Hammaad H Adam
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Neel H Shah
- Department of Chemistry, Columbia University, New York, NY, USA
| | - Richard S Mann
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Harmen J Bussemaker
- Department of Biological Sciences, Columbia University, New York, NY, USA.
- Department of Systems Biology, Columbia University, New York, NY, USA.
| |
Collapse
|