1
Gamache J, Gingerich D, Shwab EK, Barrera J, Garrett ME, Hume C, Crawford GE, Ashley-Koch AE, Chiba-Falek O. Integrative single-nucleus multi-omics analysis prioritizes candidate cis and trans regulatory networks and their target genes in Alzheimer's disease brains. Cell Biosci 2023; 13:185. [PMID: 37789374 PMCID: PMC10546724 DOI: 10.1186/s13578-023-01120-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2023] [Accepted: 08/30/2023] [Indexed: 10/05/2023]
Abstract
BACKGROUND The genetic underpinnings of late-onset Alzheimer's disease (LOAD) are yet to be fully elucidated. Although numerous LOAD-associated loci have been discovered, the causal variants and their target genes remain largely unknown. Since the brain is composed of heterogenous cell subtypes, it is imperative to study the brain on a cell subtype specific level to explore the biological processes underlying LOAD. METHODS Here, we present the largest parallel single-nucleus (sn) multi-omics study to simultaneously profile gene expression (snRNA-seq) and chromatin accessibility (snATAC-seq) to date, using nuclei from 12 normal and 12 LOAD brains. We identified cell subtype clusters based on gene expression and chromatin accessibility profiles and characterized cell subtype-specific LOAD-associated differentially expressed genes (DEGs), differentially accessible peaks (DAPs) and cis co-accessibility networks (CCANs). RESULTS Integrative analysis defined disease-relevant CCANs in multiple cell subtypes and discovered LOAD-associated cell subtype-specific candidate cis regulatory elements (cCREs), their candidate target genes, and trans-interacting transcription factors (TFs), some of which, including ELK1, JUN, and SMAD4 in excitatory neurons, were also LOAD-DEGs. Finally, we focused on a subset of cell subtype-specific CCANs that overlap known LOAD-GWAS regions and catalogued putative functional SNPs changing the affinities of TF motifs within LOAD-cCREs linked to LOAD-DEGs, including APOE and MYO1E in a specific subtype of microglia and BIN1 in a subpopulation of oligodendrocytes. CONCLUSIONS To our knowledge, this study represents the most comprehensive systematic interrogation to date of regulatory networks and the impact of genetic variants on gene dysregulation in LOAD at a cell subtype resolution. 
Our findings reveal crosstalk between epigenetic, genomic, and transcriptomic determinants of LOAD pathogenesis and define catalogues of candidate genes, cCREs, and variants involved in LOAD genetic etiology and the cell subtypes in which they act to exert their pathogenic effects. Overall, these results suggest that cell subtype-specific cis-trans interactions between regulatory elements and TFs, and the genes dysregulated by these networks contribute to the development of LOAD.
Affiliation(s)
- Julia Gamache
  - Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
  - Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
- Daniel Gingerich
  - Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
  - Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
- E Keats Shwab
  - Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
  - Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
- Julio Barrera
  - Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
  - Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
- Melanie E Garrett
  - Duke Molecular Physiology Institute, Duke University Medical Center, DUMC Box 104775, Durham, NC, 27701, USA
- Cordelia Hume
  - Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
  - Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
- Gregory E Crawford
  - Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
  - Department of Pediatrics, Division of Medical Genetics, Duke University Medical Center, DUMC Box 3382, Durham, NC, 27708, USA
  - Center for Advanced Genomic Technologies, Duke University Medical Center, Durham, NC, 27708, USA
- Allison E Ashley-Koch
  - Duke Molecular Physiology Institute, Duke University Medical Center, DUMC Box 104775, Durham, NC, 27701, USA
  - Department of Medicine, Duke University Medical Center, Durham, NC, 27708, USA
- Ornit Chiba-Falek
  - Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
  - Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
2
Ali S, Bello B, Chourasia P, Punathil RT, Zhou Y, Patterson M. PWM2Vec: An Efficient Embedding Approach for Viral Host Specification from Coronavirus Spike Sequences. BIOLOGY 2022; 11:418. [PMID: 35336792 PMCID: PMC8945605 DOI: 10.3390/biology11030418] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/04/2022] [Revised: 02/24/2022] [Accepted: 03/07/2022] [Indexed: 01/14/2023]
Abstract
The study of host specificity has important connections to the question about the origin of SARS-CoV-2 in humans which led to the COVID-19 pandemic, an important open question. There are speculations that bats are a possible origin. Likewise, there are many closely related (corona)viruses, such as SARS, which was found to be transmitted through civets. The study of the different hosts which can be potential carriers and transmitters of deadly viruses to humans is crucial to understanding, mitigating, and preventing current and future pandemics. In coronaviruses, the surface (S) protein, or spike protein, is important in determining host specificity, since it is the point of contact between the virus and the host cell membrane. In this paper, we classify the hosts of over five thousand coronaviruses from their spike protein sequences, segregating them into clusters of distinct hosts among birds, bats, camels, swine, humans, and weasels, to name a few. We propose a feature embedding based on the well-known position weight matrix (PWM), which we call PWM2Vec, and we use it to generate feature vectors from the spike protein sequences of these coronaviruses. While our embedding is inspired by the success of PWMs in biological applications, such as determining protein function and identifying transcription factor binding sites, we are the first (to the best of our knowledge) to use PWMs from viral sequences to generate fixed-length feature vector representations, and use them in the context of host classification. The results on real-world data show that when using PWM2Vec, machine learning classifiers are able to perform comparably to the baseline models in terms of predictive performance and runtime; in some cases, the performance is better. We also measure the importance of different amino acids using information gain to show the amino acids which are important for predicting the host of a given coronavirus.
Finally, we perform some statistical analyses on these results to show that our embedding is more compact than the embeddings of the baseline models.
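As a rough illustration of how a position weight matrix can turn a protein sequence into a numeric feature vector, the following Python sketch builds a PWM from the k-mers of one sequence and scores each k-mer against it. This is an assumption-laden toy, not the PWM2Vec implementation: the function name `pwm_embedding`, the default `k`, the pseudocount, and the uniform background are all hypothetical choices made here for illustration.

```python
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pwm_embedding(sequence, k=3, pseudocount=0.1):
    """Return one PWM-based score per k-mer of `sequence`.

    Hypothetical sketch: the PWM is estimated from the sequence's own
    k-mers, and each k-mer is then scored by summing its per-position
    log-odds weights. Sequences of equal length yield vectors of equal
    length (len(sequence) - k + 1).
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background

    weights = []
    for pos in range(k):
        # Count residues seen at this k-mer position, with a pseudocount
        # so that unseen residues do not produce log(0).
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for kmer in kmers:
            counts[kmer[pos]] += 1
        total = sum(counts.values())
        weights.append(
            {aa: math.log2((counts[aa] / total) / background)
             for aa in AMINO_ACIDS}
        )

    # Score each k-mer as the sum of its position-specific weights.
    return [sum(weights[pos][aa] for pos, aa in enumerate(kmer))
            for kmer in kmers]
```

Calling `pwm_embedding("MFVFLVLLPLVSSQ", k=3)` on this made-up 14-residue fragment returns 12 scores, one per 3-mer; a vector of this kind could then be passed to any standard classifier.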
Affiliation(s)
- Murray Patterson
  - Department of Computer Science, Georgia State University, Atlanta, GA 30303, USA; (S.A.); (B.B.); (P.C.); (R.T.P.); (Y.Z.)
3
Tsukanov AV, Levitsky VG, Merkulova TI. Application of alternative de novo motif recognition models for analysis of structural heterogeneity of transcription factor binding sites: a case study of FOXA2 binding sites. Vavilovskii Zhurnal Genet Selektsii 2021; 25:7. [PMID: 34547062 PMCID: PMC8408018 DOI: 10.18699/vj21.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/10/2020] [Revised: 01/10/2021] [Accepted: 01/12/2021] [Indexed: 11/24/2022]
Abstract
The most popular model for the search of ChIP-seq data for transcription factor binding sites (TFBS) is the positional weight matrix (PWM). However, this model does not take into account dependencies between nucleotide occurrences in different site positions. Two recently proposed models, BaMM and InMoDe, do account for such dependencies. However, application of these models was usually limited to comparing their recognition accuracies with that of PWMs, and no analysis of the co-prediction and relative positioning of hits of different models in peaks has yet been performed. To close this gap, we propose a pipeline called MultiDeNA. This pipeline includes stages of model training, assessing their recognition accuracy, scanning ChIP-seq peaks and their classification based on scan results. We applied our pipeline to 22 ChIP-seq datasets of TF FOXA2 and considered PWM, dinucleotide PWM (diPWM), BaMM and InMoDe models. The combination of these four models allowed a significant increase in the fraction of recognized peaks compared to that for the PWM model alone: the increase was 26.3%. The BaMM model provided the main contribution to the recognition of sites. Although the major fraction of predicted peaks contained TFBS of different models with coinciding positions, the medians of the fraction of peaks containing predictions of only a single model were 1.08, 0.49, 4.15 and 1.73% for PWM, diPWM, BaMM and InMoDe, respectively. Thus, FOXA2 BSs were not fully described by any single model, which indicates their heterogeneity. We assume that the BaMM model is the most successful in describing the structure of the FOXA2 BS in ChIP-seq datasets under study.
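The independence assumption of the PWM model mentioned above can be made concrete with a short Python sketch. It is a generic log-odds PWM scanner, not the MultiDeNA pipeline; the helper names, the pseudocount, the uniform background of 0.25, and the example sites are all hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from aligned sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in BASES})
    return pwm


def score_window(pwm, window):
    # Independence assumption: the site score is a plain sum of
    # per-position weights, so no position can influence another.
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the best-scoring window in `sequence`."""
    width = len(pwm)
    return max((score_window(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))
```

For example, with the made-up training sites `["TGTTTAC", "TGTTTGC", "TATTTAC", "TGTTTAT"]`, `best_hit(build_pwm(sites), "CCCTGTTTACGGG")` locates the window starting at index 3. Models such as diPWM, BaMM and InMoDe replace the plain per-position sum with terms that condition on neighbouring nucleotides.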
Affiliation(s)
- A V Tsukanov
  - Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- V G Levitsky
  - Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
- T I Merkulova
  - Institute of Cytology and Genetics of Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia
4
Khamis AM, Motwalli O, Oliva R, Jankovic BR, Medvedeva YA, Ashoor H, Essack M, Gao X, Bajic VB. A novel method for improved accuracy of transcription factor binding site prediction. Nucleic Acids Res 2018; 46:e72. [PMID: 29617876 PMCID: PMC6037060 DOI: 10.1093/nar/gky237] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/19/2017] [Revised: 03/01/2018] [Accepted: 03/20/2018] [Indexed: 12/12/2022]
Abstract
Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs due to a large number of TFs involved. To overcome these problems we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs, while at the same time significantly improves prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows on average 1.54-, 1.96- and 5.19-fold reduction of false positives at the same sensitivities compared to models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that one can efficiently replace the PWM models for TFBS prediction by a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented in a web tool and in a stand-alone software freely available at http://cbrc.kaust.edu.sa/DRAF.
Affiliation(s)
- Abdullah M Khamis
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Olaa Motwalli
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Romina Oliva
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
  - Department of Sciences and Technologies, University ‘Parthenope’ of Naples, Centro Direzionale Isola C4 80143, Naples, Italy
- Boris R Jankovic
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Yulia A Medvedeva
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
  - Institute of Bioengineering, Research Centre of Biotechnology, Russian Academy of Science, 117312 Moscow, Russia
  - Department of Computational Biology, Vavilov Institute of General Genetics, Russian Academy of Science, 119991 Moscow, Russia
  - Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Moscow Region, Russia
- Haitham Ashoor
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Magbubah Essack
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Xin Gao
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
- Vladimir B Bajic
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955–6900, Saudi Arabia
5
Lee NK, Azizan FL, Wong YS, Omar N. DeepFinder: An integration of feature-based and deep learning approach for DNA motif discovery. BIOTECHNOL BIOTEC EQ 2018. [DOI: 10.1080/13102818.2018.1438209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022]
Affiliation(s)
- Nung Kion Lee
  - Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Farah Liyana Azizan
  - Centre For Pre-University Studies, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Yu Shiong Wong
  - Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
- Norshafarina Omar
  - Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
6
Kulakovskiy IV, Vorontsov IE, Yevshin IS, Soboleva AV, Kasianov AS, Ashoor H, Ba-Alawi W, Bajic VB, Medvedeva YA, Kolpakov FA, Makeev VJ. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res 2016; 44:D116-25. [PMID: 26586801 PMCID: PMC4702883 DOI: 10.1093/nar/gkv1249] [Citation(s) in RCA: 146] [Impact Index Per Article: 18.3] [Received: 09/14/2015] [Revised: 10/29/2015] [Accepted: 10/30/2015] [Indexed: 02/06/2023]
Abstract
Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns, recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest up to date collection of dinucleotide PWM models for 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the validation of the resulting models on in vivo data. To facilitate a practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
Affiliation(s)
- Ivan V Kulakovskiy
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
- Ilya E Vorontsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
- Ivan S Yevshin
  - Design Technological Institute of Digital Techniques, Siberian Branch of the Russian Academy of Sciences, 630090, Academician Rzhanov 6, Novosibirsk, Russia
  - Institute of Systems Biology Ltd, 630112, office 901, Krasina 54, Novosibirsk, Russia
- Anastasiia V Soboleva
  - Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia
- Artem S Kasianov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
- Haitham Ashoor
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia
- Wail Ba-Alawi
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia
- Vladimir B Bajic
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia
- Yulia A Medvedeva
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
  - Center for Bioengineering, Russian Academy of Sciences, 117312, 60-letiya Oktyabrya 7/2, Moscow, Russia
- Fedor A Kolpakov
  - Design Technological Institute of Digital Techniques, Siberian Branch of the Russian Academy of Sciences, 630090, Academician Rzhanov 6, Novosibirsk, Russia
  - Institute of Systems Biology Ltd, 630112, office 901, Krasina 54, Novosibirsk, Russia
- Vsevolod J Makeev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
  - Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia
7
Eggeling R, Roos T, Myllymäki P, Grosse I. Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data. BMC Bioinformatics 2015; 16:375. [PMID: 26552868 PMCID: PMC4640111 DOI: 10.1186/s12859-015-0797-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 06/16/2015] [Accepted: 10/23/2015] [Indexed: 11/29/2022]
Abstract
Background Statistical modeling of transcription factor binding sites is one of the classical fields in bioinformatics. The position weight matrix (PWM) model, which assumes statistical independence among all nucleotides in a binding site, used to be the standard model for this task for more than three decades but its simple assumptions are increasingly put into question. Recent high-throughput sequencing methods have provided data sets of sufficient size and quality for studying the benefits of more complex models. However, learning more complex models typically entails the danger of overfitting, and while model classes that dynamically adapt the model complexity to data have been developed, effective model selection is to date only possible for fully observable data, but not, e.g., within de novo motif discovery. Results To address this issue, we propose a stochastic algorithm for performing robust model selection in a latent variable setting. This algorithm yields a solution without relying on hyperparameter-tuning via massive cross-validation or other computationally expensive resampling techniques. Using this algorithm for learning inhomogeneous parsimonious Markov models, we study the degree of putative higher-order intra-motif dependencies for transcription factor binding sites inferred via de novo motif discovery from ChIP-seq data. We find that intra-motif dependencies are prevalent and not limited to first-order dependencies among directly adjacent nucleotides, but that second-order models appear to be the significantly better choice. Conclusions The traditional PWM model appears to be indeed insufficient to infer realistic sequence motifs, as it is on average outperformed by more complex models that take into account intra-motif dependencies. Moreover, using such models together with an appropriate model selection procedure does not lead to a significant performance loss in comparison with the PWM model for any of the studied transcription factors. 
Hence, we find it worthwhile to recommend that any modern motif discovery algorithm should attempt to take into account intra-motif dependencies.
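To show what an intra-motif dependency looks like in code, here is a minimal Python sketch of a plain first-order inhomogeneous Markov model trained on aligned sites. It is deliberately simpler than the parsimonious Markov models and the latent-variable model selection described in the abstract; the function names, the pseudocount, and the example sites are hypothetical.

```python
import math

BASES = "ACGT"


def train_first_order(sites, pseudocount=1.0):
    """Estimate P(first base) and position-specific P(base | previous base)."""
    length = len(sites[0])

    start = {b: pseudocount for b in BASES}
    for site in sites:
        start[site[0]] += 1
    total = sum(start.values())
    start = {b: c / total for b, c in start.items()}

    transitions = []
    for pos in range(1, length):
        # One conditional table per position: rows are the previous base.
        table = {prev: {b: pseudocount for b in BASES} for prev in BASES}
        for site in sites:
            table[site[pos - 1]][site[pos]] += 1
        for prev in BASES:
            row_total = sum(table[prev].values())
            table[prev] = {b: c / row_total for b, c in table[prev].items()}
        transitions.append(table)
    return start, transitions


def log_likelihood(model, site):
    """Log-probability of `site`, conditioning each base on its predecessor."""
    start, transitions = model
    ll = math.log(start[site[0]])
    for pos in range(1, len(site)):
        ll += math.log(transitions[pos - 1][site[pos - 1]][site[pos]])
    return ll
```

Trained on the made-up sites `["ACGT", "ACGT", "TGCA", "TGCA"]`, this model scores `"ACGT"` higher than the mixed site `"ACCA"`, because it sees that C at the second position was followed by G, not C. A PWM trained on the same sites would score the two equally, since it treats each column independently.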
Affiliation(s)
- Ralf Eggeling
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Teemu Roos
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Petri Myllymäki
  - Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
- Ivo Grosse
  - Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany
  - German Center for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
8
Guo Z, Maki M, Ding R, Yang Y, Zhang B, Xiong L. Genome-wide survey of tissue-specific microRNA and transcription factor regulatory networks in 12 tissues. Sci Rep 2014; 4:5150. [PMID: 24889152 PMCID: PMC5381490 DOI: 10.1038/srep05150] [Citation(s) in RCA: 146] [Impact Index Per Article: 14.6] [Received: 01/29/2014] [Accepted: 05/08/2014] [Indexed: 12/18/2022]
Abstract
Tissue-specific miRNAs (TS miRNAs), which are expressed specifically in particular tissues, play an important role in tissue identity, differentiation and function. However, transcription factor (TF) and TS miRNA regulatory networks across multiple tissues have not been systematically studied. Here, we manually extracted 116 TS miRNAs and systematically investigated the regulatory network of TF-TS miRNA in 12 human tissues. We identified 2,347 TF-TS miRNA regulatory relations and revealed that most TF binding sites tend to be enriched close to the transcription start site of TS miRNAs. Furthermore, we found that TS miRNAs were regulated widely by non-tissue-specific TFs and that the tissue-specific expression level of a TF has a close relationship with TF-gene regulation. Finally, we describe TSmiR (http://bioeng.swjtu.edu.cn/TSmiR), a novel and web-searchable database that houses interaction maps of TF-TS miRNA in 12 tissues. Taken together, these observations provide a new suggestion to better understand the regulatory network and mechanisms of TF-TS miRNAs underlying different tissues.
Affiliation(s)
- Zhiyun Guo
  - School of Life Sciences and Bioengineering, Southwest Jiaotong University, Chengdu, 610031, P.R. China
- Miranda Maki
  - Department of Biology, Lakehead University, Oliver Road, Thunder Bay, Ontario
- Ruofan Ding
  - School of Life Sciences and Bioengineering, Southwest Jiaotong University, Chengdu, 610031, P.R. China
- Yalan Yang
  - School of Life Sciences and Bioengineering, Southwest Jiaotong University, Chengdu, 610031, P.R. China
- Bao Zhang
  - School of Life Sciences and Bioengineering, Southwest Jiaotong University, Chengdu, 610031, P.R. China
- Lili Xiong
  - School of Life Sciences and Bioengineering, Southwest Jiaotong University, Chengdu, 610031, P.R. China
9
Application of experimentally verified transcription factor binding sites models for computational analysis of ChIP-Seq data. BMC Genomics 2014; 15:80. [PMID: 24472686 PMCID: PMC4234207 DOI: 10.1186/1471-2164-15-80] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 06/07/2013] [Accepted: 01/25/2014] [Indexed: 02/07/2023]
Abstract
Background ChIP-Seq is widely used to detect genomic segments bound by transcription factors (TF), either directly at DNA binding sites (BSs) or indirectly via other proteins. Currently, there are many software tools implementing different approaches to identify TFBSs within ChIP-Seq peaks. However, their use for the interpretation of ChIP-Seq data is usually complicated by the absence of direct experimental verification, making it difficult both to set a threshold to avoid recognition of too many false-positive BSs, and to compare the actual performance of different models. Results Using ChIP-Seq data for FoxA2 binding loci in mouse adult liver and human HepG2 cells we compared FoxA binding-site predictions for four computational models of two fundamental classes: pattern matching based on existing training set of experimentally confirmed TFBSs (oPWM and SiteGA) and de novo motif discovery (ChIPMunk and diChIPMunk). To properly select prediction thresholds for the models, we experimentally evaluated affinity of 64 predicted FoxA BSs using EMSA that allows safely distinguishing sequences able to bind TF. As a result we identified thousands of reliable FoxA BSs within ChIP-Seq loci from mouse liver and human HepG2 cells. It was found that the performance of conventional position weight matrix (PWM) models was inferior with the highest false positive rate. On the contrary, the best recognition efficiency was achieved by the combination of SiteGA & diChIPMunk/ChIPMunk models, properly identifying FoxA BSs in up to 90% of loci for both mouse and human ChIP-Seq datasets. Conclusions The experimental study of TF binding to oligonucleotides corresponding to predicted sites increases the reliability of computational methods for TFBS-recognition in ChIP-Seq data analysis. Regarding ChIP-Seq data interpretation, basic PWMs have inferior TFBS recognition quality compared to the more sophisticated SiteGA and de novo motif discovery methods. 
A combination of models from different principles allowed identification of proper TFBSs.
10
Glass C, Wuertzer C, Cui X, Bi Y, Davuluri R, Xiao YY, Wilson M, Owens K, Zhang Y, Perkins A. Global Identification of EVI1 Target Genes in Acute Myeloid Leukemia. PLoS One 2013; 8:e67134. [PMID: 23826213 PMCID: PMC3694976 DOI: 10.1371/journal.pone.0067134] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 10/02/2012] [Accepted: 05/19/2013] [Indexed: 12/20/2022]
Abstract
The ecotropic virus integration site 1 (EVI1) transcription factor is associated with human myeloid malignancy of poor prognosis and is overexpressed in 8-10% of adult AML and, strikingly, up to 27% of pediatric MLL-rearranged leukemias. For the first time, we report comprehensive genomewide EVI1 binding and whole transcriptome gene deregulation in leukemic cells using a combination of ChIP-Seq and RNA-Seq expression profiling. We found disruption of terminal myeloid differentiation and cell cycle regulation to be prominent in EVI-induced leukemogenesis. Specifically, we identified that EVI1 directly binds to and downregulates the master myeloid differentiation gene Cebpe and several of its downstream gene targets critical for terminal myeloid differentiation. We also found that EVI1 binds to and downregulates Serpinb2 as well as numerous genes involved in the Jak-Stat signaling pathway. Finally, we identified decreased expression of several ATP-dependent P2X purinoreceptor genes involved in apoptosis mechanisms. These findings provide a foundation for future study of potential therapeutic gene targets for EVI1-induced leukemia.
Collapse
Affiliation(s)
- Carolyn Glass
- Department of Pathology and Lab Medicine, University of Rochester Medical Center, Rochester, New York, United States of America
- Charles Wuertzer
- Department of Pathology and Lab Medicine, University of Rochester Medical Center, Rochester, New York, United States of America
- Xiaohui Cui
- Department of Pathology and Lab Medicine, University of Rochester Medical Center, Rochester, New York, United States of America
- Yingtao Bi
- Molecular and Cellular Oncogenesis Program, Center for Systems and Computational Biology, The Wistar Institute, Philadelphia, Pennsylvania, United States of America
- Ramana Davuluri
- Molecular and Cellular Oncogenesis Program, Center for Systems and Computational Biology, The Wistar Institute, Philadelphia, Pennsylvania, United States of America
- Ying-Yi Xiao
- Department of Pathology, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Michael Wilson
- Department of Pathology and Lab Medicine, University of Rochester Medical Center, Rochester, New York, United States of America
- Kristina Owens
- Department of Pathology and Lab Medicine, University of Rochester Medical Center, Rochester, New York, United States of America
- Yi Zhang
- Department of Pathology and Lab Medicine, University of Rochester Medical Center, Rochester, New York, United States of America
- Archibald Perkins
- Department of Pathology and Lab Medicine, University of Rochester Medical Center, Rochester, New York, United States of America
|
11
|
Guo AM, Sun K, Su X, Wang H, Sun H. YY1TargetDB: an integral information resource for Yin Yang 1 target loci. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2013; 2013:bat007. [PMID: 23411719 PMCID: PMC3572531 DOI: 10.1093/database/bat007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/30/2022]
Abstract
Yin Yang 1 (YY1), a ubiquitously expressed transcription factor, plays a critical role in regulating cell development, differentiation, cellular proliferation and tumorigenesis. Previous studies identified many YY1-regulated target genes in both human and mouse. Emerging global mapping by Chromatin ImmunoPrecipitation (ChIP)-based high-throughput experiments indicates that YY1 binds to a vast number of loci genome-wide. However, the information is scattered across many disparate, poorly cross-indexed publications; a large portion was only published recently by the ENCODE consortium with limited annotation. A centralized database that annotates and organizes YY1-binding loci and target motifs in a systematic way with easy access will be a valuable resource for the research community. We therefore implemented a web-based YY1 Target loci Database (YY1TargetDB). This database contains YY1-binding loci (binding peaks) from ChIP-seq and ChIP-on-chip experiments, and computationally predicted YY1 and cofactor motifs within each locus. It also collects the experimentally verified YY1-binding motifs from individual researchers. The current version of YY1TargetDB contains 92 314 binding loci identified by ChIP-based experiments; 157 200 YY1-binding motifs, of which 42 are experimentally verified and 157 158 are computationally predicted; and 130 759 binding motifs for 47 cofactors. Database URL: http://www.myogenesisdb.org/YY1TargetDB
Affiliation(s)
- Andy M Guo
- Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
|
12
|
Kulakovskiy I, Levitsky V, Oshchepkov D, Bryzgalov L, Vorontsov I, Makeev V. From binding motifs in ChIP-Seq data to improved models of transcription factor binding sites. J Bioinform Comput Biol 2013; 11:1340004. [PMID: 23427986 DOI: 10.1142/s0219720013400040] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022]
Abstract
Chromatin immunoprecipitation followed by deep sequencing (ChIP-Seq) has become a method of choice for locating DNA segments bound by different regulatory proteins. ChIP-Seq produces extremely valuable information for studying transcriptional regulation. The wet-lab workflow is often supported by downstream computational analysis, including construction of models of nucleotide sequences of transcription factor binding sites (TFBSs) in DNA, which can be used to detect binding sites in ChIP-Seq data at single base pair resolution. The most popular TFBS model is the positional weight matrix (PWM), with statistically independent positional weights of nucleotides in different columns; such PWMs are constructed from a gapless multiple local alignment of sequences containing experimentally identified TFBSs. Modern high-throughput techniques, including ChIP-Seq, provide enough data for careful training of advanced models containing more parameters than a PWM. Yet many suggested multiparametric models provide only incremental improvement in TFBS recognition quality compared to traditional PWMs trained on ChIP-Seq data. We present a novel computational tool, diChIPMunk, that constructs TFBS models as optimal dinucleotide PWMs, thus accounting for correlations between neighboring nucleotides in input sequences. diChIPMunk retains many advantages of ChIPMunk, its ancestor algorithm, accounting for ChIP-Seq base coverage profiles ("peak shape") and using an effective subsampling-based core procedure that allows processing of large datasets. We demonstrate that diPWMs constructed by diChIPMunk outperform traditional PWMs constructed by ChIPMunk from the same ChIP-Seq data. Software website: http://autosome.ru/dichipmunk/
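The distinction this abstract draws between a mononucleotide PWM (independent columns) and a dinucleotide PWM (neighboring positions scored jointly) can be illustrated with a minimal Python sketch. This is not diChIPMunk's algorithm, which performs motif discovery with subsampling and peak-shape weighting; it only contrasts the two scoring models on an aligned toy set. The function names, the pseudocount, the uniform background, and the four example sites are all assumptions made for illustration.

```python
import math
from itertools import product

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds mononucleotide PWM from a gapless alignment (assumed uniform background)."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log((counts[b] / total) / background) for b in BASES})
    return pwm

def build_dipwm(sites, pseudocount=1.0, background=1.0 / 16):
    """Log-odds dinucleotide PWM: one column per pair of adjacent positions."""
    dipwm = []
    for pos in range(len(sites[0]) - 1):
        counts = {a + b: pseudocount for a, b in product(BASES, repeat=2)}
        for site in sites:
            counts[site[pos:pos + 2]] += 1
        total = sum(counts.values())
        dipwm.append({d: math.log((counts[d] / total) / background) for d in counts})
    return dipwm

def score_pwm(pwm, seq):
    """Sum of independent per-position weights."""
    return sum(col[base] for col, base in zip(pwm, seq))

def score_dipwm(dipwm, seq):
    """Sum of weights over overlapping adjacent pairs."""
    return sum(col[seq[i:i + 2]] for i, col in enumerate(dipwm))

# Hypothetical aligned sites in which positions 0 and 1 co-vary (AC or TG only).
sites = ["ACGT", "ACGT", "TGGT", "TGGT"]
pwm = build_pwm(sites)
dipwm = build_dipwm(sites)

# "AGGT" mixes the two observed variants and never occurs in the alignment.
print(score_pwm(pwm, "ACGT"), score_pwm(pwm, "AGGT"))
print(score_dipwm(dipwm, "ACGT"), score_dipwm(dipwm, "AGGT"))
```

On this toy alignment the mononucleotide PWM cannot tell the observed site ACGT from the unobserved mixture AGGT, because each column is scored independently, whereas the dinucleotide model penalizes the unseen pair AG at the first two positions.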
Affiliation(s)
- Ivan Kulakovskiy
- Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russia.
|
13
|
Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, Bajic VB, Makeev VJ. HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res 2012; 41:D195-202. [PMID: 23175603 PMCID: PMC3531053 DOI: 10.1093/nar/gks1089] [Citation(s) in RCA: 156] [Impact Index Per Article: 13.0] [Indexed: 12/26/2022] Open
Abstract
Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically results in improved model quality, probably due to partial correction of source-specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.
Affiliation(s)
- Ivan V Kulakovskiy
- Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russia.
|
14
|
Wang X, Zhang A, Ren W, Chen C, Dong J. Genome-wide Inference of Transcription Factor-DNA Binding Specificity in Cell Regeneration Using a Combination Strategy. Chem Biol Drug Des 2012; 80:734-44. [DOI: 10.1111/cbdd.12013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
|