1
Zhao J, Huai J. Role of primary aging hallmarks in Alzheimer's disease. Theranostics 2023; 13:197-230. [PMID: 36593969 PMCID: PMC9800733 DOI: 10.7150/thno.79535] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 10/04/2022] [Accepted: 11/15/2022] [Indexed: 12/03/2022]
Abstract
Alzheimer's disease (AD) is the most common neurodegenerative disease; it severely threatens the health of the elderly and imposes significant economic and social burdens. The causes of AD are complex and include heritable but mostly aging-related factors. The primary aging hallmarks, which include genomic instability, telomere attrition, epigenetic alterations, and loss of proteostasis, play a dominant role in the aging process. Although AD is closely associated with aging, the mechanisms underlying AD pathogenesis have not been well characterized. This review summarizes the available literature on the primary aging hallmarks and their roles in AD pathogenesis. By analyzing published studies, we attempted to uncover possible mechanisms by which aberrant epigenetic marks, together with their related enzymes and transcription factors, and loss of proteostasis contribute to AD. In particular, we highlight the importance of oxidative stress-induced DNA methylation, DNA methylation-directed histone modifications, and proteostasis. A molecular network of gene regulatory elements that changes dynamically with age may underlie age-dependent AD pathogenesis and could serve as a new drug target for treating AD.

2
Sghaier N, Essemine J, Ayed RB, Gorai M, Ben Marzoug R, Rebai A, Qu M. An Evidence Theory and Fuzzy Logic Combined Approach for the Prediction of Potential ARF-Regulated Genes in Quinoa. Plants (Basel) 2022; 12:71. [PMID: 36616201 PMCID: PMC9824623 DOI: 10.3390/plants12010071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2022] [Accepted: 11/26/2022] [Indexed: 06/17/2023]
Abstract
Quinoa is among the plants most tolerant of challenging and harmful abiotic environmental factors. It was selected as one of the model crops for bio-saline agriculture, which could contribute to staple food security for an ever-growing world population under various climate change scenarios. Auxin response factors (ARFs) are major contributors to plant adaptation to severe environmental conditions. Determining ARF-binding sites is therefore a key step that could provide promising insights for plant breeding programs and the improvement of agronomic traits. However, determining ARF-binding sites is challenging, particularly in species with large genomes. In this report, we present a data fusion approach based on Dempster-Shafer evidence theory and fuzzy set theory to predict ARF-binding sites. We then performed an in silico identification of ARF-binding sites in Chenopodium quinoa. The characterization of several known pathways implicated in auxin signaling in other higher plants confirms the reliability of our predictions. Furthermore, several pathways with little or no available functional information were identified as playing important roles in the adaptation of quinoa to environmental conditions. The predicted auxin response genes associated with the detected ARF-binding sites may help to explore the biological roles of some newly identified genes of unknown function in quinoa.
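Dempster's rule of combination is the core operation of the Dempster-Shafer evidence theory named above. The sketch below shows the textbook rule for two sources over a two-element frame {bound, unbound}; the frame, the two evidence sources, and their mass values are hypothetical, and the fuzzy-set part of the authors' approach is not shown.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses (a focal element) to a mass;
    the masses in each function sum to 1. Returns the fused masses and the conflict K.
    """
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # the two sources support disjoint hypotheses
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise by 1 - K so that the fused masses sum to 1 again
    return {k: v / (1.0 - conflict) for k, v in fused.items()}, conflict


BOUND, UNBOUND = frozenset({"bound"}), frozenset({"unbound"})
THETA = BOUND | UNBOUND  # whole frame: mass here expresses ignorance

# Hypothetical evidence from two scoring sources for one candidate ARF site
source_1 = {BOUND: 0.6, UNBOUND: 0.1, THETA: 0.3}
source_2 = {BOUND: 0.5, UNBOUND: 0.2, THETA: 0.3}

fused, k = combine(source_1, source_2)
print(round(k, 2), round(fused[BOUND], 3))  # 0.17 0.759
```

Mass left on the whole frame lets each source say "I don't know" instead of forcing a probability split, which is the main practical difference from simply averaging scores.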
Affiliation(s)
- Nesrine Sghaier
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, China
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Laboratory of Advanced Technology and Intelligent Systems, National Engineering School of Sousse, Sousse 4023, Tunisia
- Jemaa Essemine
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China
- Rayda Ben Ayed
- Department of Agronomy and Plant Biotechnology, National Institute of Agronomy of Tunisia (INAT), 43 Avenue Charles Nicolle, 1082 El Mahrajène, University of Carthage-Tunis, Tunis 1082, Tunisia
- Laboratory of Extremophile Plants, Centre of Biotechnology of Borj-Cédria, B.P. 901, Hammam Lif 2050, Tunisia
- Mustapha Gorai
- Higher Institute of Applied Biology Medenine, University of Gabes, Medenine 4119, Tunisia
- Riadh Ben Marzoug
- Laboratory of Molecular and Cellular Screening Processes, Sfax Biotechnology Center, B.P 1177, Sfax 3018, Tunisia
- Ahmed Rebai
- Laboratory of Molecular and Cellular Screening Processes, Sfax Biotechnology Center, B.P 1177, Sfax 3018, Tunisia
- Mingnan Qu
- National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya 572024, China
- CAS Center for Excellence in Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200032, China

3
Rivière Q, Corso M, Ciortan M, Noël G, Verbruggen N, Defrance M. Exploiting Genomic Features to Improve the Prediction of Transcription Factor-Binding Sites in Plants. Plant Cell Physiol 2022; 63:1457-1473. [PMID: 35799371 DOI: 10.1093/pcp/pcac095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2021] [Revised: 06/07/2022] [Accepted: 07/06/2022] [Indexed: 06/15/2023]
Abstract
The identification of transcription factor (TF) target genes is central in biology. A popular approach is to locate potential cis-regulatory elements (CREs) by pattern matching. During the last few years, tools integrating next-generation sequencing data have been developed to improve the performance of pattern matching. However, such tools have not yet been comprehensively evaluated in plants. We therefore developed a new streamlined method to predict CREs and target genes of plant TFs in specific organs or conditions. Our approach implements a supervised machine learning strategy in which decision rule models are learnt from TF ChIP-chip/seq experimental data. Different layers of genomic features were integrated into the predictive models: the position on the gene, DNA sequence conservation, chromatin state, and various CRE footprints. Among the tested features, the chromatin features were crucial for improving the accuracy of the method. Furthermore, we evaluated the transferability of predictive models across TFs, organs and species. Finally, we validated our method by correctly inferring the target genes of key TFs controlling metabolite biosynthesis at the organ level in Arabidopsis. We developed a tool, Wimtrap, to reproduce our approach in plant species and conditions/organs for which ChIP-chip/seq data are available. Wimtrap is a user-friendly R package that supports an R Shiny web interface and is provided with pre-built models that can be used to quickly obtain predictions of CREs and TF gene targets in different organs or conditions in Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa and Zea mays.
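Pattern matching, the baseline the abstract starts from, is commonly done by scanning a sequence with a log-odds position weight matrix (PWM). The sketch below assumes a uniform background, a 0.5 pseudocount, a hypothetical GATA motif, and forward-strand scanning only; real scanners also test the reverse complement, and Wimtrap's contribution is the extra genomic features layered on top of such hits, which are not shown.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background) for b in BASES})
    return pwm


def scan(seq, pwm, threshold):
    """Return (start, score) for each forward-strand window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, round(score, 3)))
    return hits


# Hypothetical motif: 10 aligned sites that all read GATA
counts = [{b: (10 if b == c else 0) for b in BASES} for c in "GATA"]
pwm = log_odds_pwm(counts)
print(scan("TTGATATT", pwm, threshold=5.0))  # [(2, 7.229)]
```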
Affiliation(s)
- Quentin Rivière
- Brussels Bioengineering School, Laboratory of Plant Physiology and Molecular Genetics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Massimiliano Corso
- Brussels Bioengineering School, Laboratory of Plant Physiology and Molecular Genetics, Université Libre de Bruxelles, Brussels 1050, Belgium
- INRAE, AgroParisTech, Institut Jean-Pierre Bourgin (IJPB), Université Paris-Saclay, Versailles 78000, France
- Madalina Ciortan
- Interuniversity Institute of Bioinformatics in Brussels, Machine Learning Group, Université Libre de Bruxelles, Brussels 1050, Belgium
- Grégoire Noël
- Functional and Evolutionary Entomology, Gembloux Agro-Bio Tech, University of Liège, Passage des Déportés 2, Gembloux 5030, Belgium
- Nathalie Verbruggen
- Brussels Bioengineering School, Laboratory of Plant Physiology and Molecular Genetics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Matthieu Defrance
- Interuniversity Institute of Bioinformatics in Brussels, Machine Learning Group, Université Libre de Bruxelles, Brussels 1050, Belgium

4
SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model. Genes (Basel) 2022; 13:genes13040568. [PMID: 35456374 PMCID: PMC9028922 DOI: 10.3390/genes13040568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Revised: 03/22/2022] [Accepted: 03/22/2022] [Indexed: 11/16/2022]
Abstract
A large number of inorganic and organic compounds can bind DNA and form complexes, among which drug-related molecules are important. Changes in chromatin accessibility not only directly affect drug–DNA interactions but can also promote or inhibit the expression of critical genes associated with drug resistance by affecting the DNA-binding capacity of transcription factors (TFs) and transcriptional regulators. However, the experimental techniques for measuring chromatin accessibility are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome, but most existing models ignore the contextual information provided by the bases in gene sequences. To address this issue, we proposed a new solution called SemanticCAP. It introduces a gene language model that captures the context of gene sequences and can therefore provide an effective representation of any given site in a gene sequence. We merged the features provided by the gene language model into our chromatin accessibility model, and designed methods called SFA and SFC to make the feature fusion smoother. Compared with DeepSEA, gkm-SVM, and k-mer on public benchmarks, our model performed better, with maximum improvements of 1.25% in auROC and 2.41% in auPRC.
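The two metrics quoted above can be computed from ranked predictions without any library. This is a minimal sketch on made-up labels and scores: auROC via the Mann-Whitney statistic, and auPRC estimated as average precision (one common estimator; it assumes no tied scores).

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """auPRC estimated as average precision: mean precision at each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    tp = 0
    ap = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / k  # precision at this recall step
    return ap / n_pos


# Hypothetical accessibility labels (1 = accessible) and model scores
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
print(round(auroc(labels, scores), 4), round(average_precision(labels, scores), 4))  # 0.7778 0.8056
```

auPRC is usually the more informative of the two when accessible sites are rare relative to the rest of the genome, because it ignores the large pool of easy true negatives.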

5
D’Arienzo V, Ferguson J, Giraud G, Chapus F, Harris JM, Wing PAC, Claydon A, Begum S, Zhuang X, Balfe P, Testoni B, McKeating JA, Parish JL. The CCCTC-binding factor CTCF represses hepatitis B virus enhancer I and regulates viral transcription. Cell Microbiol 2021; 23:e13274. [PMID: 33006186 PMCID: PMC7116737 DOI: 10.1111/cmi.13274] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 08/05/2020] [Revised: 09/09/2020] [Accepted: 09/29/2020] [Indexed: 12/17/2022]
Abstract
Hepatitis B virus (HBV) infection is of global importance, with over 2 billion people exposed to the virus during their lifetime and at risk of progressive liver disease, cirrhosis and hepatocellular carcinoma. HBV is a member of the Hepadnaviridae family that replicates via episomal copies of a covalently closed circular DNA (cccDNA) genome. The chromatinization of this small viral genome, with overlapping open reading frames and regulatory elements, suggests an important role for epigenetic pathways in regulating viral transcription. The chromatin-organising transcriptional insulator protein CCCTC-binding factor (CTCF) has been reported to regulate transcription in a diverse range of viruses. We identified two conserved CTCF binding sites within enhancer I of the HBV genome, and chromatin immunoprecipitation (ChIP) analysis demonstrated enrichment of CTCF binding to integrated or episomal copies of the viral genome. siRNA knockdown of CTCF resulted in a significant increase in pre-genomic RNA levels in de novo infected HepG2 cells and in cells supporting episomal HBV DNA replication. Furthermore, mutation of these sites in HBV DNA minicircles abrogated CTCF binding and increased pre-genomic RNA levels, providing evidence of a direct role for CTCF in repressing HBV transcription.
Affiliation(s)
- Jack Ferguson
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Guillaume Giraud
- CRCL INSERM and Cancer Research Center of Lyon (CRCL), Lyon, France
- Fleur Chapus
- CRCL INSERM and Cancer Research Center of Lyon (CRCL), Lyon, France
- James M. Harris
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Peter A. C. Wing
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Adam Claydon
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Sophia Begum
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Xiaodong Zhuang
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Peter Balfe
- Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Barbara Testoni
- CRCL INSERM and Cancer Research Center of Lyon (CRCL), Lyon, France
- Joanna L. Parish
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK

6
Zhou M, Li H, Wang X, Guan Y. Evidence of widespread, independent sequence signature for transcription factor cobinding. Genome Res 2021; 31:265-278. [PMID: 33303494 PMCID: PMC7849410 DOI: 10.1101/gr.267310.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2020] [Accepted: 12/03/2020] [Indexed: 01/03/2023]
Abstract
Transcription factors (TFs) are the vocabulary that genomes use to regulate gene expression and phenotypes. The interactions among TFs enrich this vocabulary and orchestrate diverse biological processes. Although simple models identify open chromatin and the presence of TF motifs as the two major contributors to TF binding patterns, what shapes the in vivo TF cobinding landscape remains elusive. In this study, we developed a machine learning algorithm to explore the contributors to cobinding patterns. The algorithm substantially outperforms state-of-the-art models for TF cobinding prediction. Game theory-based feature importance analysis reveals that, for most of the TF pairs we studied, independent motif sequences other than those of the two TFs under investigation contribute to their cobinding patterns. Such independent motif sequences include, but are not limited to, motifs of transcription initiation-related proteins and known TF complexes. We found that the motif sequence signatures and the TFs are rarely mutual, corroborating a hierarchical and directional organization of the regulatory network and refuting the possibility of artifacts caused by shared sequence similarity with the TFs under investigation. We modeled this regulatory language with directed graphs, which reveal shared, global factors related to many binding and cobinding patterns.
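The abstract does not say which game-theoretic attribution was used. Shapley values are the standard game-theoretic notion of feature importance, and the sketch below computes them exactly for a hypothetical three-feature model; the feature names and the scoring function are invented for illustration, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players.

    `value` maps a frozenset of players to a real-valued payoff (here, a model score).
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution of p
        phi[p] = total
    return phi


def toy_cobinding_score(features):
    """Hypothetical model: additive motif effects plus one pairwise interaction."""
    score = 0.0
    if "TF1_motif" in features:
        score += 0.2
    if "TF2_motif" in features:
        score += 0.2
    if {"TF1_motif", "third_motif"} <= features:
        score += 0.3
    return score


phi = shapley_values(["TF1_motif", "TF2_motif", "third_motif"], toy_cobinding_score)
print({k: round(v, 3) for k, v in phi.items()})
# {'TF1_motif': 0.35, 'TF2_motif': 0.2, 'third_motif': 0.15}
```

The interaction term is split evenly between its two participants, so a motif that matters only jointly with TF1 (here `third_motif`) still receives credit, and the three values sum to the full-model score of 0.7.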
Affiliation(s)
- Manqi Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xueqing Wang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA

7
Srivastava D, Aydin B, Mazzoni EO, Mahony S. An interpretable bimodal neural network characterizes the sequence and preexisting chromatin predictors of induced transcription factor binding. Genome Biol 2021; 22:20. [PMID: 33413545 PMCID: PMC7788824 DOI: 10.1186/s13059-020-02218-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/18/2019] [Accepted: 12/03/2020] [Indexed: 02/06/2023]
Abstract
BACKGROUND Transcription factor (TF) binding specificity is determined via a complex interplay between the transcription factor's DNA binding preference and cell type-specific chromatin environments. The chromatin features that correlate with transcription factor binding in a given cell type have been well characterized. For instance, the binding sites for a majority of transcription factors display concurrent chromatin accessibility. However, concurrent chromatin features reflect the binding activities of the transcription factor itself and thus provide limited insight into how genome-wide TF-DNA binding patterns become established in the first place. To understand the determinants of transcription factor binding specificity, we therefore need to examine how newly activated transcription factors interact with sequence and preexisting chromatin landscapes. RESULTS Here, we investigate the sequence and preexisting chromatin predictors of TF-DNA binding by examining the genome-wide occupancy of transcription factors that have been induced in well-characterized chromatin environments. We develop Bichrom, a bimodal neural network that jointly models sequence and preexisting chromatin data to interpret the genome-wide binding patterns of induced transcription factors. We find that the preexisting chromatin landscape is a differential global predictor of TF-DNA binding: incorporating preexisting chromatin features substantially improves our ability to explain the binding specificity of some transcription factors, but not of others. Furthermore, by analyzing site-level predictors, we show that transcription factor binding in previously inaccessible chromatin tends to correspond to the presence of more favorable cognate DNA sequences. CONCLUSIONS Bichrom thus provides a framework for modeling, interpreting, and visualizing the joint sequence and chromatin landscapes that determine TF-DNA binding dynamics.
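As a rough illustration of the bimodal idea (not Bichrom's actual architecture, which trains neural sub-networks on real data), the sketch below scores one site with a sequence branch and a chromatin branch that feed a single logistic output, so the logit can be split into a sequence share and a chromatin share. The E-box-like filter, the two chromatin tracks, and all weights are hypothetical and hand-set.

```python
import math

import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix."""
    return np.eye(4)[[BASES.index(b) for b in seq]]


def sequence_branch(x, motif_filter):
    """Slide one motif filter along the sequence and keep the best match (max pooling)."""
    width = motif_filter.shape[0]
    return max(float(np.sum(x[i:i + width] * motif_filter)) for i in range(x.shape[0] - width + 1))


def chromatin_branch(tracks, weights):
    """Linear summary of preexisting chromatin signals measured before TF induction."""
    return float(np.dot(tracks, weights))


def bimodal_predict(seq, tracks, motif_filter, chrom_weights, a_seq, a_chrom, bias):
    """Join both branches in one logistic output; also return each branch's logit share."""
    s = a_seq * sequence_branch(one_hot(seq), motif_filter)
    c = a_chrom * chromatin_branch(tracks, chrom_weights)
    prob = 1.0 / (1.0 + math.exp(-(s + c + bias)))
    return prob, s, c


# Hand-set, untrained weights for illustration only
prob, s, c = bimodal_predict(
    "TTCACGTGAA",
    tracks=np.array([1.0, 0.0]),  # e.g. [accessibility, repressive mark], hypothetical
    motif_filter=one_hot("CACGTG"),
    chrom_weights=np.array([2.0, -1.0]),
    a_seq=0.5, a_chrom=1.0, bias=-4.0,
)
print(round(prob, 3), s, c)  # 0.731 3.0 2.0
```

Keeping the two branches separate until the last unit is what makes the per-site split into `s` and `c` possible; a model that mixed sequence and chromatin inputs in early layers would not decompose this cleanly.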
Affiliation(s)
- Divyanshi Srivastava
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Begüm Aydin
- Department of Biology, New York University, New York, NY, USA
- Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, Pennsylvania State University, University Park, PA, USA

8
Wang H, Liu Y, Guan H, Fan GL. The Regulation of Target Genes by Co-occupancy of Transcription Factors, c-Myc and Mxi1 with Max in the Mouse Cell Line. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191106103633] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/09/2023]
Abstract
Background:
The regulatory function of transcription factors on genes is related not only to the location of their binding sites and the functions of the bound genes, but also to the mode of binding.
Objective:
To study how different modes of binding affect the regulation of target genes.
Methods:
In this study, we provide a reliable theoretical basis for studying the regulation of gene expression by co-binding transcription factors and further reveal the specific regulatory effects of transcription factor co-binding in cancer cells.
Results:
Transcription factors tend to co-bind with other transcription factors in regulatory regions, forming competitive or synergistic relationships that precisely regulate target genes.
Conclusion:
We found that genes up-regulated in cancer cells, relative to normal cells, were involved in regulating the cells' own immune system.
Affiliation(s)
- Hui Wang
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Yuan Liu
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
- Hua Guan
- ENT Department, Huhhot First Hospital, Hohhot, China
- Guo-Liang Fan
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China

9
Chen L, Capra JA. Learning and interpreting the gene regulatory grammar in a deep learning framework. PLoS Comput Biol 2020; 16:e1008334. [PMID: 33137083 PMCID: PMC7660921 DOI: 10.1371/journal.pcbi.1008334] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/16/2019] [Revised: 11/12/2020] [Accepted: 09/12/2020] [Indexed: 12/12/2022]
Abstract
Deep neural networks (DNNs) have achieved state-of-the-art performance in identifying gene regulatory sequences, but they have provided limited insight into the biology of regulatory elements due to the difficulty of interpreting the complex features they learn. Several models of how combinatorial binding of transcription factors, i.e. the regulatory grammar, drives enhancer activity have been proposed, ranging from the flexible TF billboard model to the stringent enhanceosome model. However, there is limited knowledge of the prevalence of these (or other) sequence architectures across enhancers. Here we perform several hypothesis-driven analyses to explore the ability of DNNs to learn the regulatory grammar of enhancers. We created synthetic datasets based on existing hypotheses about combinatorial transcription factor binding site (TFBS) patterns, including homotypic clusters, heterotypic clusters, and enhanceosomes, from real TF binding motifs from diverse TF families. We then trained deep residual neural networks (ResNets) to model the sequences under a range of scenarios that reflect real-world multi-label regulatory sequence prediction tasks. We developed a gradient-based unsupervised clustering method to extract the patterns learned by the ResNet models. We demonstrated that simulated regulatory grammars are best learned in the penultimate layer of the ResNets, and the proposed method can accurately retrieve the regulatory grammar even when there is heterogeneity in the enhancer categories and a large fraction of TFBS outside of the regulatory grammar. However, we also identify common scenarios where ResNets fail to learn simulated regulatory grammars. Finally, we applied the proposed method to mouse developmental enhancers and were able to identify the components of a known heterotypic TF cluster. 
Our results provide a framework for interpreting the regulatory rules learned by ResNets, and they demonstrate that the ability and efficiency of ResNets in learning the regulatory grammar depend on the nature of the prediction task.
Affiliation(s)
- Ling Chen
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America
- John A. Capra
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America
- Vanderbilt Genetics Institute and Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States of America

10
Zhou J, Lu Q, Xu R, Gui L, Wang H. Prediction of TF-Binding Site by Inclusion of Higher Order Position Dependencies. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:1383-1393. [PMID: 30629513 DOI: 10.1109/tcbb.2019.2892124] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/09/2023]
Abstract
Most proposed methods for TF-binding site (TFBS) prediction use only low-order dependencies because efficient methods to extract higher-order dependencies are lacking. In this work, we first propose a novel method to extract higher-order dependencies by applying a convolutional neural network (CNN) to histone modification features. We then propose a novel TFBS prediction method, referred to as CNN_TF, that incorporates both low-order and higher-order dependencies. CNN_TF is first evaluated on 13 TFs in mouse embryonic stem (mES) cells. Results show that using higher-order dependencies significantly outperforms using low-order dependencies on 11 TFs, indicating that higher-order dependencies are indeed more effective for TFBS prediction. Further experiments show that using both low-order and higher-order dependencies improves performance significantly on 12 TFs, indicating that the two dependency types are complementary. To evaluate the influence of cell type on prediction performance, CNN_TF was applied to five TFs in five human cell types. Although low-order and higher-order dependencies contribute differently in different cell types, they are always complementary in prediction. Compared with several state-of-the-art methods, CNN_TF outperforms them by at least 5.3 percent in AUPR.

11
Zhou J, Lu Q, Gui L, Xu R, Long Y, Wang H. MTTFsite: cross-cell type TF binding site prediction by using multi-task learning. Bioinformatics 2020; 35:5067-5077. [PMID: 31161194 PMCID: PMC6954652 DOI: 10.1093/bioinformatics/btz451] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/24/2019] [Revised: 05/19/2019] [Accepted: 05/30/2019] [Indexed: 12/30/2022]
Abstract
Motivation: The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches for TFBS prediction require large amounts of labeled data. However, many TFs in certain cell types have either insufficient or no labeled data. Results: In this paper, a multi-task learning framework (called MTTFsite) is proposed to address the lack of labeled data by leveraging labeled data available across cell types. MTTFsite contains a shared CNN that learns features common to all cell types and a private CNN for each cell type that learns cell type-specific features. The common features are intended to help predict TFBSs for all cell types, especially those that lack labeled data. MTTFsite is evaluated on 241 cell type-TF pairs and compared with a baseline method that uses no multi-task learning and with a fully shared multi-task model that uses only a shared CNN and no private CNNs. For cell types with insufficient labeled data, MTTFsite performs better than both the baseline method and the fully shared model on more than 89% of pairs. For cell types without any labeled data, MTTFsite outperforms the baseline method and the fully shared model on more than 80% and 93% of pairs, respectively. A novel gene expression prediction method (called TFChrome) using both MTTFsite and histone modification features is also presented. Results show that TFBSs predicted by MTTFsite alone can achieve good performance, and combining MTTFsite with histone modification features yields a significant 5.7% performance improvement. Availability and implementation: The resource and executable code are freely available at http://hlt.hitsz.edu.cn/MTTFsite/ and http://www.hitsz-hlt.com:8080/MTTFsite/. Supplementary information: Supplementary data are available at Bioinformatics online.
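The shared/private split described above can be sketched in a few lines. For brevity this uses single dense ReLU layers in place of the CNNs that MTTFsite uses, untrained random weights, and hypothetical cell-type names; it shows only how parameters are shared across tasks, not how the model is trained.

```python
import numpy as np


class SharedPrivateModel:
    """Toy shared-private multi-task model: one shared encoder, one private encoder per cell type."""

    def __init__(self, in_dim, hid_dim, cell_types, seed=0):
        rng = np.random.default_rng(seed)
        self.shared = rng.normal(scale=0.1, size=(in_dim, hid_dim))
        self.private = {c: rng.normal(scale=0.1, size=(in_dim, hid_dim)) for c in cell_types}
        # Each task head sees [shared features, private features]
        self.heads = {c: rng.normal(scale=0.1, size=2 * hid_dim) for c in cell_types}

    def features(self, x, cell_type):
        h_shared = np.maximum(0.0, x @ self.shared)  # ReLU; weights reused by every cell type
        h_private = np.maximum(0.0, x @ self.private[cell_type])  # weights for this cell type only
        return np.concatenate([h_shared, h_private], axis=-1)

    def predict(self, x, cell_type):
        """Per-example probability that the input window is a TFBS in `cell_type`."""
        logits = self.features(x, cell_type) @ self.heads[cell_type]
        return 1.0 / (1.0 + np.exp(-logits))


model = SharedPrivateModel(in_dim=8, hid_dim=4, cell_types=["cell_A", "cell_B"])
x = np.ones((3, 8))  # three hypothetical encoded input windows
print(model.predict(x, "cell_A").shape)  # (3,)
```

During training, gradients from every cell type update `shared`, while `private[c]` is updated only by cell type `c`; that asymmetry is what lets data-rich cell types help data-poor ones.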
Affiliation(s)
- Jiyun Zhou
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Qin Lu
- Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Lin Gui
- Department of Computer Science, University of Warwick, Coventry CV4 4AL, UK
- Ruifeng Xu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
- Yunfei Long
- Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
- Hongpeng Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China

12
Using Machine-Learning Algorithms for Eutrophication Modeling: Case Study of Mar Menor Lagoon (Spain). Int J Environ Res Public Health 2020; 17:ijerph17041189. [PMID: 32069834 PMCID: PMC7068380 DOI: 10.3390/ijerph17041189] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/16/2020] [Revised: 02/07/2020] [Accepted: 02/09/2020] [Indexed: 11/16/2022]
Abstract
The Mar Menor is a hypersaline coastal lagoon of high environmental value in southeastern Spain and a characteristic example of a highly anthropized hydro-ecosystem. Unprecedented eutrophication crises in 2016 and 2019, with abrupt changes in water quality, caused great social alarm. Understanding and modeling the level of a eutrophication indicator such as chlorophyll-a (Chl-a) benefits the management of this complex system. In this study, we investigate the potential of machine learning (ML) methods to predict Chl-a levels. In particular, multilayer neural networks (MLNNs) and support vector regressions (SVRs) are evaluated on a dataset of up to nine different water quality parameters. The most relevant input combinations were extracted using wrapper feature selection methods, which simplified the model structure and resulted in a more accurate and efficient procedure. Although SVR models obtained better results than MLNNs in the validation phase, both ML algorithms provided satisfactory predictions of Chl-a concentration, reaching a cross-validated coefficient of determination (R2CV) of up to 0.7 for the best-fit models.
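The R2CV figure above is a cross-validated coefficient of determination. One common way to compute it, assumed here, pools the out-of-fold predictions and applies R2CV = 1 - SS_res / SS_tot to them. Ordinary least squares stands in for the SVR and MLNN models of the study, and the inputs are synthetic.

```python
import numpy as np


def cv_r2(X, y, k=5, seed=0):
    """Cross-validated R^2 from pooled out-of-fold predictions of an ordinary least-squares fit."""
    n = len(y)
    order = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(order, k)
    y_hat = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(order, test_idx)
        # Fit intercept + slopes on the training folds only
        A = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        coef, *_ = np.linalg.lstsq(A, y[train_idx], rcond=None)
        # Predict the held-out fold with the model that never saw it
        y_hat[test_idx] = np.column_stack([np.ones(len(test_idx)), X[test_idx]]) @ coef
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))  # three hypothetical water-quality inputs
y = 1.0 + X @ np.array([0.5, -0.2, 0.1])  # noise-free target, so R2CV should be ~1
print(round(cv_r2(X, y), 3))  # 1.0
```

Because every prediction comes from a model fitted without that sample, R2CV can be much lower than the in-sample R2 and can even go negative for a model that overfits.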

13
Homotypic cooperativity and collective binding are determinants of bHLH specificity and function. Proc Natl Acad Sci U S A 2019; 116:16143-16152. [PMID: 31341088 DOI: 10.1073/pnas.1818015116] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/26/2022]
Abstract
Eukaryotic cells express transcription factor (TF) paralogues that bind nearly identical DNA sequences in vitro but bind different genomic loci and perform different functions in vivo. Predicting how 2 paralogous TFs bind in vivo using DNA sequence alone is an important open problem. Here, we analyzed 2 yeast bHLH TFs, Cbf1p and Tye7p, which have highly similar binding preferences in vitro yet bind almost completely nonoverlapping target loci in vivo. We dissected the determinants of specificity for these 2 proteins by making a number of chimeric TFs in which we swapped different domains of Cbf1p and Tye7p, and determined the effects on in vivo binding and cellular function. From these experiments, we learned that the Cbf1p dimer achieves its specificity by binding cooperatively with other Cbf1p dimers bound nearby. In contrast, Tye7p achieves its specificity by binding cooperatively with 3 other DNA-binding proteins, Gcr1p, Gcr2p, and Rap1p. Remarkably, most promoters (63%) bound by Tye7p do not contain a consensus Tye7p binding site. Using this information, we were able to build simple models that accurately discriminate bound and unbound genomic loci for both Cbf1p and Tye7p. We then successfully reprogrammed the human bHLH NPAS2 to bind Cbf1p in vivo targets, and reprogrammed a Tye7p target intergenic region to be bound by Cbf1p. These results demonstrate that the genome-wide binding targets of paralogous TFs can be discriminated using sequence information, and provide lessons about TF specificity that can be applied across the phylogenetic tree.
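The abstract mentions simple models that discriminate bound from unbound loci. One feature such a model could plausibly use for a homotypically cooperative factor like Cbf1p is how many E-box sites (CACGTG, the canonical bHLH motif) cluster near each other; the sketch counts them. The window size, the max-count summary, and the test sequence are assumptions for illustration, not the authors' model.

```python
import re

EBOX = "CACGTG"  # canonical bHLH E-box; palindromic, so scanning one strand suffices


def ebox_positions(seq):
    """Start positions of every (possibly overlapping) E-box occurrence."""
    return [m.start() for m in re.finditer(f"(?={EBOX})", seq.upper())]


def homotypic_cluster_score(seq, window=100):
    """Largest number of E-box starts lying within `window` bp of each other (hypothetical feature)."""
    pos = ebox_positions(seq)
    best = 0
    j = 0
    for i in range(len(pos)):
        # Advance the left pointer until pos[j]..pos[i] fits inside the window
        while pos[i] - pos[j] >= window:
            j += 1
        best = max(best, i - j + 1)
    return best


seq = "CACGTG" + "A" * 20 + "CACGTG" + "A" * 200 + "CACGTG"
print(ebox_positions(seq), homotypic_cluster_score(seq, window=100))  # [0, 26, 232] 2
```

A two-pointer sweep keeps the count linear in the number of sites, which matters when such a feature is computed for every intergenic region in a genome.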
14
Lan G, Zhou J, Xu R, Lu Q, Wang H. Cross-Cell-Type Prediction of TF-Binding Site by Integrating Convolutional Neural Network and Adversarial Network. Int J Mol Sci 2019; 20:3425. [PMID: 31336830 PMCID: PMC6679139 DOI: 10.3390/ijms20143425] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/13/2019] [Revised: 06/27/2019] [Accepted: 07/08/2019] [Indexed: 01/18/2023]
Abstract
Transcription factor binding sites (TFBSs) play an important role in gene expression regulation. Many computational methods for TFBS prediction need sufficient labeled data. However, many transcription factors (TFs) lack labeled data in some cell types. We propose a novel method, referred to as DANN_TF, for TFBS prediction. DANN_TF consists of a feature extractor, a label predictor, and a domain classifier. The feature extractor and the domain classifier constitute an adversarial network, which ensures that the learned features are common across different cell types. DANN_TF is evaluated on five TFs in five cell types, a total of 25 cell-type TF pairs, and compared with a baseline method that does not use the adversarial network. For both data augmentation and cross-cell-type prediction, DANN_TF performs better than the baseline method on most cell-type TF pairs. DANN_TF is further evaluated on an additional 13 TFs in the five cell types, a total of 65 cell-type TF pairs. Results show that DANN_TF achieves a significantly higher AUC than the baseline method on 96.9% of the 65 cell-type TF pairs. This is a strong indication that DANN_TF can indeed learn common features for cross-cell-type TFBS prediction.
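The comparison above is reported as AUC per cell-type TF pair. A minimal sketch of how AUC can be computed from predicted scores, using its Mann-Whitney interpretation (the probability that a randomly chosen bound site outscores a randomly chosen unbound site, ties counting one half). The labels, scores, and function name are hypothetical; this is not the evaluation code used for DANN_TF.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties = 1/2.

    labels: 1 for a bound site, 0 for an unbound site (hypothetical encoding).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Quadratic pairwise comparison; fine for illustration, while a rank-based
    # O(n log n) version is preferable for genome-scale site lists.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Comparing two methods on one cell-type TF pair then amounts to computing `roc_auc` for each method's scores on the same labeled sites.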
Affiliation(s)
- Gongqiang Lan
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Jiyun Zhou
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Ruifeng Xu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
- Qin Lu
  - Department of Computing, The Hong Kong Polytechnic University, Hong Kong 810005, China
- Hongpeng Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
15
Keilwagen J, Posch S, Grau J. Accurate prediction of cell type-specific transcription factor binding. Genome Biol 2019; 20:9. [PMID: 30630522 PMCID: PMC6327544 DOI: 10.1186/s13059-018-1614-y] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 07/25/2018] [Accepted: 12/18/2018] [Indexed: 01/11/2023]
Abstract
Prediction of cell type-specific, in vivo transcription factor binding sites is one of the central challenges in regulatory genomics. Here, we present our approach that earned a shared first rank in the "ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge" in 2017. In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance. Finally, we provide 682 lists of predicted peaks for a total of 31 transcription factors in 22 primary cell types and tissues and a user-friendly version of our approach, Catchitt, for download.
Affiliation(s)
- Jens Keilwagen
  - Institute for Biosafety in Plant Biotechnology, Julius Kühn-Institut (JKI) - Federal Research Centre for Cultivated Plants, Erwin-Baur-Straße 27, Quedlinburg, 06484 Germany
- Stefan Posch
  - Institute of Computer Science, Martin Luther University Halle–Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), 06120 Germany
- Jan Grau
  - Institute of Computer Science, Martin Luther University Halle–Wittenberg, Von-Seckendorff-Platz 1, Halle (Saale), 06120 Germany
16
Girgis HZ, Velasco A, Reyes ZE. HebbPlot: an intelligent tool for learning and visualizing chromatin mark signatures. BMC Bioinformatics 2018; 19:310. [PMID: 30176808 PMCID: PMC6122555 DOI: 10.1186/s12859-018-2312-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/12/2017] [Accepted: 08/14/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND Histone modifications play important roles in gene regulation, heredity, imprinting, and many human diseases. The histone code is complex and consists of more than 100 marks. Therefore, biologists need computational tools to characterize general signatures representing the distributions of tens of chromatin marks around thousands of regions. RESULTS To this end, we developed a software tool, HebbPlot, which uses a Hebbian neural network to learn a general chromatin signature from regions with a common function. Hebbian networks can learn the associations between tens of marks and thousands of regions. HebbPlot presents a signature as a digital image, which can be easily interpreted. Moreover, signatures produced by HebbPlot can be compared quantitatively. We validated HebbPlot in six case studies. The results of these case studies are either novel or confirm results already reported in the literature, indicating the accuracy of HebbPlot. Our results indicate that promoters have a directional chromatin signature; several marks tend to stretch downstream or upstream. H3K4me3 and H3K79me2 have clear directional distributions around active promoters. In addition, the signatures of high- and low-CpG promoters are different; H3K4me3, H3K9ac, and H3K27ac are the most different marks. When we studied the signatures of enhancers active in eight tissues, we observed that these signatures are similar, but not identical. Further, we identified some histone modifications (H3K36me3, H3K79me1, H3K79me2, and H4K8ac) that are associated with coding regions of active genes. Other marks (H4K12ac, H3K14ac, H3K27me3, and H2AK5ac) were found to be weakly associated with coding regions of inactive genes. CONCLUSIONS This study resulted in a novel software tool, HebbPlot, for learning and visualizing the chromatin signature of a genetic element.
Using HebbPlot, we produced a visual catalog of the signatures of multiple genetic elements in 57 cell types available through the Roadmap Epigenomics Project. Furthermore, we made progress toward a functional catalog consisting of 22 histone marks. In sum, HebbPlot is applicable to a wide array of studies and facilitates deciphering the histone code.
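To make "learning a signature with a Hebbian network" concrete, the sketch below applies Oja's rule, a normalized Hebbian update, to a set of region vectors (for example, flattened mark-by-bin profiles). It learns a unit-length weight vector aligned with the dominant pattern shared by the regions. This is a generic textbook illustration under that assumption, not HebbPlot's actual architecture or training procedure; the learning rate, epoch count, and function name are hypothetical.

```python
import math


def oja_signature(regions: list[list[float]], lr: float = 0.01,
                  epochs: int = 200) -> list[float]:
    """Learn a weight vector with Oja's rule: w <- w + lr * y * (x - y * w).

    Each region is a feature vector of equal length (hypothetical encoding).
    The -y^2 * w term keeps the weights bounded, so w tends toward unit length
    and toward the leading eigenvector of the regions' second-moment matrix.
    """
    dim = len(regions[0])
    w = [1.0 / math.sqrt(dim)] * dim  # deterministic unit-length start
    for _ in range(epochs):
        for x in regions:
            y = sum(wi * xi for wi, xi in zip(w, x))  # linear neuron output
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w
```

With regions that vary mostly along the first feature, such as `[[1.0, 0.1], [1.0, -0.1], [-1.0, 0.1], [-1.0, -0.1]]`, the learned vector ends up close to `[1, 0]`, i.e. the shared pattern dominates the signature.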
Affiliation(s)
- Hani Z. Girgis
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104-9700 OK USA
- Alfredo Velasco
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104-9700 OK USA
- Zachary E. Reyes
  - Tandy School of Computer Science, University of Tulsa, 800 South Tucker Drive, Tulsa, 74104-9700 OK USA
17
Chen A, Chen D, Chen Y. Advances of DNase-seq for mapping active gene regulatory elements across the genome in animals. Gene 2018; 667:83-94. [DOI: 10.1016/j.gene.2018.05.033] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 12/09/2017] [Revised: 05/04/2018] [Accepted: 05/10/2018] [Indexed: 12/16/2022]
18
Aziz HA, Abdel-Salam ASG, Al-Obaide MAI, Alobydi HW, Al-Humaish S. Kynurenine 3-Monooxygenase Gene Associated With Nicotine Initiation and Addiction: Analysis of Novel Regulatory Features at 5' and 3'-Regions. Front Genet 2018; 9:198. [PMID: 29951083 PMCID: PMC6008986 DOI: 10.3389/fgene.2018.00198] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/29/2018] [Accepted: 05/17/2018] [Indexed: 11/13/2022]
Abstract
Tobacco smoking is a widespread behavior in Qatar and worldwide and is considered one of the major preventable causes of ill health and death. Nicotine, a component of tobacco smoke, causes numerous health risks and is highly addictive; it binds to the α7 nicotinic acetylcholine receptor (α7nAChR) in the brain. Recent studies showed that α7nAChR is involved in the initiation of smoking and in addiction. Kynurenic acid (KA), a significant tryptophan metabolite, is an antagonist of α7nAChR. Inhibition of the kynurenine 3-monooxygenase enzyme, encoded by KMO, increases KA levels. Modulating KMO gene expression could therefore be a useful tactic for treating tobacco initiation and dependence. Because KMO regulation is still poorly understood, we aimed to investigate the 5' and 3'-regulatory factors of the KMO gene to advance our knowledge of how to modulate KMO expression. In this study, bioinformatics methods were used to identify the regulatory sequences associated with KMO expression. The differential expression of KMO mRNA within the same tissue and across different tissues suggested specific usage of multiple alternative KMO promoters. The eleven alternative KMO promoters identified in the 5'-regulatory region contain a TATA box, lack a CpG island (CGI), and show dinucleotide base-stacking energy values specific to transcription factor binding sites (TFBSs). The structural features of regulatory sequences can influence the transcription process and cell type-specific expression. The uncharacterized LOC105373233 locus, which codes for a non-coding RNA (ncRNA), is located on the reverse strand at the 3' side of the KMO locus, in a convergent orientation. The two genes are likely expressed from a promoter that lacks a TATA box and harbors a CGI and two TFBSs linked to bidirectional transcription, the NRF1 and ZNF14 motifs. We identified two microRNAs (miRs) in the uncharacterized LOC105373233 ncRNA, resembling hsa-miR-5096 and hsa-miR-1285-3p, that can target the miR recognition element (MRE) in the KMO mRNA.
Pairwise sequence alignment identified a 52-nucleotide sequence hosting the MRE in the KMO 3' untranslated region (UTR) that is complementary to the ncRNA LOC105373233 sequence. We speculate that the identified miRs can modulate KMO expression and, together with the alternative promoters in the 5'-regulatory region of KMO, might contribute to the development of novel diagnostic and therapeutic algorithms for tobacco smoking.
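The complementarity search mentioned above can be illustrated with a simplified stand-in: find the longest stretch of a 3' UTR that is perfectly complementary (antiparallel) to an ncRNA, by taking the longest common substring of the UTR and the ncRNA's reverse complement. The authors used pairwise alignment, which also tolerates mismatches and gaps (for example Smith-Waterman local alignment); this exact-match sketch, its function names, and the toy sequences are assumptions for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA or RNA sequence (U is read as T)."""
    comp = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def longest_complementary_stretch(utr: str, ncrna: str) -> tuple[int, int, int]:
    """Longest exact antiparallel match between `utr` and `ncrna`.

    Returns (length, start in utr, start in reverse_complement(ncrna)),
    computed by dynamic programming for the longest common substring.
    """
    a = utr.upper().replace("U", "T")
    b = reverse_complement(ncrna)
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)  # match lengths ending at the previous row
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best
```

Only perfect Watson-Crick pairing is counted here, so G:U wobble pairs, which matter for miRNA targeting, are ignored; a real MRE search would relax that.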
Affiliation(s)
- Hassan A Aziz
  - College of Arts and Sciences, Qatar University, Doha, Qatar
- Mohammed A I Al-Obaide
  - School of Medicine, Texas Tech University Health Sciences Center, Amarillo, TX, United States
19
Behera V, Evans P, Face CJ, Hamagami N, Sankaranarayanan L, Keller CA, Giardine B, Tan K, Hardison RC, Shi J, Blobel GA. Exploiting genetic variation to uncover rules of transcription factor binding and chromatin accessibility. Nat Commun 2018; 9:782. [PMID: 29472540 PMCID: PMC5823854 DOI: 10.1038/s41467-018-03082-6] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 04/20/2017] [Accepted: 01/18/2018] [Indexed: 12/13/2022]
Abstract
Single-nucleotide variants that underlie phenotypic variation can affect chromatin occupancy of transcription factors (TFs). To delineate determinants of in vivo TF binding and chromatin accessibility, we introduce an approach that compares ChIP-seq and DNase-seq data sets from genetically divergent murine erythroid cell lines. The impact of discriminatory single-nucleotide variants on TF ChIP signal enables the in vivo binding characteristics of the nuclear factors GATA1, TAL1, and CTCF to be defined at single-base resolution. We further develop a facile complementary approach to test more deeply the requirements of critical nucleotide positions for TF binding by combining CRISPR-Cas9-mediated mutagenesis with ChIP and targeted deep sequencing. Finally, we extend our analytical pipeline to identify nearby contextual DNA elements that modulate chromatin binding by these three TFs, and to define sequences that affect kb-scale chromatin accessibility. Combined, our approaches reveal insights into the genetic basis of TF occupancy and its interplay with chromatin features.
Affiliation(s)
- Vivek Behera
  - University of Pennsylvania, Philadelphia, PA, 19104, USA
- Perry Evans
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Carolyne J Face
  - Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Nicole Hamagami
  - Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Kai Tan
  - Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Junwei Shi
  - University of Pennsylvania, Philadelphia, PA, 19104, USA
- Gerd A Blobel
  - Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
20
Liu S, Zibetti C, Wan J, Wang G, Blackshaw S, Qian J. Assessing the model transferability for prediction of transcription factor binding sites based on chromatin accessibility. BMC Bioinformatics 2017; 18:355. [PMID: 28750606 PMCID: PMC5530957 DOI: 10.1186/s12859-017-1769-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/03/2017] [Accepted: 07/19/2017] [Indexed: 12/04/2022]
Abstract
Background Computational prediction of transcription factor (TF) binding sites in different cell types is challenging. Recent technology development allows us to determine genome-wide chromatin accessibility in various cellular and developmental contexts. Chromatin accessibility profiles provide useful information for predicting TF binding events in various physiological conditions. Furthermore, ChIP-Seq analysis has been used to determine genome-wide binding sites for a range of different TFs in multiple cell types. Integrating these two types of genomic information can improve the prediction of TF binding events. Results We assessed to what extent a model built on other TFs and/or other cell types could be used to predict the binding sites of TFs of interest. A random forest model was built using a set of cell type-independent features, such as the specific sequences recognized by the TFs and evolutionary conservation, as well as cell type-specific features derived from chromatin accessibility data. Our analysis suggested that models learned from other TFs and/or cell lines performed almost as well as the model learned from the target TF in the cell type of interest. Interestingly, models based on multiple TFs performed better than single-TF models. Finally, we proposed a universal model, BPAC, which was generated using ChIP-Seq data from multiple TFs in various cell types. Conclusion Integrating chromatin accessibility information with sequence information improves prediction of TF binding. The prediction of TF binding is transferable across TFs and/or cell lines, suggesting that there is a set of universal “rules”. A computational tool was developed to predict TF binding sites based on these universal “rules”.
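One cell type-independent feature named above is the sequence recognized by the TF. A common way to turn that into a number is the best position weight matrix (PWM) log-odds score over a candidate region, scanning both strands. The sketch below shows that calculation; the toy count matrix, the 0.5 pseudocount, the uniform 0.25 background, and the function names are assumptions for illustration, not the BPAC implementation.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(counts: list[dict[str, int]], pseudocount: float = 0.5,
                 background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-position base counts into log2(p(base) / background) scores."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((col.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_motif_score(seq: str, pwm: list[dict[str, float]]) -> float:
    """Maximum PWM score over all windows on both strands (-inf if none)."""
    seq = seq.upper()
    k = len(pwm)
    best = float("-inf")
    # Scan the given strand and its reverse complement.
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for i in range(len(strand) - k + 1):
            window = strand[i:i + k]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other ambiguity codes
            best = max(best, sum(pwm[j][b] for j, b in enumerate(window)))
    return best
```

In a setup like the one described, this score would be one column of the feature vector alongside accessibility and conservation values, and the random forest would be trained on those vectors.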
Affiliation(s)
- Sheng Liu
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Cristina Zibetti
  - Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Jun Wan
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Guohua Wang
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Seth Blackshaw
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
  - Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
  - Department of Neurology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
  - Centre for Human Systems Biology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
  - Institute for Cell Engineering, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
- Jiang Qian
  - Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, 21287, MD, USA
21
Weber B, Zicola J, Oka R, Stam M. Plant Enhancers: A Call for Discovery. Trends Plant Sci 2016; 21:974-987. [PMID: 27593567 DOI: 10.1016/j.tplants.2016.07.013] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 05/13/2016] [Revised: 07/18/2016] [Accepted: 07/28/2016] [Indexed: 05/12/2023]
Abstract
Higher eukaryotes typically contain many different cell types, displaying different cellular functions that are influenced by biotic and abiotic cues. The different functions are characterized by specific gene expression patterns mediated by regulatory sequences such as transcriptional enhancers. Recent genome-wide approaches have identified thousands of enhancers in animals, reviving interest in enhancers in gene regulation. Although the regulatory roles of plant enhancers are as crucial as those in animals, genome-wide approaches have only very recently been applied to plants. Here we review characteristics of enhancers at the DNA and chromatin level in plants and other species, their similarities and differences, and techniques widely used for genome-wide discovery of enhancers in animal systems that can be implemented in plants.
Affiliation(s)
- Blaise Weber
  - Swammerdam Institute for Life Sciences, Universiteit van Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- Johan Zicola
  - Max Planck Institute for Plant Breeding Research, Department Plant Developmental Biology, Carl-von-Linné-Weg 10, 50829 Köln, Germany
- Rurika Oka
  - Swammerdam Institute for Life Sciences, Universiteit van Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
- Maike Stam
  - Swammerdam Institute for Life Sciences, Universiteit van Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
22
Sharmin M, Bravo HC, Hannenhalli S. Heterogeneity of transcription factor binding specificity models within and across cell lines. Genome Res 2016; 26:1110-1123. [PMID: 27311443 PMCID: PMC4971765 DOI: 10.1101/gr.199166.115] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 09/04/2015] [Accepted: 06/16/2016] [Indexed: 12/24/2022]
Abstract
Complex gene expression patterns are mediated by the binding of transcription factors (TFs) to specific genomic loci. The in vivo occupancy of a TF is, in large part, determined by the TF's DNA binding interaction partners, motivating genomic context-based models of TF occupancy. However, approaches thus far have assumed a uniform TF binding model to explain genome-wide cell-type–specific binding sites. Therefore, the cell type heterogeneity of TF occupancy models, as well as the extent to which binding rules underlying a TF's occupancy are shared across cell types, has not been investigated. Here, we develop an ensemble-based approach (TRISECT) to identify the heterogeneous binding rules for cell-type–specific TF occupancy and analyze the inter-cell-type sharing of such rules. Comprehensive analysis of 23 TFs, each with ChIP-seq data in four to 12 different cell types, shows that by explicitly capturing the heterogeneity of binding rules, TRISECT accurately identifies in vivo TF occupancy. Importantly, many of the binding rules derived from individual cell types are shared across cell types and reveal distinct yet functionally coherent putative target genes in different cell types. Closer inspection of the predicted cell-type–specific interaction partners provides insights into the context-specific functional landscape of a TF. Together, our novel ensemble-based approach reveals, for the first time, a widespread heterogeneity of binding rules, comprising the interaction partners within a cell type, many of which nevertheless transcend cell types. Notably, the putative targets of shared binding rules in different cell types, while distinct, exhibit significant functional coherence.
Affiliation(s)
- Mahfuza Sharmin
  - Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Héctor Corrada Bravo
  - Department of Computer Science, University of Maryland, College Park, Maryland 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
- Sridhar Hannenhalli
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742, USA
  - Department of Cell and Molecular Biology, University of Maryland, College Park, Maryland 20742, USA
23
Volkova OA, Kondrakhin YV, Yevshin IS, Valeev TF, Sharipov RN. Assessment of translational importance of mammalian mRNA sequence features based on Ribo-Seq and mRNA-Seq data. J Bioinform Comput Biol 2016; 14:1641006. [PMID: 27122318 DOI: 10.1142/s0219720016410067] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Ribosome profiling (Ribo-Seq) has revealed more details of mRNA translation in the cell and provided additional information on the importance of mRNA sequence features for this process. The use of translation inhibitors such as harringtonine and cycloheximide, together with mRNA-Seq, has made it possible to assess important characteristics such as translation efficiency. We assessed the translational importance of mRNA sequence features by statistical analysis of Ribo-Seq and mRNA-Seq data. Translationally important features known from the literature, as well as features proposed by the authors, were used in the analysis. Comparisons were performed between protein-coding and non-coding RNAs and between high- and low-translated mRNAs. We identified a set of features that discriminate between the compared categories of RNA. Significant relationships between mRNA features and translation efficiency were also established.
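Translation efficiency is commonly estimated as the ratio of ribosome-footprint density (Ribo-Seq) to mRNA abundance (mRNA-Seq) for the same transcript, after normalizing each library for sequencing depth and feature length. A minimal sketch of that calculation follows; the RPKM normalization, the log2 scale, the optional pseudocount, and the use of one length for both libraries are assumptions for illustration and not necessarily the procedure used in this study.

```python
import math


def rpkm(reads: int, length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return reads / (length_bp / 1_000) / (total_reads / 1_000_000)


def translation_efficiency(ribo_reads: int, mrna_reads: int, length_bp: int,
                           ribo_total: int, mrna_total: int,
                           pseudocount: float = 0.0) -> float:
    """log2(Ribo-Seq RPKM / mRNA-Seq RPKM) for one transcript.

    Assumes the same feature length for both libraries (in practice footprints
    are often counted over the CDS only). With pseudocount=0, a transcript with
    no mRNA-Seq reads raises ZeroDivisionError, so pass a small pseudocount
    when scoring many transcripts.
    """
    ribo = rpkm(ribo_reads, length_bp, ribo_total) + pseudocount
    mrna = rpkm(mrna_reads, length_bp, mrna_total) + pseudocount
    return math.log2(ribo / mrna)
```

For example, a 1,500 bp transcript with 200 footprint reads and 100 mRNA reads in two libraries of 10 million reads each has a log2 translation efficiency of 1.0, i.e. twice as many footprints as its mRNA level alone would predict.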
Affiliation(s)
- Oxana A Volkova
  - Laboratory of Gene Engineering, The Federal Research Center Institute of Cytology and Genetics, The Siberian Branch of the Russian Academy of Sciences, prosp. acad. Lavrentyeva, 10, Novosibirsk 630090, Russia
- Yury V Kondrakhin
  - Laboratory of Bioinformatics, Design Technological Institute of Digital Techniques, The Siberian Branch of the Russian Academy of Sciences, ul. acad. Rzhanova, 6, Novosibirsk 630090, Russia
  - Institute of Systems Biology, Ltd, ul. Krasina, 54, Novosibirsk 630112, Russia
- Ivan S Yevshin
  - Laboratory of Bioinformatics, Design Technological Institute of Digital Techniques, The Siberian Branch of the Russian Academy of Sciences, ul. acad. Rzhanova, 6, Novosibirsk 630090, Russia
  - Institute of Systems Biology, Ltd, ul. Krasina, 54, Novosibirsk 630112, Russia
- Tagir F Valeev
  - Institute of Systems Biology, Ltd, ul. Krasina, 54, Novosibirsk 630112, Russia
  - Laboratory of Complex Systems Simulation, A.P. Ershov Institute of Informatics Systems, The Siberian Branch of the Russian Academy of Sciences, prosp. acad. Lavrentyeva, 6, Novosibirsk 630090, Russia
- Ruslan N Sharipov
  - Laboratory of Bioinformatics, Design Technological Institute of Digital Techniques, The Siberian Branch of the Russian Academy of Sciences, ul. acad. Rzhanova, 6, Novosibirsk 630090, Russia
  - Institute of Systems Biology, Ltd, ul. Krasina, 54, Novosibirsk 630112, Russia
  - Specialized Educational Scientific Center, Novosibirsk State University, ul. Pirogova, 2, Novosibirsk 630090, Russia