1
Uemura K, Ohyama T. Physical Peculiarity of Two Sites in Human Promoters: Universality and Diverse Usage in Gene Function. Int J Mol Sci 2024; 25:1487. [PMID: 38338773 PMCID: PMC10855393 DOI: 10.3390/ijms25031487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 01/15/2024] [Accepted: 01/18/2024] [Indexed: 02/12/2024]
Abstract
Since the discovery of physical peculiarities around transcription start sites (TSSs) and a site corresponding to the TATA box, research has revealed only the average features of these sites. Unsettled enigmas include the individual genes with these features and whether they relate to gene function. Herein, using 10 physical properties of DNA, including duplex DNA free energy, base stacking energy, protein-induced deformability, and stabilizing energy of Z-DNA, we clarified for the first time that approximately 97% of the promoters of 21,056 human protein-coding genes have distinctive physical properties around the TSS and/or position -27; of these, nearly 65% exhibited such properties at both sites. Furthermore, about 55% of the 21,056 genes had a minimum value of regional duplex DNA free energy within TSS-centered ±300 bp regions. Notably, distinctive physical properties within the promoters and free energies of the surrounding regions separated human protein-coding genes into five groups; each contained specific gene ontology (GO) terms. The group represented by immune response genes differed distinctly from the other four regarding the parameter of the free energies of the surrounding regions. A vital suggestion from this study is that physical-feature-based analyses of genomes may reveal new aspects of the organization and regulation of genes.
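The "regional duplex DNA free energy" used above can be illustrated with a minimal Python sketch. It assumes the SantaLucia (1998) unified nearest-neighbor ΔG°37 stacking values, ignores initiation and terminal corrections, and uses a hypothetical default window of 15 bp; the parameter set, window size and other details actually used by Uemura and Ohyama may differ. The sketch scores every window within ±300 bp of a TSS and reports the most stable (most negative) one.

```python
# Assumed parameters: SantaLucia (1998) unified nearest-neighbour dG37 values,
# in kcal/mol, for each 5'->3' dinucleotide step (complementary steps share a value).
NN_DG37 = {
    "AA": -1.00, "TT": -1.00,
    "AT": -0.88, "TA": -0.58,
    "CA": -1.45, "TG": -1.45,
    "GT": -1.44, "AC": -1.44,
    "CT": -1.28, "AG": -1.28,
    "GA": -1.30, "TC": -1.30,
    "CG": -2.17, "GC": -2.24,
    "GG": -1.84, "CC": -1.84,
}


def duplex_free_energy(seq: str) -> float:
    """Sum of stacking terms over all dinucleotide steps (no initiation terms).

    Raises KeyError if the sequence contains anything other than A, C, G, T.
    """
    seq = seq.upper()
    return sum(NN_DG37[seq[i:i + 2]] for i in range(len(seq) - 1))


def regional_profile(seq: str, window: int) -> list[float]:
    """Free energy of every `window`-bp stretch, indexed by its start position."""
    return [duplex_free_energy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


def min_window_near_tss(seq: str, tss_index: int,
                        flank: int = 300, window: int = 15) -> tuple[int, float]:
    """Return (start, dG) of the most stable window lying within TSS +/- flank."""
    lo = max(0, tss_index - flank)
    hi = min(len(seq), tss_index + flank + 1)
    profile = regional_profile(seq[lo:hi], window)
    if not profile:
        raise ValueError("region is shorter than the window")
    best = min(range(len(profile)), key=profile.__getitem__)
    return lo + best, profile[best]
```

Because stacking free energies are negative, the minimum corresponds to the most stable stretch, which in practice is usually GC-rich.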
Affiliation(s)
- Kohei Uemura
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Takashi Ohyama
- Major in Integrative Bioscience and Biomedical Engineering, Graduate School of Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Department of Biology, Faculty of Education and Integrated Arts and Sciences, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
2
Veil M, Yampolsky LY, Grüning B, Onichtchouk D. Pou5f3, SoxB1, and Nanog remodel chromatin on high nucleosome affinity regions at zygotic genome activation. Genome Res 2019; 29:383-395. [PMID: 30674556 PMCID: PMC6396415 DOI: 10.1101/gr.240572.118] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 06/11/2018] [Accepted: 01/16/2019] [Indexed: 12/16/2022]
Abstract
The zebrafish embryo is transcriptionally mostly quiescent during the first 10 cell cycles, until the main wave of zygotic genome activation (ZGA) occurs, accompanied by fast chromatin remodeling. At ZGA, homologs of the mammalian stem cell transcription factors (TFs) Pou5f3, Nanog, and Sox19b bind to thousands of developmental enhancers to initiate transcription. So far, how these TFs influence chromatin dynamics at ZGA has remained unresolved. To address this question, we analyzed nucleosome positions in wild-type embryos and in maternal-zygotic (MZ) mutants for pou5f3 and nanog by MNase-seq. We show that Nanog, Sox19b, and Pou5f3 bind to high nucleosome affinity regions (HNARs). HNARs span more than 600 bp and feature high in vivo and predicted in vitro nucleosome occupancy, as well as high predicted propeller twist DNA shape values. We suggest a two-step nucleosome destabilization-depletion model, in which the same intrinsic DNA properties of HNARs promote both high nucleosome occupancy and differential binding of TFs. In the first step, already before ZGA, Pou5f3 and Nanog destabilize nucleosomes at HNAR centers genome-wide. In the second step, post-ZGA, Nanog, Pou5f3, and SoxB1 act synergistically to maintain an open chromatin state on a subset of HNARs. Nanog binds to the HNAR center, whereas Pou5f3 stabilizes the flanks. The HNAR model will provide a useful tool for genome regulatory studies in a variety of biological systems.
Affiliation(s)
- Marina Veil
- Department of Developmental Biology, Institute of Biology I, Faculty of Biology, Albert Ludwigs University of Freiburg, 79104, Freiburg, Germany
- Lev Y Yampolsky
- Department of Biological Sciences, East Tennessee State University, Johnson City, Tennessee 37614-1710, USA
- Zoological Institute, Basel University, Basel, CH-4051 Switzerland
- Björn Grüning
- Department of Computer Science, Albert Ludwigs University of Freiburg, 79110, Freiburg, Germany
- Center for Biological Systems Analysis (ZBSA), University of Freiburg, 79104, Freiburg, Germany
- Daria Onichtchouk
- Department of Developmental Biology, Institute of Biology I, Faculty of Biology, Albert Ludwigs University of Freiburg, 79104, Freiburg, Germany
- Signalling Research centers BIOSS and CIBSS, 79104, Freiburg, Germany
- Institute of Developmental Biology RAS, 119991 Moscow, Russia
3
Demirci S, Peters SA, de Ridder D, van Dijk ADJ. DNA sequence and shape are predictive for meiotic crossovers throughout the plant kingdom. Plant J 2018; 95:686-699. [PMID: 29808512 DOI: 10.1111/tpj.13979] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 10/09/2017] [Revised: 05/11/2018] [Accepted: 05/14/2018] [Indexed: 06/08/2023]
Abstract
A better understanding of the genomic features influencing the location of meiotic crossovers (COs) in plant species is both of fundamental importance and of practical relevance for plant breeding. Using CO positions with sufficiently high resolution from four plant species [Arabidopsis thaliana, Solanum lycopersicum (tomato), Zea mays (maize) and Oryza sativa (rice)], we trained machine-learning models to predict susceptibility to CO formation. Our results show that CO occurrence within various plant genomes can be predicted from DNA sequence and shape features. Several features related to genome content and to genomic accessibility were consistently either positively or negatively related to COs in all four species. Other features were predictive only in specific species. Gene annotation-related features were especially predictive for maize, whereas in tomato and Arabidopsis, propeller twist and helical twist (DNA shape features) and AT/TA dinucleotides were the most important. In rice, high roll (another DNA shape feature) and low CA dinucleotide frequency in particular were associated with CO occurrence. The accuracy of our models was sufficient for Arabidopsis and rice (area under the receiver operating characteristic curve, AUROC > 0.5) and high for tomato and maize (AUROC ≫ 0.5), demonstrating that DNA sequence and shape are predictive for meiotic COs throughout the plant kingdom.
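The AUROC values quoted above have a simple rank interpretation: the probability that a randomly chosen positive example (for instance, a region containing a CO) receives a higher model score than a randomly chosen negative one, so 0.5 corresponds to random guessing. A minimal Python sketch of that pairwise computation follows; the scores and labels are made up for illustration and are not data from the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores: 1 = region with a crossover, 0 = region without one.
print(auroc([0.9, 0.7, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic and only meant to make the definition explicit; sorting-based implementations give the same value much faster on genome-scale data.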
Affiliation(s)
- Sevgin Demirci
- Business Unit Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
- Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands
- Sander A Peters
- Business Unit Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
- Dick de Ridder
- Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands
- Aalt D J van Dijk
- Business Unit Bioscience, Cluster Applied Bioinformatics, Wageningen University and Research, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
- Bioinformatics Group, Wageningen University and Research, Wageningen, the Netherlands
- Biometris, Wageningen University and Research, Wageningen, the Netherlands
4
Zhao B, Xue B. Improving prediction accuracy using decision-tree-based meta-strategy and multi-threshold sequential-voting exemplified by miRNA target prediction. Genomics 2017; 109:227-232. [PMID: 28435088 DOI: 10.1016/j.ygeno.2017.04.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/16/2017] [Revised: 03/28/2017] [Accepted: 04/19/2017] [Indexed: 01/12/2023]
Abstract
Many computational predictors have been developed for fast, large-scale analysis of biological data. However, many of them were developed long ago, when training datasets or sets of input features were rather small. Consequently, the utility of these predictors on much larger datasets, which are now very common, needs to be examined carefully. In addition, with the rapid development of scientific research, expectations for the prediction accuracy of computational predictors continue to rise. Therefore, developing novel strategies to improve the prediction accuracy of computational predictors has become critical. In this study, the predictive results of existing individual miRNA target predictors were integrated into a decision tree to make a meta-prediction. When the multi-threshold sequential-voting technique was used, the prediction accuracy of the decision tree improved significantly, by at least thirty percentage points compared with the individual predictors.
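The abstract does not define multi-threshold sequential voting, so the Python sketch below is only one hypothetical reading, not the authors' method: each individual predictor's score is compared against several thresholds, each (predictor, threshold) pair casts a binary vote, and the fraction of positive votes becomes a meta-feature. A depth-1 decision tree (a stump) is then fitted on that feature. The function names, thresholds and example scores are all illustrative assumptions.

```python
from typing import Optional, Sequence


def threshold_votes(scores: Sequence[float], thresholds: Sequence[float]) -> list[int]:
    """One vote per (predictor, threshold) pair: 1 if the score clears the threshold."""
    return [1 if s >= t else 0 for s in scores for t in thresholds]


def vote_fraction(scores: Sequence[float], thresholds: Sequence[float]) -> float:
    """Meta-feature: share of (predictor, threshold) pairs calling a target."""
    votes = threshold_votes(scores, thresholds)
    return sum(votes) / len(votes)


def fit_stump(features: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Depth-1 decision tree: the cut c maximising accuracy of 'target iff x >= c'.

    Candidate cuts are the observed feature values; ties keep the smallest cut.
    """
    best_cut, best_acc = None, -1.0
    for c in sorted(set(features)):
        acc = sum((x >= c) == bool(y) for x, y in zip(features, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = c, acc
    return best_cut


# Three hypothetical predictors scoring one candidate site, two thresholds each.
print(threshold_votes([0.2, 0.8, 0.6], [0.5, 0.7]))  # [0, 0, 1, 1, 1, 0]
```

A real meta-predictor would grow a deeper tree over many such features; the stump just keeps the splitting rule visible.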
Affiliation(s)
- Bi Zhao
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, 4202 East Fowler Ave. ISA2015, Tampa, Florida, 33620, USA
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, 4202 East Fowler Ave. ISA2015, Tampa, Florida, 33620, USA
5
Xue B, Lipps D, Devineni S. Integrated Strategy Improves the Prediction Accuracy of miRNA in Large Dataset. PLoS One 2016; 11:e0168392. [PMID: 28002428 PMCID: PMC5176297 DOI: 10.1371/journal.pone.0168392] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/17/2016] [Accepted: 11/29/2016] [Indexed: 01/08/2023]
Abstract
MiRNAs are short non-coding RNAs of about 22 nucleotides that play critical roles in regulating gene expression. The biogenesis of miRNAs is largely determined by the sequence and structural features of their parental RNA molecules. Based on these features, multiple computational tools have been developed to predict whether RNA transcripts contain miRNAs. Although very successful, these predictors have faced multiple challenges in recent years. Many predictors were optimized using datasets of hundreds of miRNA samples, far fewer than the number of known miRNAs. Consequently, the prediction accuracy of these predictors on large datasets is unknown and needs to be re-tested. In addition, many predictors were optimized for either high sensitivity or high specificity, a strategy that may seriously limit their applications. Moreover, to meet continuously rising expectations for these computational tools, improving prediction accuracy has become extremely important. In this study, a meta-predictor, mirMeta, was developed by integrating a set of non-linear transformations with a meta-strategy. More specifically, the outputs of five individual predictors were first preprocessed using non-linear transformations and then fed into an artificial neural network to make the meta-prediction. The prediction accuracy of the meta-predictor was validated using both multi-fold cross-validation and an independent dataset. On a newly designed large dataset, the final accuracy of the meta-predictor improved by 7%, to 93%. The meta-predictor was also shown to be less dependent on the dataset and to strike a better balance between sensitivity and specificity. This study is important in two respects. First, it shows that combining non-linear transformations with artificial neural networks improves the prediction accuracy of individual predictors. Second, it provides the community with a new miRNA predictor with significantly improved prediction accuracy for identifying novel miRNAs and the complete set of miRNAs. Source code is available at: https://github.com/xueLab/mirMeta
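The two-stage design described above (non-linear preprocessing of five predictor outputs, then a neural network) can be sketched in Python. This is not the mirMeta code from the linked repository: the logistic transform, the single tanh hidden layer and all parameter names are assumptions chosen for illustration, and the weights would have to come from training, which is not shown.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def transform(raw: Sequence[float], centers: Sequence[float],
              slopes: Sequence[float]) -> list[float]:
    """Hypothetical per-predictor non-linear rescaling of raw outputs onto (0, 1)."""
    return [sigmoid(k * (x - c)) for x, c, k in zip(raw, centers, slopes)]


def mlp_forward(x: Sequence[float], w_hidden: Sequence[Sequence[float]],
                b_hidden: Sequence[float], w_out: Sequence[float],
                b_out: float) -> float:
    """One tanh hidden layer plus a sigmoid output: meta-score in (0, 1)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(z)
```

The transform puts predictors with very different score ranges on a common scale before they reach the network, which is one plausible motivation for the preprocessing step.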
Affiliation(s)
- Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida, United States of America
- David Lipps
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida, United States of America
- Sree Devineni
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, Tampa, Florida, United States of America
6
Lu Y, Gan Y, Guan J, Zhou S. An integrative analysis of nucleosome occupancy and positioning using diverse sequence dependent properties. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.11.107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
7
Predicting nucleosome positioning based on geometrically transformed Tsallis entropy. PLoS One 2014; 9:e109395. [PMID: 25380134 PMCID: PMC4224380 DOI: 10.1371/journal.pone.0109395] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/06/2014] [Accepted: 08/26/2014] [Indexed: 11/19/2022]
Abstract
As the fundamental unit of eukaryotic chromatin structure, the nucleosome plays critical roles in gene expression and regulation by controlling the physical access of transcription factors to DNA. In this paper, based on the geometrically transformed Tsallis entropy and two index-vectors, a valid nucleosome positioning information model is developed to describe the distribution of A/T-rich and G/C-rich dimeric and trimeric motifs along the DNA duplex. When used to train a support vector machine, the model achieves high AUCs across five organisms, significantly outperforming previous studies. In addition, we adopt the concept of relative distance to describe the probability that an arbitrary DNA sequence is covered by a nucleosome, and use it to calculate the average nucleosome occupancy profile over the S. cerevisiae genome. With our peak detection model, isolated nucleosomes along the genome sequence are located. Comparison with published results shows that our model is effective for nucleosome positioning. The index-vector component is identified as an important factor influencing nucleosome organization.
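For orientation, the plain (untransformed) Tsallis entropy of a distribution p is S_q = (1 - Σ p_i^q) / (q - 1), which approaches the Shannon entropy as q tends to 1. The Python sketch below computes it over overlapping dimer or trimer frequencies of a sequence. The geometric transformation and the two index-vectors of the paper are not described in the abstract and are deliberately not reproduced; treat this as background on the base quantity only.

```python
import math
from collections import Counter
from typing import Iterable


def kmer_distribution(seq: str, k: int = 2) -> dict[str, float]:
    """Relative frequencies of overlapping k-mers (k=2: dimers, k=3: trimers)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def tsallis_entropy(probs: Iterable[float], q: float) -> float:
    """S_q = (1 - sum_i p_i**q) / (q - 1); Shannon entropy (nats) when q == 1."""
    ps = [p for p in probs if p > 0]
    if math.isclose(q, 1.0):
        return -sum(p * math.log(p) for p in ps)
    return (1.0 - sum(p ** q for p in ps)) / (q - 1.0)


# Dimer entropy of a short, arbitrary example sequence for two values of q.
dist = kmer_distribution("ATATGCGCAATT", k=2)
for q in (0.5, 2.0):
    print(q, tsallis_entropy(dist.values(), q))
```

The parameter q controls how strongly frequent motifs dominate the measure: q > 1 emphasizes common k-mers, q < 1 gives rare ones more weight.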
8
Zheng Y, Li X, Hu H. Computational discovery of feature patterns in nucleosomal DNA sequences. Genomics 2014; 104:87-95. [PMID: 25063528 DOI: 10.1016/j.ygeno.2014.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/03/2013] [Revised: 04/18/2014] [Accepted: 07/15/2014] [Indexed: 11/27/2022]
Abstract
The identification of important factors that affect nucleosome formation is critical to clarify nucleosome-forming mechanisms and the role of the nucleosome in gene regulation. Various features reported in the literature led to our hypothesis that multiple features can together contribute to nucleosome formation. Therefore, we compiled 779 features and developed a pattern discovery and scoring algorithm FFN (Finding Features for Nucleosomes) to identify feature patterns that are differentially enriched in nucleosome-forming sequences and nucleosome-depletion sequences. Applying FFN to genome-wide nucleosome occupancy data in yeast and human, we identified statistically significant feature patterns that may influence nucleosome formation, many of which are common to the two species. We found that both sequence and structural features are important in nucleosome occupancy prediction. We discovered that, even for the same feature combinations, variations in feature values may lead to differences in predictive power. We demonstrated that the identified feature patterns could be used to assist nucleosomal sequence prediction.
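The abstract does not specify how FFN scores differential enrichment, so the Python sketch below swaps in a standard one-sided hypergeometric test as an illustration. A "feature pattern" is assumed here to be a set of value ranges on named features (reflecting the point that feature values, not just feature combinations, matter); the pattern format and the feature name `gc` are hypothetical.

```python
from math import comb


def matches(pattern: dict[str, tuple[float, float]], features: dict[str, float]) -> bool:
    """True if every feature named in the pattern lies inside its [low, high] range."""
    return all(lo <= features[name] <= hi for name, (lo, hi) in pattern.items())


def hypergeom_sf(k: int, total: int, n_match: int, n_pos: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total sequences, n_match matching, n_pos drawn)."""
    upper = min(n_match, n_pos)
    hits = sum(comb(n_match, i) * comb(total - n_match, n_pos - i)
               for i in range(k, upper + 1))
    return hits / comb(total, n_pos)


def pattern_enrichment(pattern, positives, negatives) -> float:
    """One-sided p-value that the pattern is over-represented in nucleosome-forming sequences."""
    k = sum(matches(pattern, f) for f in positives)
    n_match = k + sum(matches(pattern, f) for f in negatives)
    total = len(positives) + len(negatives)
    return hypergeom_sf(k, total, n_match, len(positives))


# Hypothetical pattern: GC fraction between 0.5 and 1.0.
pattern = {"gc": (0.5, 1.0)}
print(pattern_enrichment(pattern, [{"gc": 0.6}, {"gc": 0.7}], [{"gc": 0.3}, {"gc": 0.4}]))
```

With genome-wide data, many patterns would be tested at once, so the resulting p-values would need multiple-testing correction.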
Affiliation(s)
- Yiyu Zheng
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA; Burnett School of Biomedical Science, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
- Department of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
9
Gan Y, Zou G, Guan J, Xu G. A Novel Wavelet-Based Approach for Predicting Nucleosome Positions Using DNA Structural Information. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:638-647. [PMID: 26356334 DOI: 10.1109/tcbb.2014.2306837] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Nucleosomes are basic elements of chromatin structure. The positioning of nucleosomes along a genome is very important in dictating eukaryotic DNA compaction and accessibility. Current computational methods have focused on analyzing nucleosome occupancy and the positions of well-positioned nucleosomes. However, fuzzy nucleosomes adopt more complex configurations, and their positions are more difficult to predict. We analyzed the positioning of well-positioned and fuzzy nucleosomes from a novel structural perspective and propose WaveNuc, a computational approach that infers their positions based on the continuous wavelet transform. Comparative analysis demonstrates that these two kinds of nucleosomes exhibit different propeller twist structural characteristics: well-positioned nucleosomes tend to be located at sharp peaks of the propeller twist profile, whereas fuzzy nucleosomes correspond to broader peaks. The sharpness of these peaks indicates that the propeller twist profile may contain nucleosome positioning information. Exploiting this knowledge, we applied WaveNuc to detect the two kinds of peaks in the propeller twist profile along the genome. We compared the performance of our method with existing methods on real data sets. The results show that the proposed method can accurately resolve complex configurations of fuzzy nucleosomes, leading to better genome-wide nucleosome positioning prediction.
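The sharp-versus-broad peak distinction above is exactly what a continuous wavelet transform separates: a narrow peak responds most strongly at small wavelet scales, a broad one at larger scales. The Python sketch below illustrates this with a direct-sum CWT using the Ricker (Mexican hat) wavelet on a synthetic profile. It is not WaveNuc: the choice of wavelet, the truncation at four scale units and the synthetic signal are assumptions, and a real propeller twist profile would first have to be computed from sequence.

```python
import math
from typing import Sequence


def ricker(t: float, a: float) -> float:
    """Mexican-hat wavelet at scale a, with the usual L2-normalising amplitude."""
    amp = 2.0 / (math.sqrt(3.0 * a) * math.pi ** 0.25)
    r = t / a
    return amp * (1.0 - r * r) * math.exp(-0.5 * r * r)


def cwt(signal: Sequence[float], scales: Sequence[float]) -> list[list[float]]:
    """Direct-sum CWT: coeffs[i][b] = sum_t signal[t] * ricker(t - b, scales[i])."""
    n = len(signal)
    out = []
    for a in scales:
        half = int(4 * a)  # the wavelet is negligible beyond ~4 scale units
        kernel = {k: ricker(k, a) for k in range(-half, half + 1)}
        row = []
        for b in range(n):
            lo, hi = max(0, b - half), min(n, b + half + 1)
            row.append(sum(signal[t] * kernel[t - b] for t in range(lo, hi)))
        out.append(row)
    return out


def dominant_scale(coeffs: list[list[float]], scales: Sequence[float], position: int) -> float:
    """Scale with the strongest response at `position` (small = sharp, large = broad peak)."""
    best = max(range(len(scales)), key=lambda i: coeffs[i][position])
    return scales[best]


if __name__ == "__main__":
    # Synthetic profile: a sharp peak at 30 (sd 2) and a broad peak at 90 (sd 8).
    profile = [math.exp(-((i - 30) ** 2) / 8.0) + math.exp(-((i - 90) ** 2) / 128.0)
               for i in range(160)]
    scales = list(range(1, 26))
    coeffs = cwt(profile, scales)
    print(dominant_scale(coeffs, scales, 30), dominant_scale(coeffs, scales, 90))
```

For a Gaussian peak of width σ, this normalization makes the response peak near a scale of about 2.2σ, so the dominant scale can serve as a rough sharpness score for classifying peaks.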
10
Genetic and epigenetic determinants mediate proneness of oncogene breakpoint sites for involvement in TCR translocations. Genes Immun 2013; 15:72-81. [PMID: 24304972 DOI: 10.1038/gene.2013.63] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/11/2013] [Revised: 09/30/2013] [Accepted: 10/22/2013] [Indexed: 01/03/2023]
Abstract
T-cell receptor (TCR) translocations are a genetic hallmark of T-cell acute lymphoblastic leukemia and lead to juxtaposition of oncogene and TCR loci. Oncogene loci become involved in translocations because they are accessible to the V(D)J recombination machinery. Such accessibility is predicted at cryptic recombination signal sequence (cRSS) sites ('Type 1') as well as other sites that are subject to DNA double-strand breaks (DSBs) ('Type 2') during early stages of thymocyte development. As chromatin accessibility markers have not been analyzed in the context of TCR-associated translocations, various genetic and epigenetic determinants of LMO2, TAL1 and TLX1 translocation breakpoint (BP) sites and BP cluster regions (BCRs) were examined in human thymocytes to establish DSB proneness and heterogeneity of BP site involvement in TCR translocations. Our data show that DSBs in BCRs are primarily induced in the presence of a genetic element of sequence vulnerability (cRSSs, transposable elements), whereas breaks at single BP sites lacking such elements are more likely induced by chance or perhaps because of patient-specific genetic vulnerability. Vulnerability to obtain DSBs is increased by features that determine chromatin organization, such as methylation status and nucleosome occupancy, although at different levels at different BP sites.