1. Mariani L, Liu X, Lee K, Gisselbrecht SS, Cole PA, Bulyk ML. DNA flexibility regulates transcription factor binding to nucleosomes. bioRxiv 2024:2024.09.02.610559. [PMID: 39463949 PMCID: PMC11507811 DOI: 10.1101/2024.09.02.610559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Cell fate decisions are controlled by sequence-specific transcription factors (TFs), referred to as 'pioneer' factors, that bind their target sites within nucleosomes ('pioneer binding') and thus initiate chromatin opening. However, pioneers bind just a minority of their recognition sequences present in the genome, suggesting that local sequence context features may regulate pioneer binding. Here, we developed PIONEAR-seq, a highly parallel sequencing-based biochemical assay for high-throughput analysis of TF binding to nucleosomes on nucleosome positioning sequences. Using PIONEAR-seq, we characterized the pioneer binding of 7 human pioneer TFs. Comparison of TF binding to nucleosomes based on the synthetic Widom 601 (W601) model sequence versus three different genomic sequences revealed that the positional preferences of these TFs' binding to nucleosomes (i.e., dyad, periodic and end binding) are determined by the broader sequence context of the nucleosome, rather than being a property intrinsic to the TF. We propose a model where the flexibility and rigidity within nucleosomal DNA regulate where pioneers bind within nucleosomes. Our results suggest that the broader physical properties of nucleosomal DNA represent another layer of cis-regulatory information read out by TFs in eukaryotic genomes.
Affiliation(s)
- Luca Mariani
- Division of Genetics, Department of Medicine; Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Xiao Liu
- Division of Genetics, Department of Medicine; Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Department of Biomedical Informatics; Harvard Medical School, Boston, MA 02115
- Kwangwoon Lee
- Division of Genetics, Department of Medicine; Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Department of Biological Chemistry and Molecular Pharmacology; Harvard Medical School, Boston, MA 02115
- Stephen S. Gisselbrecht
- Division of Genetics, Department of Medicine; Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Philip A. Cole
- Division of Genetics, Department of Medicine; Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Department of Biological Chemistry and Molecular Pharmacology; Harvard Medical School, Boston, MA 02115
- Martha L. Bulyk
- Division of Genetics, Department of Medicine; Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
- Department of Pathology; Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115
2. Forni D, Pozzoli U, Mozzi A, Cagliani R, Sironi M. Depletion of CpG dinucleotides in bacterial genomes may represent an adaptation to high temperatures. NAR Genom Bioinform 2024; 6:lqae088. [PMID: 39071851 PMCID: PMC11282364 DOI: 10.1093/nargab/lqae088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 06/17/2024] [Accepted: 07/18/2024] [Indexed: 07/30/2024] Open
Abstract
Dinucleotide biases have been widely investigated in the genomes of eukaryotes and viruses, but not in bacteria. We assembled a dataset of bacterial genomes (>15 000), which are representative of the genetic diversity in the kingdom Eubacteria, and we analyzed dinucleotide biases in relation to different traits. We found that TpA dinucleotides are the most depleted and that CpG dinucleotides show the widest dispersion. The abundances of both dinucleotides vary with genomic G + C content and show a very strong phylogenetic signal. After accounting for G + C content and phylogenetic inertia, we analyzed different bacterial lifestyle traits. We found that temperature preferences associate with the abundance of CpG dinucleotides, with thermophiles/hyperthermophiles being particularly depleted. Conversely, the TpA dinucleotide displays a bias that only depends on genomic G + C composition. Using predictions of intrinsic cyclizability we also show that CpG depletion may associate with higher DNA bendability in both thermophiles/hyperthermophiles and mesophiles, and that the former are predicted to have significantly more flexible genomes than the latter. We suggest that higher bendability is advantageous at high temperatures because it facilitates DNA positive supercoiling and that, through modulation of DNA mechanical properties, local or global CpG depletion controls genome organization, most likely not only in bacteria.
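To make the notion of dinucleotide depletion concrete, the following is a minimal Python sketch of one common way to quantify it: the observed/expected odds ratio under the assumption that neighbouring bases are independent. The function name and the single-strand normalization are illustrative assumptions; the abstract does not state which normalization the authors used.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq: str, dinuc: str = "CG") -> float:
    """Observed/expected frequency of `dinuc`, assuming independent bases.

    Hypothetical helper for illustration; not taken from the cited paper.
    Values below 1 indicate depletion, values above 1 indicate enrichment.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2 or len(dinuc) != 2:
        raise ValueError("need a sequence of length >= 2 and a 2-letter dinucleotide")
    base_counts = Counter(seq)
    # Count overlapping occurrences of the dinucleotide.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    f_xy = observed / (n - 1)
    f_x = base_counts[dinuc[0]] / n
    f_y = base_counts[dinuc[1]] / n
    if f_x == 0 or f_y == 0:
        return float("nan")  # ratio is undefined when a base is absent
    return f_xy / (f_x * f_y)
```

For example, `dinucleotide_odds_ratio(genome, "CG")` and `dinucleotide_odds_ratio(genome, "TA")` would give CpG and TpA ratios for a sequence held in a hypothetical `genome` string. A genome-scale analysis would usually also account for the reverse strand and for G + C content, as the abstract describes.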
Affiliation(s)
- Diego Forni
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Uberto Pozzoli
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Alessandra Mozzi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Rachele Cagliani
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
- Manuela Sironi
- Scientific Institute IRCCS E. MEDEA, Bioinformatics, 23842 Bosisio Parini, Italy
3. Masoudi-Sobhanzadeh Y, Li S, Peng Y, Panchenko A. Interpretable deep residual network uncovers nucleosome positioning and associated features. Nucleic Acids Res 2024; 52:8734-8745. [PMID: 39036965 PMCID: PMC11347144 DOI: 10.1093/nar/gkae623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 05/31/2024] [Accepted: 07/04/2024] [Indexed: 07/23/2024] Open
Abstract
Nucleosomes represent elementary building units of eukaryotic chromosomes and consist of DNA wrapped around a histone octamer flanked by linker DNA segments. Nucleosomes are central in epigenetic pathways and their genomic positioning is associated with regulation of gene expression, DNA replication, DNA methylation and DNA repair, among other functions. Building on prior discoveries that DNA sequences noticeably affect nucleosome positioning, our objective is to identify nucleosome positions and related features across the entire genome. Here, we introduce an interpretable framework based on the concepts of deep residual networks (NuPoSe). Trained on high-coverage human experimental MNase-seq data, NuPoSe is able to learn sequence and structural patterns associated with nucleosome organization in the human genome. NuPoSe can also be applied to unseen data from different organisms and cell types. Our findings point to 43 informative features, most of which are tri-nucleotides and di-nucleotides, plus one tetra-nucleotide. Most features are significantly associated with the nucleosomal structural characteristics, namely, periodicity of nucleosomal DNA and its location with respect to a histone octamer. Importantly, we show that features derived from the 27 bp linker DNA flanking nucleosomes contribute up to 10% to the quality of the prediction model. This, along with the comprehensive training sets, deep-learning architecture, and feature selection method, may contribute to NuPoSe's 80-89% classification accuracy on different independent datasets.
Affiliation(s)
- Shuxiang Li
- Department of Pathology and Molecular Medicine, Queen's University, Kingston, K7L3N6, Canada
- Yunhui Peng
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China
- Anna R Panchenko
- Department of Pathology and Molecular Medicine, Queen's University, Kingston, K7L3N6, Canada
- Department of Biology and Molecular Sciences, Queen's University, Kingston, K7L3N6, Canada
- School of Computing, Queen's University, Kingston, K7L3N6, Canada
- Ontario Institute of Cancer Research, Toronto, M5G 0A3, Canada
4. P P, Riyaz A, Choudhury A, Choudhury PR, Pradhan N, Singh A, Nakul M, Dudeja C, Yadav A, Nath SK, Khanna V, Sharma T, Pradhan G, Takkar S, Rawal K. DNASCANNER v2: A Web-Based Tool to Analyze the Characteristic Properties of Nucleotide Sequences. J Comput Biol 2024; 31:651-669. [PMID: 38662479 DOI: 10.1089/cmb.2023.0227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024] Open
Abstract
Throughout the process of evolution, DNA undergoes the accumulation of distinct mutations, which can often result in highly organized patterns that serve various essential biological functions. These patterns encompass various genomic elements and provide valuable insights into the regulatory and functional aspects of DNA. The physicochemical, mechanical, thermodynamic, and structural properties of DNA sequences play a crucial role in the formation of specific patterns. These properties contribute to the three-dimensional structure of DNA and influence its interactions with proteins, regulatory elements, and other molecules. In this study, we introduce DNASCANNER v2, an advanced version of our previously published algorithm DNASCANNER for analyzing DNA properties. The current tool is built using the FLASK framework in Python. Featuring a user-friendly interface tailored for nonspecialized researchers, it offers an extensive analysis of 158 DNA properties, including mono/di/trinucleotide frequencies and structural, physicochemical, thermodynamic, and mechanical properties of DNA sequences. The tool provides downloadable results and offers interactive plots for easy interpretation and comparison between different features. We also demonstrate the utility of DNASCANNER v2 in analyzing splice-site junctions, casposon insertion sequences, and transposon insertion sites (TIS) within the bacterial and human genomes, respectively. We also developed a deep learning module for the prediction of potential TIS in a given nucleotide sequence. In the future, we aim to optimize the performance of this prediction model through extensive training on larger data sets.
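The mono/di/trinucleotide frequencies mentioned above can be illustrated with a short Python sketch. This is not DNASCANNER's code: the function name, the overlapping-window counting, and the choice to report all 4^k k-mers over A, C, G, T are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Relative frequency of every k-mer over ACGT, using overlapping windows.

    Hypothetical helper for illustration. Windows containing other symbols
    (for example N) count toward the total but are not reported.
    """
    seq = seq.upper()
    total = len(seq) - k + 1
    if k < 1 or total < 1:
        raise ValueError("k must be >= 1 and no longer than the sequence")
    counts = Counter(seq[i:i + k] for i in range(total))
    # Enumerate all possible k-mers so that absent ones are reported as 0.0.
    return {
        "".join(p): counts["".join(p)] / total
        for p in product("ACGT", repeat=k)
    }
```

Calling the function with `k` set to 1, 2, and 3 gives mono-, di-, and trinucleotide frequency tables respectively.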
Affiliation(s)
- Preeti P
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Azeen Riyaz
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Alakto Choudhury
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Priyanka Ray Choudhury
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Nischal Pradhan
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Abhishek Singh
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Mihir Nakul
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Chhavi Dudeja
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Abhijeet Yadav
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Swarsat Kaushik Nath
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Vrinda Khanna
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Trapti Sharma
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Gayatri Pradhan
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Simran Takkar
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
- Kamal Rawal
- Center for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University, Noida, Uttar Pradesh, India
5. Hu S, Liu Y, Zhang Q, Bai J, Xu C. A continuum of zinc finger transcription factor retention on native chromatin underlies dynamic genome organization. Mol Syst Biol 2024; 20:799-824. [PMID: 38745107 PMCID: PMC11220090 DOI: 10.1038/s44320-024-00038-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 04/10/2024] [Accepted: 04/15/2024] [Indexed: 05/16/2024] Open
Abstract
Transcription factor (TF) residence on chromatin translates into quantitative transcriptional or structural outcomes on the genome. Commonly used formaldehyde crosslinking fixes TF-DNA interactions cumulatively and compromises the measured occupancy level. Here we mapped the occupancy level of global or individual zinc finger TFs like CTCF and MAZ, in the form of highly resolved footprints, on native chromatin. By incorporating reinforcing perturbation conditions, we established the S-score, a quantitative metric to proxy the continuum of CTCF or MAZ retention across different motifs on native chromatin. The native chromatin-retained CTCF sites harbor sequence features within CTCF motifs better explained by the S-score than by the metrics obtained from other crosslinking or native assays. CTCF retention on native chromatin correlates with local SUMOylation level, and anti-correlates with transcriptional activity. The S-score successfully delineates the otherwise-masked differential stability of chromatin structures mediated by CTCF, or by MAZ independent of CTCF. Overall, our study established a paradigm continuum of TF retention across binding sites on native chromatin, explaining the dynamic genome organization.
Affiliation(s)
- Siling Hu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yangying Liu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Qifan Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Juan Bai
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
- China National Center for Bioinformation, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Chenhuan Xu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
- China National Center for Bioinformation, Beijing, China.
- University of Chinese Academy of Sciences, Beijing, China.
6. Abbasi AF, Asim MN, Ahmed S, Dengel A. Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns. Sci Rep 2024; 14:9466. [PMID: 38658614 PMCID: PMC11043385 DOI: 10.1038/s41598-024-57457-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 03/18/2024] [Indexed: 04/26/2024] Open
Abstract
Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. The identification of leccDNA holds significant importance to investigate its potential associations with cancer, autoimmune, cardiovascular, and neurological diseases. In addition, understanding these associations can provide valuable insights about disease mechanisms and potential therapeutic approaches. Conventionally, wet lab-based methods are utilized to identify leccDNA, which are hindered by the need for prior knowledge and by resource-intensive processes, potentially limiting their broader applicability. To empower the process of leccDNA identification across multiple species, this paper presents the very first computational predictor. The proposed iLEC-DNA predictor makes use of an SVM classifier along with sequence-derived nucleotide distribution patterns and physicochemical properties-based features. In addition, the study introduces a set of 12 benchmark leccDNA datasets related to three species, namely Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). It performs large-scale experimentation across 12 benchmark datasets under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms baseline predictors and encoder ensembles across diverse leccDNA datasets by producing average performance values of 81.09%, 62.2% and 81.08% in terms of ACC, MCC and AUC-ROC across all the datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction . To facilitate the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.
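For readers unfamiliar with two of the metrics reported above, accuracy (ACC) and the Matthews correlation coefficient (MCC) can be computed from binary labels as in the following Python sketch. The function name and the convention of returning an MCC of 0.0 when the denominator is zero are assumptions for illustration and are not taken from the iLEC-DNA source code.

```python
import math


def accuracy_and_mcc(y_true, y_pred):
    """Return (ACC, MCC) for binary labels encoded as 0 and 1.

    Hypothetical helper for illustration; MCC is reported as 0.0 when any
    marginal of the confusion matrix is empty (a common convention).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which is one reason it is often reported alongside ACC on imbalanced datasets.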
Affiliation(s)
- Ahtisham Fazeel Abbasi
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, 67663, Kaiserslautern, Germany.
- German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany.
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany.
- Sheraz Ahmed
- German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
- Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, 67663, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Germany
7. Yang M, Zhang S, Zheng Z, Zhang P, Liang Y, Tang S. Employing bimodal representations to predict DNA bendability within a self-supervised pre-trained framework. Nucleic Acids Res 2024; 52:e33. [PMID: 38375921 PMCID: PMC11014357 DOI: 10.1093/nar/gkae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/10/2024] [Accepted: 02/01/2024] [Indexed: 02/21/2024] Open
Abstract
The bendability of genomic DNA, which measures the DNA looping rate, is crucial for numerous biological processes of DNA. Recently, an advanced high-throughput technique known as 'loop-seq' has made it possible to measure the inherent cyclizability of DNA fragments. However, quantifying the bendability of large-scale DNA is costly, laborious, and time-consuming. To close the gap between rapidly evolving large language models and expanding genomic sequence information, and to elucidate the impact of DNA bendability on critical regulatory sequence motifs such as super-enhancers in the human genome, we introduce an innovative computational model, named MIXBend, to forecast DNA bendability utilizing both nucleotide sequences and physicochemical properties. In MIXBend, the pre-trained language model DNABERT and a convolutional neural network with an attention mechanism are utilized to construct both sequence- and physicochemical-based extractors for the sophisticated refinement of DNA sequence representations. These bimodal DNA representations are then fed to a k-mer sequence-physicochemistry matching module to minimize the semantic gap between the modalities. Lastly, a self-attention fusion layer is employed for the prediction of DNA bendability. In conclusion, the experimental results validate MIXBend's superior performance relative to other state-of-the-art methods. Additionally, MIXBend reveals both novel and known motifs from yeast. Moreover, MIXBend discovers significant bendability fluctuations within super-enhancer regions and transcription factor binding sites in the human genome.
Affiliation(s)
- Minghao Yang
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
- Shichen Zhang
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
- Zhihang Zheng
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
- Pengfei Zhang
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
- Yan Liang
- School of Artificial Intelligence, South China Normal University, Foshan 528225, China
- Shaojun Tang
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
- Division of Life Science, Hong Kong University of Science and Technology, Hong Kong SAR 999077, China
8. Yan W, Tan L, Mengshan L, Weihong Z, Sheng S, Jun W, Fu-An W. Time series-based hybrid ensemble learning model with multivariate multidimensional feature coding for DNA methylation prediction. BMC Genomics 2023; 24:758. [PMID: 38082253 PMCID: PMC10712061 DOI: 10.1186/s12864-023-09866-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 12/02/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND: DNA methylation is a form of epigenetic modification that impacts gene expression without modifying the DNA sequence, thereby exerting control over gene function and cellular development. The prediction of DNA methylation is vital for understanding and exploring gene regulatory mechanisms. Currently, machine learning algorithms are primarily used for model construction. However, several challenges remain to be addressed, including limited prediction accuracy, constrained generalization capability, and insufficient learning capacity.
RESULTS: In response to the aforementioned challenges, this paper leverages the similarities between DNA sequences and time series to introduce a time series-based hybrid ensemble learning model, called Multi2-Con-CAPSO-LSTM. The model uses a multivariate, multidimensional encoding approach, combining three types of time series encodings with three kinds of genetic feature encodings, resulting in a total of nine types of feature encoding matrices. Convolutional Neural Networks are utilized to extract features from DNA sequences, including temporal, positional, physicochemical, and genetic information, thereby creating a comprehensive feature matrix. The Long Short-Term Memory model is then optimized using the Chaotic Accelerated Particle Swarm Optimization algorithm for predicting DNA methylation.
CONCLUSIONS: Through cross-validation experiments conducted on 17 species involving three types of DNA methylation (6mA, 5hmC, and 4mC), the results demonstrate the robust predictive capabilities of the Multi2-Con-CAPSO-LSTM model in DNA methylation prediction across various types and species. Compared with other benchmark models, the Multi2-Con-CAPSO-LSTM model demonstrates significant advantages in sensitivity, specificity, accuracy, and correlation. The model proposed in this paper provides valuable insights and inspiration across various disciplines, including sequence alignment, genetic evolution, time series analysis, and structure-activity relationships.
Affiliation(s)
- Wu Yan
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China.
- School of Mathematics and Computer Science, Gannan Normal University, Ganzhou, Jiangxi, 341000, China.
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China.
- Li Tan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Li Mengshan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China.
- Zhou Weihong
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Sheng Sheng
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Wang Jun
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Wu Fu-An
- School of Biotechnology, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China.
- Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China.
9. Biswas A, Basu A. The impact of the sequence-dependent physical properties of DNA on chromatin dynamics. Curr Opin Struct Biol 2023; 83:102698. [PMID: 37696706 DOI: 10.1016/j.sbi.2023.102698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 07/07/2023] [Accepted: 08/14/2023] [Indexed: 09/13/2023]
Abstract
The local mechanical properties of DNA depend on local sequence. Here we review recent genomic, structural, and computational efforts at deciphering the "mechanical code", i.e., the mapping between sequence and mechanics. We then discuss works that suggest how evolution has exploited the mechanical code to control the energetics of DNA-deforming biological processes such as nucleosome organization, transcription factor binding, DNA supercoiling, gene regulation, and 3D chromatin organization. As a whole, these recent works suggest that DNA sequence in diverse organisms can encode regulatory information governing diverse processes via the mechanical code.
Affiliation(s)
- Aditi Biswas
- Department of Biosciences, Durham University, Durham, UK
- Aakash Basu
- Department of Biosciences, Durham University, Durham, UK.
10. Back G, Walther D. Predictions of DNA mechanical properties at a genomic scale reveal potentially new functional roles of DNA flexibility. NAR Genom Bioinform 2023; 5:lqad097. [PMID: 37954573 PMCID: PMC10632188 DOI: 10.1093/nargab/lqad097] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/15/2023] [Revised: 09/28/2023] [Accepted: 10/25/2023] [Indexed: 11/14/2023] Open
Abstract
Mechanical properties of DNA have been suggested to influence many of its biological functions. Recently, a new high-throughput method, called loop-seq, which allows measuring the intrinsic bendability of DNA fragments, has been developed. Using loop-seq data, we created a deep learning model to explore the biological significance of local DNA flexibility in a range of different species from different kingdoms. Consistently, we observed a characteristic and largely dinucleotide-composition-driven change of local flexibility near transcription start sites. In the presence of a TATA-box, a pronounced peak of high flexibility can be observed. Furthermore, depending on the transcription factor investigated, flanking-sequence-dependent DNA flexibility was identified as a potential factor influencing DNA binding. Compared to randomized genomic sequences, depending on species and taxa, actual genomic sequences were observed with both increased and lowered flexibility. Furthermore, in Arabidopsis thaliana, mutation rates, both de novo and fixed, were found to be associated with relatively rigid sequence regions. Our study presents a range of significant correlations between characteristic DNA mechanical properties and genomic features, the significance of which with regard to detailed molecular relevance awaits further theoretical and experimental exploration.
Affiliation(s)
- Georg Back
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
11. Jiang WJ, Hu C, Lai F, Pang W, Yi X, Xu Q, Wang H, Zhou J, Zhu H, Zhong C, Kuang Z, Fan R, Shen J, Zhou X, Wang YJ, Wong CCL, Zheng X, Wu HJ. Assessing base-resolution DNA mechanics on the genome scale. Nucleic Acids Res 2023; 51:9552-9566. [PMID: 37697433 PMCID: PMC10570052 DOI: 10.1093/nar/gkad720] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/13/2023] [Revised: 08/09/2023] [Accepted: 08/18/2023] [Indexed: 09/13/2023] Open
Abstract
Intrinsic DNA properties including bending play a crucial role in diverse biological systems. A recent advance in a high-throughput technology called loop-seq makes it possible to determine the bendability of a hundred thousand 50-bp DNA duplexes in one experiment. However, it is still challenging to assess base-resolution sequence bendability in large genomes such as the human genome, which would require thousands of such experiments. Here, we introduce 'BendNet', a deep neural network to predict the intrinsic DNA bending at base-resolution by using loop-seq results in yeast as training data. BendNet can predict the DNA bendability of any given sequence from different species with high accuracy. To explore the utility of BendNet, we applied it to the human genome and observed that DNA bendability is associated with chromatin features and disease risk regions involving transcription/enhancer regulation, DNA replication, transcription factor binding and extrachromosomal circular DNA generation. These findings expand our understanding of DNA mechanics and its association with transcription regulation in mammals. Lastly, we built a comprehensive resource of genomic DNA bendability profiles for 307 species by applying BendNet, and provided an online tool to assess the bendability of user-specified DNA sequences (http://www.dnabendnet.com/).
Affiliation(s)
- Wen-Jie Jiang
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, 100142 Beijing, China
- School of Basic Medical Sciences, Center for Precision Medicine Multi-Omics Research, Peking University Health Science Center, 102206 Beijing, China
| | - Congcong Hu
- Department of Mathematics, Shanghai Normal University, 200234 Shanghai, China
| | - Futing Lai
- School of Basic Medical Sciences, Center for Precision Medicine Multi-Omics Research, Peking University Health Science Center, 102206 Beijing, China
| | - Weixiong Pang
- Department of Mathematics, Shanghai Ocean University, 201306 Shanghai, China
| | - Xinyao Yi
- Department of Mathematics, Shanghai Normal University, 200234 Shanghai, China
| | - Qianyi Xu
- University of California, San Diego, CA 92103, USA
| | - Haojie Wang
- Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, 100101 Beijing, China
| | - Jialu Zhou
- Department of Gynecology and Obstetrics, Chinese PLA General Hospital, 100853 Beijing, China
| | - Hanwen Zhu
- School of Basic Medical Sciences, Center for Precision Medicine Multi-Omics Research, Peking University Health Science Center, 102206 Beijing, China
| | - Chunge Zhong
- College of Life and Health Sciences, Northeastern University, 110819 Shenyang, China
| | - Zeyu Kuang
- School of Basic Medical Sciences, Center for Precision Medicine Multi-Omics Research, Peking University Health Science Center, 102206 Beijing, China
| | - Ruiqi Fan
- Central Laboratory, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, 100142 Beijing, China
| | - Jing Shen
- Central Laboratory, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, 100142 Beijing, China
| | - Xiaorui Zhou
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, 100142 Beijing, China
| | - Yu-Juan Wang
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, 100142 Beijing, China
| | - Catherine C L Wong
- Department of Medical Research Center, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science & Peking Union Medical College, 100730 Beijing, China
| | - Xiaoqi Zheng
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
| | - Hua-Jun Wu
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Peking University Cancer Hospital and Institute, 100142 Beijing, China
- School of Basic Medical Sciences, Center for Precision Medicine Multi-Omics Research, Peking University Health Science Center, 102206 Beijing, China
| |
|
12
|
Wang X, Zeng H, Lin L, Huang Y, Lin H, Que Y. Deep learning-empowered crop breeding: intelligent, efficient and promising. FRONTIERS IN PLANT SCIENCE 2023; 14:1260089. [PMID: 37860239 PMCID: PMC10583549 DOI: 10.3389/fpls.2023.1260089] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/18/2023] [Accepted: 09/13/2023] [Indexed: 10/21/2023]
Abstract
Crop breeding is one of the main approaches to increase crop yield and improve crop quality. However, the breeding process faces challenges such as complex data, difficulties in data acquisition, and low prediction accuracy, resulting in low breeding efficiency and long cycle. Deep learning-based crop breeding is a strategy that applies deep learning techniques to improve and optimize the breeding process, leading to accelerated crop improvement, enhanced breeding efficiency, and the development of higher-yielding, more adaptive, and disease-resistant varieties for agricultural production. This perspective briefly discusses the mechanisms, key applications, and impact of deep learning in crop breeding. We also highlight the current challenges associated with this topic and provide insights into its future application prospects.
Affiliation(s)
- Xiaoding Wang
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
| | - Haitao Zeng
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
| | - Limei Lin
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
| | - Yanze Huang
- School of Computer Science and Mathematics, Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, China
| | - Hui Lin
- Fujian Provincial Key Lab of Network Security & Cryptology, College of Computer and Cyber Security, Fujian Normal University, Fuzhou, China
| | - Youxiong Que
- Key Laboratory of Sugarcane Biology and Genetic Breeding, Ministry of Agriculture and Rural Affairs, Fujian Agriculture and Forestry University, Fuzhou, China
- National Key Laboratory for Tropical Crop Breeding, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Hainan, China
| |
|
13
|
Yadav M, Zuiddam M, Schiessel H. The role of transcript regions and amino acid choice in nucleosome positioning. NAR Genom Bioinform 2023; 5:lqad080. [PMID: 37705829 PMCID: PMC10495542 DOI: 10.1093/nargab/lqad080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 07/19/2023] [Accepted: 08/30/2023] [Indexed: 09/15/2023] Open
Abstract
Eukaryotic DNA is organized and compacted in a string of nucleosomes, DNA-wrapped protein cylinders. The positions of nucleosomes along DNA are not random but show well-known base pair sequence preferences that result from the sequence-dependent elastic and geometric properties of the DNA double helix. Here, we focus on DNA around transcription start sites, which are known to typically attract nucleosomes in multicellular life forms through their high GC content. We aim to understand how these GC signals, as observed in genome-wide averages, are produced and encoded through different genomic regions (mainly 5' UTRs, coding exons, and introns). Our study uses a bioinformatics approach to decompose the genome-wide GC signal into between-region and within-region signals. We find large differences in GC signal contributions between vertebrates and plants and, remarkably, even between closely related species. Introns contribute most to the GC signal in vertebrates, while in plants the exons dominate. Further, we find signal strengths stronger on DNA than on mRNA, suggesting a biological function of GC signals along the DNA itself, as is the case for nucleosome positioning. Finally, we make the surprising discovery that both the choice of synonymous codons and amino acids contribute to the nucleosome positioning signal.
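As a rough illustration of the kind of GC signal this abstract refers to, the sketch below computes a sliding-window GC fraction along a sequence. The function name `gc_profile`, the window size, and the toy sequence are hypothetical choices made for this example; the paper's decomposition into between-region and within-region signals is not reproduced here.

```python
def gc_profile(seq, window=4):
    """Return the GC fraction of every full window along seq."""
    seq = seq.upper()
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    # 1 where the base is G or C, 0 otherwise
    is_gc = [1 if base in "GC" else 0 for base in seq]
    # Running count so that each step costs O(1)
    count = sum(is_gc[:window])
    profile = [count / window]
    for i in range(window, len(seq)):
        count += is_gc[i] - is_gc[i - window]
        profile.append(count / window)
    return profile


toy = "ATGCGCGCATAT"  # hypothetical toy sequence, not real genomic data
print(gc_profile(toy, window=4))
```

A genome-wide average of such profiles, aligned on transcription start sites, is one plausible way to obtain the kind of aggregate GC signal the abstract describes.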
Affiliation(s)
- Manish Yadav
- Cluster of Excellence Physics of Life, TU Dresden, 01062 Dresden, Germany
| | - Martijn Zuiddam
- Institute Lorentz for Theoretical Physics, Leiden University, Leiden, the Netherlands
| | - Helmut Schiessel
- Cluster of Excellence Physics of Life, TU Dresden, 01062 Dresden, Germany
- Institut für Theoretische Physik, Technische Universität Dresden, 01062 Dresden, Germany
| |
|
14
|
Hu W, Guan L, Li M. Prediction of DNA Methylation based on Multi-dimensional feature encoding and double convolutional fully connected convolutional neural network. PLoS Comput Biol 2023; 19:e1011370. [PMID: 37639434 PMCID: PMC10461834 DOI: 10.1371/journal.pcbi.1011370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 07/18/2023] [Indexed: 08/31/2023] Open
Abstract
DNA methylation takes on critical significance to the regulation of gene expression by affecting the stability of DNA and changing the structure of chromosomes. DNA methylation modification sites should be identified, which lays a solid basis for gaining more insights into their biological functions. Existing machine learning-based methods of predicting DNA methylation have not fully exploited the hidden multidimensional information in DNA gene sequences, such that the prediction accuracy of models is significantly limited. Besides, most models have been built in terms of a single methylation type. To address the above-mentioned issues, a deep learning-based method was proposed in this study for DNA methylation site prediction, termed the MEDCNN model. The MEDCNN model is capable of extracting feature information from gene sequences in three dimensions (i.e., positional information, biological information, and chemical information). Moreover, the proposed method employs a convolutional neural network model with double convolutional layers and double fully connected layers while iteratively updating the gradient descent algorithm using the cross-entropy loss function to increase the prediction accuracy of the model. Besides, the MEDCNN model can predict different types of DNA methylation sites. As indicated by the experimental results, the deep learning method based on coding from multiple dimensions outperformed single coding methods, and the MEDCNN model was highly applicable and outperformed existing models in predicting DNA methylation between different species. As revealed by the above-described findings, the MEDCNN model can be effective in predicting DNA methylation sites.
Affiliation(s)
- Wenxing Hu
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, China
| | - Lixin Guan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, China
| | - Mengshan Li
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, China
| |
|
15
|
Khan SR, Sakib S, Rahman MS, Samee MAH. DeepBend: An interpretable model of DNA bendability. iScience 2023; 26:105945. [PMID: 36866046 PMCID: PMC9971889 DOI: 10.1016/j.isci.2023.105945] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/19/2022] [Revised: 12/05/2022] [Accepted: 01/05/2023] [Indexed: 01/09/2023] Open
Abstract
The bendability of genomic DNA impacts chromatin packaging and protein-DNA binding. However, we do not have a comprehensive understanding of the motifs influencing DNA bendability. Recent high-throughput technologies such as Loop-Seq offer an opportunity to address this gap but the lack of accurate and interpretable machine learning models still remains. Here we introduce DeepBend, a convolutional neural network model with convolutions designed to directly capture the motifs underlying DNA bendability and their periodic occurrences or relative arrangements that modulate bendability. DeepBend consistently performs on par with alternative models while giving an extra edge through mechanistic interpretations. Besides confirming the known motifs of DNA bendability, DeepBend also revealed several novel motifs and showed how the spatial patterns of motif occurrences influence bendability. DeepBend's genome-wide prediction of bendability further showed how bendability is linked to chromatin conformation and revealed the motifs controlling the bendability of topologically associated domains and their boundaries.
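To make the idea of a convolution that captures a sequence motif concrete, here is a minimal sketch in plain Python: a sequence is one-hot encoded and a single filter is slid along it, scoring each position by how well it matches the filter. This is not DeepBend's architecture; the names `one_hot` and `scan_filter`, the dinucleotide filter, and the toy sequence are all hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan_filter(encoded, kernel):
    """Slide a (width x 4) kernel over the encoded sequence, no padding."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores


# Hypothetical filter that responds most strongly to the dinucleotide AA
kernel = one_hot("AA")
scores = scan_filter(one_hot("GAATTAAC"), kernel)
print(scores)  # highest scores mark the positions where AA occurs
```

In a trained network the kernel weights are learned rather than set by hand, and reading them back as motifs is one common route to the kind of interpretability the abstract mentions.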
Affiliation(s)
- Samin Rahman Khan
- Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Sadman Sakib
- Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - M. Sohel Rahman
- Bangladesh University of Engineering and Technology, Dhaka, Bangladesh (corresponding author)
| | | |
|
16
|
DNA Sequence-Dependent Properties of Nucleosome Positioning in Regions of Distinct Chromatin States in Mouse Embryonic Stem Cells. Int J Mol Sci 2022; 23:ijms232214488. [PMID: 36430966 PMCID: PMC9693356 DOI: 10.3390/ijms232214488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 11/18/2022] [Accepted: 11/19/2022] [Indexed: 11/23/2022] Open
Abstract
Chromatin architecture is orchestrated, and plays crucial roles during the developmental process by regulating gene expression. In embryonic stem cells (ESCs), three types of chromatin states, including active, repressive and poised states, were previously identified and characterized with specific chromatin modification marks and different transcription activity, but it is largely unknown how nucleosomes are organized in these chromatin states. In this study, by using a DNA deformation energy model, we investigated the sequence-dependent nucleosome organization within the chromatin states in mouse ESCs. The results revealed that: (1) compared with poised genes, active genes are characterized with a higher level of nucleosome occupancy around their transcription start sites (TSS) and transcription termination sites (TTS), and both types of genes do not have a nucleosome-depleted region at their TTS, contrasting with the MNase-seq based result; (2) based on our previous DNA bending energy model, we developed an improved model capable of predicting both rotational positioning and nucleosome occupancy determined by a chemical mapping approach; (3) DNA bending-energy-based analyses demonstrated that the fragile nucleosomes positioned at both gene ends could be explained largely by enhanced rotational positioning signals encoded in DNA, but nucleosome phasing around the TSS of active genes was not determined by sequence preference; (4) the nucleosome occupancy landscape around the binding sites of some developmentally important transcription factors known to bind with different chromatin contexts, was also successfully predicted; (5) the difference of nucleosome occupancy around the TSS between CpG-rich and CpG-poor promoters was partly captured by our sequence-dependent model. 
Taken together, by developing an improved deformation-energy-based model, we revealed some sequence-dependent properties of the nucleosome arrangements in regions of distinct chromatin states in mouse ESCs.
|
17
|
Zhang T, Tang Q, Nie F, Zhao Q, Chen W. DeepLncPro: an interpretable convolutional neural network model for identifying long non-coding RNA promoters. Brief Bioinform 2022; 23:6754194. [PMID: 36209437 DOI: 10.1093/bib/bbac447] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/19/2022] [Revised: 09/14/2022] [Accepted: 09/17/2022] [Indexed: 12/14/2022] Open
Abstract
Long non-coding RNA (lncRNA) plays important roles in a series of biological processes. The transcription of lncRNA is regulated by its promoter. Hence, accurate identification of lncRNA promoters will be helpful for understanding their regulatory mechanisms. Since experimental techniques remain time-consuming for genome-wide promoter identification, developing computational tools to identify promoters is necessary. However, only a few computational methods have been proposed for lncRNA promoter prediction, and their performance still has room for improvement. In the present work, a convolutional neural network-based model, called DeepLncPro, was proposed to identify lncRNA promoters in human and mouse. Comparative results demonstrated that DeepLncPro was superior to both state-of-the-art machine learning methods and existing models for identifying lncRNA promoters. Furthermore, DeepLncPro can extract and analyze transcription factor binding motifs from lncRNAs, which makes it an interpretable model. These results indicate that DeepLncPro can serve as a powerful tool for identifying lncRNA promoters. An open-source tool for DeepLncPro is provided at https://github.com/zhangtian-yang/DeepLncPro.
Affiliation(s)
- Tianyang Zhang
- School of Life Sciences, North China University of Science and Technology
| | - Qiang Tang
- School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine
| | - Fulei Nie
- School of Life Sciences, North China University of Science and Technology
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning
| | - Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine
| |
|