1
Masoudi-Sobhanzadeh Y, Li S, Peng Y, Panchenko AR. Interpretable deep residual network uncovers nucleosome positioning and associated features. Nucleic Acids Res 2024:gkae623. [PMID: 39036965 DOI: 10.1093/nar/gkae623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 05/31/2024] [Accepted: 07/04/2024] [Indexed: 07/23/2024]
Abstract
Nucleosomes represent elementary building units of eukaryotic chromosomes and consist of DNA wrapped around a histone octamer flanked by linker DNA segments. Nucleosomes are central in epigenetic pathways, and their genomic positioning is associated with regulation of gene expression, DNA replication, DNA methylation and DNA repair, among other functions. Building on prior discoveries that DNA sequences noticeably affect nucleosome positioning, our objective is to identify nucleosome positions and related features across the entire genome. Here, we introduce an interpretable framework based on the concepts of deep residual networks (NuPoSe). Trained on high-coverage human experimental MNase-seq data, NuPoSe is able to learn sequence and structural patterns associated with nucleosome organization in the human genome. NuPoSe can also be applied to unseen data from different organisms and cell types. Our findings point to 43 informative features, most of which are tri-nucleotides, di-nucleotides and one tetra-nucleotide. Most features are significantly associated with nucleosomal structural characteristics, namely the periodicity of nucleosomal DNA and its location with respect to the histone octamer. Importantly, we show that features derived from the 27 bp linker DNA flanking nucleosomes contribute up to 10% to the quality of the prediction model. This, along with the comprehensive training sets, deep-learning architecture, and feature selection method, may contribute to NuPoSe's 80-89% classification accuracy on different independent datasets.
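The abstract describes informative features built from di-, tri- and tetra-nucleotides, drawn both from nucleosomal DNA and from the 27 bp linker DNA flanking it. The Python sketch below shows that kind of k-mer frequency featurization. It assumes a roughly 147 bp core sequence and linker flanks passed in as separate strings. The names `kmer_frequencies` and `nucleosome_features` are hypothetical. This is not NuPoSe's published feature pipeline, and it does not cover its residual-network model.

```python
from itertools import product

BASES = "ACGT"


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of every possible DNA k-mer over overlapping windows of seq."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
            total += 1
    # Normalize counts to frequencies; an empty or all-N region yields zeros.
    return {kmer: (n / total if total else 0.0) for kmer, n in counts.items()}


def nucleosome_features(core: str, left_linker: str = "", right_linker: str = "",
                        ks=(2, 3, 4)) -> dict:
    """Concatenate di-/tri-/tetra-nucleotide frequencies for the core and each flank.

    The flanks are featurized separately so that no k-mer spans an artificial
    junction between the left and right linker sequences.
    """
    features = {}
    regions = (("left", left_linker), ("core", core), ("right", right_linker))
    for region, seq in regions:
        for k in ks:
            for kmer, freq in kmer_frequencies(seq, k).items():
                features[f"{region}_{kmer}"] = freq
    return features


if __name__ == "__main__":
    core = "ACGT" * 36 + "ACG"  # 147 bp placeholder core sequence (hypothetical)
    feats = nucleosome_features(core, left_linker="A" * 27, right_linker="T" * 27)
    # 3 regions x (16 + 64 + 256) possible k-mers
    print(len(feats))
```

A fixed, exhaustive k-mer vocabulary keeps every sequence's feature vector the same length and makes each feature directly interpretable. That matters when a feature-selection step later ranks individual nucleotides patterns.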
Affiliation(s)
- Shuxiang Li
- Department of Pathology and Molecular Medicine, Queen's University, Kingston, K7L3N6, Canada
- Yunhui Peng
- Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan, 430079, China
- Anna R Panchenko
- Department of Pathology and Molecular Medicine, Queen's University, Kingston, K7L3N6, Canada
- Department of Biology and Molecular Sciences, Queen's University, Kingston, K7L3N6, Canada
- School of Computing, Queen's University, Kingston, K7L3N6, Canada
- Ontario Institute of Cancer Research, Toronto, M5G 0A3, Canada
2
Shi K, Chen Y, Liu R, Fu X, Guo H, Gao T, Wang S, Dou L, Wang J, Wu Y, Yu J, Yu H. NFIC mediates m6A mRNA methylation to orchestrate transcriptional and post-transcriptional regulation to represses malignant phenotype of non-small cell lung cancer cells. Cancer Cell Int 2024; 24:223. [PMID: 38943137 PMCID: PMC11212411 DOI: 10.1186/s12935-024-03414-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 06/22/2024] [Indexed: 07/01/2024]
Abstract
BACKGROUND Multiple genetic and epigenetic regulatory mechanisms are crucial in development and tumorigenesis. Transcriptional regulation often involves intricate relationships and networks with post-transcriptional regulatory molecules, impacting the spatial and temporal expression of genes. However, the synergistic relationship between transcription factors and N6-methyladenosine (m6A) modification in regulating gene expression, as well as its influence on the mechanisms underlying the occurrence and progression of non-small cell lung cancer (NSCLC), requires further investigation. The present study aimed to investigate the synergistic relationship between transcription factors and m6A modification in NSCLC.
METHODS The transcription factor NFIC and its potential target genes were screened by analyzing publicly available datasets (ATAC-seq, DNase-seq, and RNA-seq). The associations between NFIC and its potential target genes were validated through ChIP-qPCR and dual-luciferase reporter assays. Additionally, the roles of NFIC and its potential target genes in NSCLC were examined in vitro and in vivo through silencing and overexpression assays.
RESULTS Based on multi-omics data, the transcription factor NFIC was identified as a potential tumor suppressor in NSCLC. NFIC was significantly downregulated in both NSCLC tissues and cells. When NFIC was overexpressed, the malignant phenotype and total m6A content of NSCLC cells were suppressed, and the PI3K/AKT pathway was inactivated. Additionally, we discovered that NFIC inhibits the expression of METTL3 by directly binding to its promoter region. METTL3 in turn regulates the expression of KAT2A, a histone acetyltransferase, by methylating the m6A site in the 3'UTR of KAT2A mRNA in NSCLC cells. Intriguingly, NFIC was also found to negatively regulate the expression of KAT2A by directly binding to its promoter region.
CONCLUSIONS Our findings demonstrated that NFIC suppresses the malignant phenotype of NSCLC cells by regulating gene expression at both the transcriptional and post-transcriptional levels. A deeper comprehension of the genetic and epigenetic regulatory mechanisms in tumorigenesis would be beneficial for the development of personalized treatment strategies.
Affiliation(s)
- Kesong Shi
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Yani Chen
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Ruihua Liu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Xinyao Fu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Hua Guo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Tian Gao
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Shu Wang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Le Dou
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Jiemin Wang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Yuan Wu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Jiale Yu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
- Haiquan Yu
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, School of Life Sciences, Inner Mongolia University, Hohhot, 010020, Inner Mongolia, China
3
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023]
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome. However, the location, function, target genes and regulatory mechanisms of most enhancers have not yet been elucidated. As high-throughput sequencing techniques have advanced rapidly, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables full exploration of the data. It also provides novel perspectives for the screening, identification and characterization of the functions and regulatory mechanisms of unknown enhancers. Nevertheless, multidimensional genomic data remain difficult to integrate genome-wide because of their complex variety, massive volume, high rarity, etc. To help researchers select appropriate methods for studying enhancers efficiently, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers. We also summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Junyou Zhang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Zhaoshuo Liu
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Yingying Duan
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Chunyan Li
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
4
Wu YL, Lin ZJ, Li CC, Lin X, Shan SK, Guo B, Zheng MH, Li F, Yuan LQ, Li ZH. Epigenetic regulation in metabolic diseases: mechanisms and advances in clinical study. Signal Transduct Target Ther 2023; 8:98. [PMID: 36864020 PMCID: PMC9981733 DOI: 10.1038/s41392-023-01333-7] [Citation(s) in RCA: 58] [Impact Index Per Article: 58.0] [Received: 09/23/2022] [Revised: 01/02/2023] [Accepted: 01/18/2023] [Indexed: 03/04/2023]
Abstract
Epigenetics regulates gene expression and has been confirmed to play a critical role in a variety of metabolic diseases, such as diabetes, obesity, non-alcoholic fatty liver disease (NAFLD), osteoporosis, gout, hyperthyroidism, hypothyroidism and others. The term 'epigenetics' was first proposed in 1942, and with the development of technologies, the exploration of epigenetics has made great progress. There are four main epigenetic mechanisms, namely DNA methylation, histone modification, chromatin remodelling, and noncoding RNA (ncRNA), which exert different effects on metabolic diseases. Genetic and non-genetic factors, including ageing, diet, and exercise, interact with epigenetics and jointly affect the formation of a phenotype. An understanding of epigenetics could be applied to the diagnosis and treatment of metabolic diseases in the clinic, for example through epigenetic biomarkers, epigenetic drugs, and epigenetic editing. In this review, we present a brief history of epigenetics as well as the milestone events since the term 'epigenetics' was proposed. Moreover, we summarise the research methods of epigenetics and introduce four main general mechanisms of epigenetic modulation. Furthermore, we summarise epigenetic mechanisms in metabolic diseases and introduce the interaction between epigenetics and genetic or non-genetic factors. Finally, we introduce the clinical trials and applications of epigenetics in metabolic diseases.
Affiliation(s)
- Yan-Lin Wu
- National Clinical Research Center for Metabolic Disease, Department of Metabolism and Endocrinology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Zheng-Jun Lin
- Department of Orthopaedics, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Hunan Key Laboratory of Tumor Models and Individualized Medicine, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Chang-Chun Li
- National Clinical Research Center for Metabolic Disease, Department of Metabolism and Endocrinology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Xiao Lin
- Department of Radiology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Su-Kang Shan
- National Clinical Research Center for Metabolic Disease, Department of Metabolism and Endocrinology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Bei Guo
- National Clinical Research Center for Metabolic Disease, Department of Metabolism and Endocrinology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Ming-Hui Zheng
- National Clinical Research Center for Metabolic Disease, Department of Metabolism and Endocrinology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Fuxingzi Li
- National Clinical Research Center for Metabolic Disease, Department of Metabolism and Endocrinology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Ling-Qing Yuan
- National Clinical Research Center for Metabolic Disease, Department of Metabolism and Endocrinology, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Zhi-Hong Li
- Department of Orthopaedics, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
- Hunan Key Laboratory of Tumor Models and Individualized Medicine, The Second Xiangya Hospital, Central South University, Changsha, Hunan, 410011, China
5
Song C, Li W, Wang Z. The Landscape of Liver Chromatin Accessibility and Conserved Non-coding Elements in Larimichthys crocea, Nibea albiflora, and Lateolabrax maculatus. Mar Biotechnol (NY) 2022; 24:763-775. [PMID: 35895229 DOI: 10.1007/s10126-022-10142-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 07/08/2022] [Indexed: 06/15/2023]
Abstract
Large yellow croaker (Larimichthys crocea), yellow drum (Nibea albiflora), and Chinese seabass (Lateolabrax maculatus) are economically important marine fishes in China. The conserved non-coding elements (CNEs) in the liver tissues of these three fish species are directly or indirectly involved in the regulation of gene expression and affect liver function. However, the CNEs of these fishes, and even their chromatin accessibility landscapes, have not been thoroughly investigated. Hence, this study established the landscapes of genome-wide chromatin accessibility and CNEs in these fishes. It did so by detecting open chromatin regions in their livers using an assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) and a comparative genomics approach. The results showed that Smad1, Sp1, and Foxl1 transcription factor binding motifs were considerably enriched in the chromatin accessibility landscape of the liver in all three species, and the three transcription factors (TFs) had a wide range of common targets. The hypothetical gene set was targeted by one, two, or all three TFs at a frequency much higher than would be expected by chance. The gene sets near the CNEs were mainly enriched in processes such as macromolecule metabolic process and ribonucleoprotein complex biogenesis. Active CNEs were found in the promoter regions of genes such as ap1g1, hax1, and ndufs2, and five CNEs were predicted to be highly conserved active enhancers. These results indicated that Smad1, Sp1, and Foxl1 might be related to liver function in the three fishes. In addition, we found a series of ATAC-seq-labeled CNEs located in gene promoter regions, and highly conserved H3K27ac+-labeled CNEs located in liver function genes. The highly conserved nature of these regulatory elements suggests that they play important roles in the fish liver.
This study mined the landscape of chromatin accessibility and CNEs of three important economic fishes to fill the knowledge gaps in this field. Moreover, the work provides useful data for the industrial application and theoretical research of these three fish species.
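Locating CNEs or ATAC-seq peaks in promoter regions, as reported above, comes down to an interval-overlap test against strand-aware windows around each transcription start site (TSS). The Python sketch below makes several assumptions. Coordinates are 0-based and half-open. The promoter window is hypothetical: 2 kb upstream and 500 bp downstream of the TSS. The abstract does not state the study's actual window or tooling, and the gene names and coordinates in the example are made up. The nested loop keeps the logic readable. Genome-scale data would call for a sorted or indexed interval search instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int     # 0-based TSS position
    strand: str  # "+" or "-"


def promoter(gene: Gene, upstream: int = 2000, downstream: int = 500) -> Interval:
    """Strand-aware promoter window; the default sizes are hypothetical."""
    if gene.strand == "+":
        return Interval(gene.chrom, max(0, gene.tss - upstream), gene.tss + downstream)
    # On the minus strand, "upstream" lies at higher coordinates (mirror of the + case).
    return Interval(gene.chrom, max(0, gene.tss - downstream + 1), gene.tss + upstream + 1)


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap when each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def cnes_in_promoters(cnes, genes, upstream=2000, downstream=500):
    """Return (CNE, gene name) pairs for every CNE that overlaps a promoter window."""
    hits = []
    for cne in cnes:
        for gene in genes:
            if overlaps(cne, promoter(gene, upstream, downstream)):
                hits.append((cne, gene.name))
    return hits


if __name__ == "__main__":
    genes = [Gene("gene_plus", "chr1", 10000, "+"), Gene("gene_minus", "chr1", 50000, "-")]
    cnes = [Interval("chr1", 9000, 9100), Interval("chr1", 51000, 51200),
            Interval("chr1", 30000, 30100), Interval("chr2", 9000, 9100)]
    for cne, name in cnes_in_promoters(cnes, genes):
        print(cne, name)
```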
Affiliation(s)
- Chaowei Song
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Jimei University, Xiamen, China
- Wanbo Li
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Jimei University, Xiamen, China
- Zhiyong Wang
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Jimei University, Xiamen, China
- Laboratory for Marine Fisheries Science and Food Production Processes, National Laboratory for Marine Science and Technology, Qingdao, China
6
Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, Fu X, Liu S, Bo X, Yu G. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (N Y) 2021; 2:100141. [PMID: 34557778 PMCID: PMC8454663 DOI: 10.1016/j.xinn.2021.100141] [Citation(s) in RCA: 2649] [Impact Index Per Article: 883.0] [Received: 05/08/2021] [Accepted: 06/29/2021] [Indexed: 12/15/2022]
Abstract
Functional enrichment analysis is pivotal for interpreting high-throughput omics data in life science. It is crucial for this type of tool to use the latest annotation databases for as many organisms as possible. To meet these requirements, we present here an updated version of our popular Bioconductor package, clusterProfiler 4.0. This package has been enhanced considerably compared with its original version published 9 years ago. The new version provides a universal interface for functional enrichment analysis in thousands of organisms. It is based on internally supported ontologies and pathways as well as annotation data provided by users or derived from online databases. It also extends the dplyr and ggplot2 packages to offer tidy interfaces for data operation and visualization. Other new features include gene set enrichment analysis and comparison of enrichment results from multiple gene lists. We anticipate that clusterProfiler 4.0 will be applied to a wide range of scenarios across diverse organisms.
Highlights
- clusterProfiler supports exploring functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation.
- It provides a universal interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios.
- It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation.
- Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.
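Over-representation analysis is one of the enrichment methods clusterProfiler provides. It is commonly framed as a one-sided hypergeometric test. Given N genes in the universe, K of them annotated to a term, and n genes of interest, the test asks how surprising an overlap of k or more would be. The sketch below is plain Python, not clusterProfiler's R API, and the function names are hypothetical. The output fields are only styled after the GeneRatio and BgRatio columns that clusterProfiler reports. Multiple-testing correction across terms (e.g. Benjamini-Hochberg) is deliberately omitted.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes in universe, K in the term, n drawn)."""
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def over_representation(gene_list, term_genes, universe) -> dict:
    """One-term over-representation test; all gene sets are restricted to the universe."""
    universe = set(universe)
    genes = set(gene_list) & universe   # genes of interest
    term = set(term_genes) & universe   # annotated members of the term
    overlap = genes & term
    N, K, n, k = len(universe), len(term), len(genes), len(overlap)
    return {
        "overlap": sorted(overlap),
        "GeneRatio": f"{k}/{n}",
        "BgRatio": f"{K}/{N}",
        "pvalue": hypergeom_sf(k, N, K, n),
    }


if __name__ == "__main__":
    universe = [f"G{i}" for i in range(20)]   # hypothetical 20-gene universe
    term = universe[:5]                       # hypothetical 5-gene term
    result = over_representation(["G0", "G1", "G2", "G3", "G10"], term, universe)
    print(result)  # 4 of 5 genes hit the term; pvalue is roughly 0.0049
```

Restricting both the gene list and the term to the universe before counting matters: genes with no annotation anywhere would otherwise inflate n or K and distort the p-value.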
Affiliation(s)
- Tianzhi Wu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Erqiang Hu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Shuangbin Xu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Meijun Chen
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Pingfan Guo
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Zehan Dai
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Tingze Feng
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Lang Zhou
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Wenli Tang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Li Zhan
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Xiaocong Fu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Shanshan Liu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Xiaochen Bo
- Department of Biotechnology, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Guangchuang Yu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Guangdong Provincial Key Laboratory of Proteomics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou 510515, China
7
Krützfeldt LM, Schubach M, Kircher M. The impact of different negative training data on regulatory sequence predictions. PLoS One 2020; 15:e0237412. [PMID: 33259518 PMCID: PMC7707526 DOI: 10.1371/journal.pone.0237412] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/24/2020] [Accepted: 11/12/2020] [Indexed: 01/08/2023]
Abstract
Regulatory regions, like promoters and enhancers, cover an estimated 5–15% of the human genome. Changes to these sequences are thought to underlie much of human phenotypic variation and a substantial proportion of genetic causes of disease. However, our understanding of their functional encoding in DNA is still very limited. Applying machine or deep learning methods can shed light on this encoding. Gapped k-mer support vector machines (gkm-SVMs) and convolutional neural networks (CNNs) are commonly trained on putative regulatory sequences for this purpose. Here, we investigate the impact of negative sequence selection on model performance. By training gkm-SVM and CNN models on open chromatin data and corresponding negative training datasets, we compare both learners and two approaches for negative training data. Negative sets use either genomic background sequences or sequence shuffles of the positive sequences. Model performance was evaluated on three different tasks: predicting elements active in a cell type, predicting cell-type-specific elements, and predicting elements' relative activity as measured from independent experimental data. Our results indicate strong effects of the negative training data, with genomic backgrounds showing the best overall results. Specifically, models trained on highly shuffled sequences perform worse on the complex tasks of tissue-specific activity and quantitative activity prediction, and seem to learn features of artificial sequences rather than regulatory activity. Further, we observe that insufficient matching of genomic background sequences results in model biases. While CNNs matched and exceeded the performance of gkm-SVMs for larger training datasets, gkm-SVMs gave robust results, and the best results, for typical training dataset sizes without the need for hyperparameter optimization.
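The two kinds of negative sets compared in this study can be illustrated with a short Python sketch. It assumes DNA sequences are already loaded as strings. `shuffled_negative` performs the simplest shuffle, a mononucleotide permutation that keeps base composition but destroys order. k-mer-preserving shuffles (for example, dinucleotide shuffles) are a common variant not shown here. `gc_matched_background` draws genomic background sequences matched only on length and on GC content within a hypothetical tolerance. The study reports that insufficient matching biases models, so a real pipeline may need to match on more properties than these two. All names are hypothetical, and this is not the authors' code.

```python
import random


def gc_content(seq: str) -> float:
    """Fraction of G and C bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def shuffled_negative(seq: str, rng: random.Random) -> str:
    """Mononucleotide shuffle: same base composition (and GC) as seq, order destroyed."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def gc_matched_background(positives, background_pool, tolerance=0.02, rng=None):
    """Pick one unused background sequence per positive, matched on length and GC.

    Returns a list aligned with positives; an entry is None when no remaining
    candidate in the pool satisfies the length and GC tolerance.
    """
    if rng is None:
        rng = random.Random(0)
    pool = list(background_pool)
    rng.shuffle(pool)  # avoid always taking candidates in genome order
    used = set()
    negatives = []
    for pos in positives:
        target = gc_content(pos)
        match = None
        for idx, cand in enumerate(pool):
            if idx in used or len(cand) != len(pos):
                continue
            if abs(gc_content(cand) - target) <= tolerance:
                used.add(idx)
                match = cand
                break
        negatives.append(match)
    return negatives


if __name__ == "__main__":
    rng = random.Random(42)
    print(shuffled_negative("GGCGCATTGC", rng))
    print(gc_matched_background(["GGCC", "AATT"], ["ATAT", "GCGC", "AAAA", "CCGG"]))
```

Both negatives share the positives' GC content. The key difference is that the shuffled negative is an artificial sequence, while the background negative is real genomic DNA. This matches the distinction the abstract draws when it notes that models trained on shuffles may learn "features of artificial sequences rather than regulatory activity".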
Affiliation(s)
- Louisa-Marie Krützfeldt
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
- Max Schubach
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
- Martin Kircher
- Charité–Universitätsmedizin Berlin, Berlin, Germany
- Berlin Institute of Health (BIH), Berlin, Germany