1
Zhou X, Zhou L, Qian F, Chen J, Zhang Y, Yu Z, Zhang J, Yang Y, Li Y, Song C, Wang Y, Shang D, Dong L, Zhu J, Li C, Wang Q. TFTG: A comprehensive database for human transcription factors and their targets. Comput Struct Biotechnol J 2024; 23:1877-1885. [PMID: 38707542 PMCID: PMC11068477 DOI: 10.1016/j.csbj.2024.04.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/07/2024] Open
Abstract
Transcription factors (TFs) are major contributors to gene transcription, especially in controlling cell-specific gene expression and disease occurrence and development. Uncovering the relationship between TFs and their target genes is critical to understanding the mechanism of action of TFs. With the development of high-throughput sequencing techniques, a large amount of TF-related data has accumulated, which can be used to identify their target genes. In this study, we developed the TFTG (Transcription Factor and Target Genes) database (http://tf.liclab.net/TFTG), which aims to provide a large number of human TF-target gene resources obtained by multiple strategies, together with comprehensive functional and epigenetic annotations and regulatory analyses of TFs. We identified extensive TF-target gene relationships by collecting and processing TF-associated ChIP-seq datasets, perturbation RNA-seq datasets and motifs. We also obtained experimentally confirmed relationships between TFs and target genes from available resources. Overall, the target genes of TFs were obtained by integrating the relevant data of various TFs with fourteen identification strategies. TFTG also provides user-friendly search, analysis, browsing, downloading and visualization functions. TFTG is designed to be a convenient resource for exploring human TF-target gene regulation, which will be useful for most users in TF and gene expression regulation research.
Affiliation(s)
- Xinyuan Zhou
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- College of Artificial Intelligence and Big Data For Medical Sciences, Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Liwei Zhou
- State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Fengcui Qian
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Jiaxin Chen
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yuexin Zhang
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Zhengmin Yu
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- Jian Zhang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yongsan Yang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Yanyu Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chao Song
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Yuezhu Wang
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Desi Shang
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Longlong Dong
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Jiang Zhu
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Chunquan Li
- The First Affiliated Hospital & Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Maternal and Child Health Care Hospital, National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- School of Computer, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- The First Affiliated Hospital, Department of Cardiology, Hengyang Medical School, University of South China, Hengyang, China
- Qiuyu Wang
- The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
2
Castilho RM, Castilho LS, Palomares BH, Squarize CH. Determinants of Chromatin Organization in Aging and Cancer-Emerging Opportunities for Epigenetic Therapies and AI Technology. Genes (Basel) 2024; 15:710. [PMID: 38927646 DOI: 10.3390/genes15060710] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 05/21/2024] [Accepted: 05/26/2024] [Indexed: 06/28/2024] Open
Abstract
This review article critically examines the pivotal role of chromatin organization in gene regulation, cellular differentiation, disease progression and aging. It explores the dynamic between the euchromatin and heterochromatin, coded by a complex array of histone modifications that orchestrate essential cellular processes. We discuss the pathological impacts of chromatin state misregulation, particularly in cancer and accelerated aging conditions such as progeroid syndromes, and highlight the innovative role of epigenetic therapies and artificial intelligence (AI) in comprehending and harnessing the histone code toward personalized medicine. In the context of aging, this review explores the use of AI and advanced machine learning (ML) algorithms to parse vast biological datasets, leading to the development of predictive models for epigenetic modifications and providing a framework for understanding complex regulatory mechanisms, such as those governing cell identity genes. It supports innovative platforms like CEFCIG for high-accuracy predictions and tools like GridGO for tailored ChIP-Seq analysis, which are vital for deciphering the epigenetic landscape. The review also casts a vision on the prospects of AI and ML in oncology, particularly in the personalization of cancer therapy, including early diagnostics and treatment optimization for diseases like head and neck and colorectal cancers by harnessing computational methods, AI advancements and integrated clinical data for a transformative impact on healthcare outcomes.
Affiliation(s)
- Rogerio M Castilho
- Laboratory of Epithelial Biology, Department of Periodontics and Oral Medicine, School of Dentistry, University of Michigan, Ann Arbor, MI 48109-1078, USA
- Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109-1078, USA
- Leonard S Castilho
- Laboratory of Epithelial Biology, Department of Periodontics and Oral Medicine, School of Dentistry, University of Michigan, Ann Arbor, MI 48109-1078, USA
- Bruna H Palomares
- Oral Diagnosis Department, Piracicaba School of Dentistry, State University of Campinas, Piracicaba 13414-903, Sao Paulo, Brazil
- Cristiane H Squarize
- Laboratory of Epithelial Biology, Department of Periodontics and Oral Medicine, School of Dentistry, University of Michigan, Ann Arbor, MI 48109-1078, USA
- Rogel Cancer Center, University of Michigan, Ann Arbor, MI 48109-1078, USA
3
Gao Z, Jiang R, Chen S. OpenAnnotateApi: Python and R packages to efficiently annotate and analyze chromatin accessibility of genomic regions. Bioinformatics Advances 2024; 4:vbae055. [PMID: 38645715 PMCID: PMC11031356 DOI: 10.1093/bioadv/vbae055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/20/2024] [Accepted: 04/08/2024] [Indexed: 04/23/2024]
Abstract
Summary Chromatin accessibility serves as a critical measurement of physical contact between nuclear macromolecules and DNA sequence, providing valuable insights into the comprehensive landscape of regulatory mechanisms, thus we previously developed the OpenAnnotate web server. However, as an increasing number of epigenomic analysis software tools emerged, web-based annotation often faced limitations and inconveniences when integrated into these software pipelines. To address these issues, we here develop two software packages named OpenAnnotatePy and OpenAnnotateR. In addition to web-based functionalities, these packages encompass supplementary features, including the capability for simultaneous annotation across multiple cell types, advanced searching of systems, tissues and cell types, and converting the result to the data structure of mainstream tools. Moreover, we applied the packages to various scenarios, including cell type revealing, regulatory element prediction, and integration into mainstream single-cell ATAC-seq analysis pipelines including EpiScanpy, Signac, and ArchR. We anticipate that OpenAnnotateApi will significantly facilitate the deciphering of gene regulatory mechanisms, and offer crucial assistance in the field of epigenomic studies. Availability and implementation OpenAnnotateApi for R is available at https://github.com/ZjGaothu/OpenAnnotateR and for Python is available at https://github.com/ZjGaothu/OpenAnnotatePy.
Affiliation(s)
- Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
4
Peng G, Liu B, Zheng M, Zhang L, Li H, Liu M, Liang Y, Chen T, Luo X, Shi X, Ren J, Zheng Y. TSCRE: a comprehensive database for tumor-specific cis-regulatory elements. NAR Cancer 2024; 6:zcad063. [PMID: 38213995 PMCID: PMC10782923 DOI: 10.1093/narcan/zcad063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 11/18/2023] [Accepted: 12/31/2023] [Indexed: 01/13/2024] Open
Abstract
Cis-regulatory elements (CREs) and super cis-regulatory elements (SCREs) are non-coding DNA regions which influence the transcription of nearby genes and play critical roles in development. Dysregulated CRE and SCRE activities have been reported to alter the expression of oncogenes and tumor suppressors, thereby regulating cancer hallmarks. To address the strong need for a comprehensive catalogue of dysregulated CREs and SCREs in human cancers, we present TSCRE (http://tscre.zsqylab.com/), an open resource providing tumor-specific and cell type-specific CREs and SCREs derived from the re-analysis of publicly available histone modification profiles. Currently, TSCRE contains 1 864 941 dysregulated CREs and 68 253 dysregulated SCREs identified from 1366 human patient samples spanning 17 different cancer types and 9 histone marks. Over 95% of these elements have been validated in public resources. TSCRE offers comprehensive annotations for each element, including associated genes, expression patterns, clinical prognosis, somatic mutations, transcription factor binding sites, cancer-type specificity, and drug response. Additionally, TSCRE integrates pathway and transcription factor enrichment analyses for each study, enabling in-depth functional and mechanistic investigations. Furthermore, TSCRE provides an interactive interface for users to explore any CRE and SCRE of interest. We believe TSCRE will be a highly valuable platform for the community to discover candidate cancer biomarkers.
Affiliation(s)
- Guanjie Peng
- Clinical Big Data Research Center, Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Guangzhou Municipal and Guangdong Provincial Key Laboratory of Protein Modification and Degradation, Affiliated Cancer Hospital of Guangzhou Medical University, State Key Laboratory of Respiratory Disease, School of Basic Medical Sciences, Guangzhou Medical University, Guangzhou 510120, China
- Bingyuan Liu
- Clinical Big Data Research Center, Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Guangzhou Municipal and Guangdong Provincial Key Laboratory of Protein Modification and Degradation, Affiliated Cancer Hospital of Guangzhou Medical University, State Key Laboratory of Respiratory Disease, School of Basic Medical Sciences, Guangzhou Medical University, Guangzhou 510120, China
- Mohan Zheng
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Luowanyue Zhang
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Huiqin Li
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Mengni Liu
- Clinical Big Data Research Center, Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Yuan Liang
- Clinical Big Data Research Center, Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Tianjian Chen
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Xiaotong Luo
- Guangdong Institute of Gastroenterology, Department of General Surgery, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Diseases, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou 510060, China
- Xianping Shi
- Guangzhou Municipal and Guangdong Provincial Key Laboratory of Protein Modification and Degradation, Affiliated Cancer Hospital of Guangzhou Medical University, State Key Laboratory of Respiratory Disease, School of Basic Medical Sciences, Guangzhou Medical University, Guangzhou 510120, China
- Jian Ren
- State Key Laboratory of Oncology in South China, Cancer Center, Collaborative Innovation Center for Cancer Medicine, School of Life Sciences, Sun Yat-sen University, Guangzhou 510060, China
- Yueyuan Zheng
- Clinical Big Data Research Center, Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
5
Gao Z, Liu Q, Zeng W, Jiang R, Wong WH. EpiGePT: a Pretrained Transformer model for epigenomics. bioRxiv: the preprint server for biology 2024:2023.07.15.549134. [PMID: 37502861 PMCID: PMC10370089 DOI: 10.1101/2023.07.15.549134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The inherent similarities between natural language and biological sequences have given rise to great interest in adapting the transformer-based large language models (LLMs) underlying recent breakthroughs in natural language processing (references), for applications in genomics. However, current LLMs for genomics suffer from several limitations such as the inability to include chromatin interactions in the training data, and the inability to make prediction in new cellular contexts not represented in the training data. To mitigate these problems, we propose EpiGePT, a transformer-based pretrained language model for predicting context-specific epigenomic signals and chromatin contacts. By taking the context-specific activities of transcription factors (TFs) and 3D genome interactions into consideration, EpiGePT offers wider applicability and deeper biological insights than models trained on DNA sequence only. In a series of experiments, EpiGePT demonstrates superior performance in a diverse set of epigenomic signals prediction tasks when compared to existing methods. In particular, our model enables cross-cell-type prediction of long-range interactions and offers insight on the functional impact of genetic variants under different cellular contexts. These new capabilities will enhance the usefulness of LLM in the study of gene regulatory mechanisms. We provide free online prediction service of EpiGePT through http://health.tsinghua.edu.cn/epigept/.
Affiliation(s)
- Zijing Gao
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Qiao Liu
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Wanwen Zeng
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Wing Hung Wong
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Department of Biomedical Data Science, Bio-X Program, Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA 94305, USA
6
Feng X, Liu S, Li K, Bu F, Yuan H. NCAD v1.0: a database for non-coding variant annotation and interpretation. J Genet Genomics 2024; 51:230-242. [PMID: 38142743 DOI: 10.1016/j.jgg.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 12/15/2023] [Accepted: 12/18/2023] [Indexed: 12/26/2023]
Abstract
The application of whole genome sequencing is expanding in clinical diagnostics across various genetic disorders, and the significance of non-coding variants in penetrant diseases is increasingly being demonstrated. Therefore, it is urgent to improve the diagnostic yield by exploring the pathogenic mechanisms of variants in non-coding regions. However, the interpretation of non-coding variants remains a significant challenge, due to the complex functional regulatory mechanisms of non-coding regions and the current limitations of available databases and tools. Hence, we develop the non-coding variant annotation database (NCAD, http://www.ncawdb.net/), encompassing comprehensive insights into 665,679,194 variants, regulatory elements, and element interaction details. Integrating data from 96 sources, spanning both GRCh37 and GRCh38 versions, NCAD v1.0 provides vital information to support the genetic diagnosis of non-coding variants, including allele frequencies of 12 diverse populations, with a particular focus on the population frequency information for 230,235,698 variants in 20,964 Chinese individuals. Moreover, it offers prediction scores for variant functionality, five categories of regulatory elements, and four types of non-coding RNAs. With its rich data and comprehensive coverage, NCAD serves as a valuable platform, empowering researchers and clinicians with profound insights into non-coding regulatory mechanisms while facilitating the interpretation of non-coding variants.
Affiliation(s)
- Xiaoshu Feng
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Sihan Liu
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Ke Li
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Fengxiao Bu
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
- Huijun Yuan
- Institute of Rare Diseases, West China Hospital, Sichuan University, Chengdu, Sichuan 610044, China
7
Zhu Z, Zhou Q, Sun Y, Lai F, Wang Z, Hao Z, Li G. MethMarkerDB: a comprehensive cancer DNA methylation biomarker database. Nucleic Acids Res 2024; 52:D1380-D1392. [PMID: 37889076 PMCID: PMC10767949 DOI: 10.1093/nar/gkad923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/21/2023] [Accepted: 10/10/2023] [Indexed: 10/28/2023] Open
Abstract
DNA methylation plays a crucial role in tumorigenesis and tumor progression, sparking substantial interest in the clinical applications of cancer DNA methylation biomarkers. Cancer-related whole-genome bisulfite sequencing (WGBS) data offers a promising approach to precisely identify these biomarkers with differentially methylated regions (DMRs). However, currently there is no dedicated resource for cancer DNA methylation biomarkers with WGBS data. Here, we developed a comprehensive cancer DNA methylation biomarker database (MethMarkerDB, https://methmarkerdb.hzau.edu.cn/), which integrated 658 WGBS datasets, incorporating 724 curated DNA methylation biomarker genes from 1425 PubMed published articles. Based on WGBS data, we documented 5.4 million DMRs from 13 common types of cancer as candidate DNA methylation biomarkers. We provided search and annotation functions for these DMRs with different resources, such as enhancers and SNPs, and developed diagnostic and prognostic models for further biomarker evaluation. With the database, we not only identified known DNA methylation biomarkers, but also identified 781 hypermethylated and 5245 hypomethylated pan-cancer DMRs, corresponding to 693 and 2172 genes, respectively. These novel potential pan-cancer DNA methylation biomarkers hold significant clinical translational value. We hope that MethMarkerDB will help identify novel cancer DNA methylation biomarkers and propel the clinical application of these biomarkers.
Affiliation(s)
- Zhixian Zhu
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Qiangwei Zhou
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yuanhui Sun
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Fuming Lai
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhenji Wang
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhigang Hao
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Guoliang Li
- National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
8
Edrei Y, Levy R, Kaye D, Marom A, Radlwimmer B, Hellman A. Methylation-directed regulatory networks determine enhancing and silencing of mutation disease driver genes and explain inter-patient expression variation. Genome Biol 2023; 24:264. [PMID: 38012713 PMCID: PMC10683314 DOI: 10.1186/s13059-023-03094-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 10/23/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND: Common diseases manifest differentially between patients, but the genetic origin of this variation remains unclear. To explore possible involvement of gene transcriptional-variation, we produce a DNA methylation-oriented, driver-gene-wide dataset of regulatory elements in human glioblastomas and study their effect on inter-patient gene expression variation. RESULTS: In 175 of 177 analyzed gene regulatory domains, transcriptional enhancers and silencers are intermixed. Under experimental conditions, DNA methylation induces enhancers to alter their enhancing effects or convert into silencers, while silencers are affected inversely. High-resolution mapping of the association between DNA methylation and gene expression in intact genomes reveals methylation-related regulatory units (average size = 915.1 base-pairs). Upon increased methylation of these units, their target-genes either increased or decreased in expression. Gene-enhancing and silencing units constitute cis-regulatory networks of genes. Mathematical modeling of the networks highlights indicative methylation sites, which signify the effect of key regulatory units and add up to make the overall transcriptional effect of the network. Methylation variation in these sites effectively describes inter-patient expression variation and, compared with DNA sequence-alterations, appears as a major contributor to gene-expression variation among glioblastoma patients. CONCLUSIONS: We describe complex cis-regulatory networks, which determine gene expression by summing the effects of positive and negative transcriptional inputs. In these networks, DNA methylation induces both enhancing and silencing effects, depending on the context. The revealed mechanism sheds light on the regulatory role of DNA methylation, explains inter-individual gene-expression variation, and opens the way for monitoring the driving forces behind differential courses of cancer and other diseases.
Affiliation(s)
- Yifat Edrei
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada (IMRIC), The Hebrew University-Hadassah Medical School, 9112102, Jerusalem, Israel
- Revital Levy
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada (IMRIC), The Hebrew University-Hadassah Medical School, 9112102, Jerusalem, Israel
- Daniel Kaye
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada (IMRIC), The Hebrew University-Hadassah Medical School, 9112102, Jerusalem, Israel
- Anat Marom
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada (IMRIC), The Hebrew University-Hadassah Medical School, 9112102, Jerusalem, Israel
- Bernhard Radlwimmer
- Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Asaf Hellman
- Department of Developmental Biology and Cancer Research, The Institute for Medical Research Israel-Canada (IMRIC), The Hebrew University-Hadassah Medical School, 9112102, Jerusalem, Israel
9
Pan JH, Du PF. SilenceREIN: seeking silencers on anchors of chromatin loops by deep graph neural networks. Brief Bioinform 2023; 25:bbad494. [PMID: 38168841 PMCID: PMC10782921 DOI: 10.1093/bib/bbad494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 11/09/2023] [Accepted: 12/04/2023] [Indexed: 01/05/2024] Open
Abstract
Silencers are repressive cis-regulatory elements that play crucial roles in transcriptional regulation. Experimental methods for identifying silencers are always costly and time-consuming. Computational methods, which rely on genomic sequence features, have been introduced as alternative approaches. However, silencers do not have a significant epigenomic signature. Therefore, we explore a new way to computationally identify silencers, by incorporating chromatin structural information. We propose the SilenceREIN method, which focuses on finding silencers on anchors of chromatin loops. By using graph neural networks, we extracted chromatin structural information from a regulatory element interaction network. SilenceREIN integrated the chromatin structural information with linear genomic signatures to find silencers. The predictive performance of SilenceREIN is comparable to or better than that of other state-of-the-art methods. We performed a genome-wide scan to systematically find silencers in the human genome. Results suggest that silencers are widespread on anchors of chromatin loops. In addition, enrichment analysis of transcription factor binding motifs supports our prediction results. As far as we can tell, this is the first attempt to incorporate chromatin structural information in finding silencers. All datasets and source code of SilenceREIN have been deposited in a GitHub repository (https://github.com/JianHPan/SilenceREIN).
Affiliation(s)
- Jian-Hua Pan
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Pu-Feng Du
  - College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
|
10
|
Zhang T, Li L, Sun H, Xu D, Wang G. DeepICSH: a complex deep learning framework for identifying cell-specific silencers and their strength from the human genome. Brief Bioinform 2023; 24:bbad316. [PMID: 37643374 DOI: 10.1093/bib/bbad316] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/01/2023] [Revised: 07/25/2023] [Accepted: 08/11/2023] [Indexed: 08/31/2023] Open
Abstract
Silencers are noncoding DNA sequence fragments located on the genome that suppress gene expression. The variation of silencers in specific cells is closely related to gene expression and cancer development. Computational approaches that exclusively rely on DNA sequence information for silencer identification fail to account for the cell specificity of silencers, resulting in diminished accuracy. Despite the discovery of several transcription factors and epigenetic modifications associated with silencers on the genome, there is still no definitive biological signal or combination thereof to fully characterize silencers, posing challenges in selecting suitable biological signals for their identification. Therefore, we propose a sophisticated deep learning framework called DeepICSH, which is based on multiple biological data sources. Specifically, DeepICSH leverages a deep convolutional neural network to automatically capture biologically relevant signal combinations strongly associated with silencers, originating from a diverse array of biological signals. Furthermore, the utilization of attention mechanisms facilitates the scoring and visualization of these signal combinations, whereas the employment of skip connections facilitates the fusion of multilevel sequence features and signal combinations, thereby empowering the accurate identification of silencers within specific cells. Extensive experiments on HepG2 and K562 cell line data sets demonstrate that DeepICSH outperforms state-of-the-art methods in silencer identification. Notably, we introduce for the first time a deep learning framework based on multi-omics data for classifying strong and weak silencers, achieving favorable performance. In conclusion, DeepICSH shows great promise for advancing the study and analysis of silencers in complex diseases. The source code is available at https://github.com/lyli1013/DeepICSH.
Affiliation(s)
- Tianjiao Zhang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Liangyu Li
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Hailong Sun
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Dali Xu
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Guohua Wang
  - College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
|
11
|
Ni P, Wu S, Su Z. Underlying causes for prevalent false positives and false negatives in STARR-seq data. NAR Genom Bioinform 2023; 5:lqad085. [PMID: 37745976 PMCID: PMC10516709 DOI: 10.1093/nargab/lqad085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 08/23/2023] [Accepted: 09/12/2023] [Indexed: 09/26/2023] Open
Abstract
Self-transcribing active regulatory region sequencing (STARR-seq) and its variants have been widely used to characterize enhancers. However, it has been reported that up to 87% of STARR-seq peaks are located in repressive chromatin and are not functional in the tested cells. While some of the STARR-seq peaks in repressive chromatin might be active in other cell/tissue types, some others might be false positives. Meanwhile, many active enhancers may not be identified by the current STARR-seq methods. Although methods have been proposed to mitigate systematic errors caused by the use of plasmid vectors, the artifacts due to the intrinsic limitations of current STARR-seq methods are still prevalent and the underlying causes are not fully understood. Based on predicted cis-regulatory modules (CRMs) and non-CRMs in the human genome as well as predicted active CRMs and non-active CRMs in a few human cell lines/tissues with STARR-seq data available, we reveal prevalent false positives and false negatives in STARR-seq peaks generated by major variants of STARR-seq methods and possible underlying causes. Our results will help design strategies to improve STARR-seq methods and interpret the results.
Affiliation(s)
- Pengyu Ni
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Siwen Wu
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Zhengchang Su
  - Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
|
12
|
Stefan K, Barski A. Cis-regulatory atlas of primary human CD4+ T cells. BMC Genomics 2023; 24:253. [PMID: 37170195 PMCID: PMC10173520 DOI: 10.1186/s12864-023-09288-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 03/31/2023] [Indexed: 05/13/2023] Open
Abstract
Cis-regulatory elements (CRE) are critical for coordinating gene expression programs that dictate cell-specific differentiation and homeostasis. Recently developed self-transcribing active regulatory region sequencing (STARR-Seq) has allowed for genome-wide annotation of functional CREs. Despite this, STARR-Seq assays are only employed in cell lines, in part, due to difficulties in delivering reporter constructs. Herein, we implemented and validated a STARR-Seq-based screen in human CD4+ T cells using a non-integrating lentiviral transduction system. Lenti-STARR-Seq is the first example of a genome-wide assay of CRE function in human primary cells, identifying thousands of functional enhancers and negative regulatory elements (NREs) in human CD4+ T cells. We find an unexpected difference in nucleosome organization between enhancers and NRE: enhancers are located between nucleosomes, whereas NRE are occupied by nucleosomes in their endogenous locations. We also describe chromatin modification, eRNA production, and transcription factor binding at both enhancers and NREs. Our findings support the idea of silencer repurposing as enhancers in alternate cell types. Collectively, these data suggest that Lenti-STARR-Seq is a successful approach for CRE screening in primary human cell types, and provides an atlas of functional CREs in human CD4+ T cells.
Affiliation(s)
- Kurtis Stefan
  - Division of Allergy & Immunology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7028, Cincinnati, OH, 45229-3026, USA
  - Medical Scientist Training Program (MSTP), University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
- Artem Barski
  - Division of Allergy & Immunology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7028, Cincinnati, OH, 45229-3026, USA.
  - Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229-3026, USA.
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA.
|
13
|
Ding K, Sun S, Luo Y, Long C, Zhai J, Zhai Y, Wang G. PlantCADB: A Comprehensive Plant Chromatin Accessibility Database. Genomics Proteomics Bioinformatics 2023; 21:311-323. [PMID: 36328151 PMCID: PMC10626055 DOI: 10.1016/j.gpb.2022.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/08/2022] [Revised: 09/25/2022] [Accepted: 10/24/2022] [Indexed: 11/16/2022]
Abstract
Chromatin accessibility landscapes are essential for detecting regulatory elements, illustrating the corresponding regulatory networks, and, ultimately, understanding the molecular basis underlying key biological processes. With the advancement of sequencing technologies, a large volume of chromatin accessibility data has been accumulated and integrated for humans and other mammals. These data have greatly advanced the study of disease pathogenesis, cancer survival prognosis, and tissue development. To advance the understanding of molecular mechanisms regulating plant key traits and biological processes, we developed a comprehensive plant chromatin accessibility database (PlantCADB) from 649 samples of 37 species. These samples are abiotic stress-related (such as heat, cold, drought, and salt; 159 samples), development-related (232 samples), and/or tissue-specific (376 samples). Overall, 18,339,426 accessible chromatin regions (ACRs) were compiled. These ACRs were annotated with genomic information, associated genes, transcription factor footprints, motifs, and single-nucleotide polymorphisms (SNPs). Additionally, PlantCADB provides various tools to visualize ACRs and corresponding annotations. It thus forms an integrated, annotated, and analyzed plant-related chromatin accessibility resource, which can aid in better understanding genetic regulatory networks underlying development, important traits, stress adaptations, and evolution. PlantCADB is freely available at https://bioinfor.nefu.edu.cn/PlantCADB/.
Affiliation(s)
- Ke Ding
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, China; College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Shanwen Sun
  - College of Life Science, Northeast Forestry University, Harbin 150040, China
- Yang Luo
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Chaoyue Long
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Jingwen Zhai
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Yixiao Zhai
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
- Guohua Wang
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin 150040, China; College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
|
14
|
Liu Q, Zeng W, Zhang W, Wang S, Chen H, Jiang R, Zhou M, Zhang S. Deep generative modeling and clustering of single cell Hi-C data. Brief Bioinform 2023; 24:6858951. [PMID: 36458445 DOI: 10.1093/bib/bbac494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/28/2022] [Accepted: 10/18/2022] [Indexed: 12/05/2022] Open
Abstract
Deciphering 3D genome conformation is important for understanding gene regulation and cellular function at a spatial level. Recent advances in single cell Hi-C technologies have enabled the profiling of the 3D architecture of DNA within individual cells, which allows us to study the cell-to-cell variability of 3D chromatin organization. Computational approaches are urgently needed to comprehensively analyze the sparse and heterogeneous single cell Hi-C data. Here, we propose scDEC-Hi-C, a new framework for single cell Hi-C analysis with deep generative neural networks. scDEC-Hi-C outperforms existing methods in terms of single cell Hi-C data clustering and imputation. Moreover, the generative power of scDEC-Hi-C could help unveil the differences in chromatin architecture across cell types. We expect that scDEC-Hi-C could deepen our understanding of the complex mechanism underlying the formation of chromatin contacts.
Affiliation(s)
- Qiao Liu
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Wanwen Zeng
  - College of Software, Nankai University, Tianjin 300071, China
- Wei Zhang
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Sicheng Wang
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Hongyang Chen
  - The Research Center for Intelligent Network, Zhejiang Lab, Hangzhou 311121, China
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Mu Zhou
  - SenseBrain Research, San Jose, CA 95131, USA
- Shaoting Zhang
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200240, China
|
15
|
Zhou Q, Cheng S, Zheng S, Wang Z, Guan P, Zhu Z, Huang X, Zhou C, Li G. ChromLoops: a comprehensive database for specific protein-mediated chromatin loops in diverse organisms. Nucleic Acids Res 2023; 51:D57-D69. [PMID: 36243984 PMCID: PMC9825580 DOI: 10.1093/nar/gkac893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/14/2022] [Accepted: 10/03/2022] [Indexed: 01/29/2023] Open
Abstract
Chromatin loops (or chromatin interactions) are important elements of chromatin structures. Disruption of chromatin loops is associated with many diseases, such as cancer and polydactyly. A few methods, including ChIA-PET, HiChIP and PLAC-Seq, have been proposed to detect high-resolution, specific protein-mediated chromatin loops. With rapid progress in 3D genomic research, ChIA-PET, HiChIP and PLAC-Seq datasets continue to accumulate, and effective collection and processing for these datasets are urgently needed. Here, we developed a comprehensive, multispecies and specific protein-mediated chromatin loop database (ChromLoops, https://3dgenomics.hzau.edu.cn/chromloops), which integrated 1030 ChIA-PET, HiChIP and PLAC-Seq datasets from 13 species, and documented 1 491 416 813 high-quality chromatin loops. We annotated genes and regions overlapping with chromatin loop anchors with rich functional annotations, such as regulatory elements (enhancers, super-enhancers and silencers), variations (common SNPs, somatic SNPs and eQTLs), and transcription factor binding sites. Moreover, we identified genes with high-frequency chromatin interactions in the collected species. In particular, we identified genes with high-frequency interactions in cancer samples. We hope that ChromLoops will provide a new platform for studying chromatin interaction regulation in relation to biological processes and disease.
Affiliation(s)
- Qiangwei Zhou
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Sheng Cheng
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Shanshan Zheng
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhenji Wang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Pengpeng Guan
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zhixian Zhu
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xingyu Huang
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Cong Zhou
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Guoliang Li
  - National Key Laboratory of Crop Genetic Improvement, Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China; Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Farming for Agricultural Animals, 3D Genomics Research Center, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
|
16
|
Pang B, van Weerd JH, Hamoen FL, Snyder MP. Identification of non-coding silencer elements and their regulation of gene expression. Nat Rev Mol Cell Biol 2022; 24:383-395. [DOI: 10.1038/s41580-022-00549-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/29/2022] [Indexed: 11/09/2022]
|
17
|
Cross-species enhancer prediction using machine learning. Genomics 2022; 114:110454. [PMID: 36030022 DOI: 10.1016/j.ygeno.2022.110454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 07/28/2022] [Accepted: 08/16/2022] [Indexed: 11/21/2022]
Abstract
Cis-regulatory elements (CREs) are non-coding parts of the genome that play a critical role in gene expression regulation. Enhancers, as an important example of CREs, interact with genes to influence complex traits like disease, heat tolerance and growth rate. Much of what is known about enhancers comes from studies of humans and a few model organisms like mouse, with little known about other mammalian species. Previous studies have attempted to identify enhancers in less studied mammals using comparative genomics but with limited success. Recently, machine learning (ML) techniques have shown promising results in predicting enhancer regions. Here, we investigated the ability of ML methods to identify enhancers in three non-model mammalian species (cattle, pig and dog) using human and mouse enhancer data from VISTA and publicly available ChIP-seq. We tested nine models, using four different representations of the DNA sequences in cross-species prediction using both the VISTA dataset and species-specific ChIP-seq data. We identified between 809,399 and 877,278 enhancer-like regions (ELRs) in the study species (11.6-13.7% of each genome). These predictions were close to the ~8% proportion of ELRs that covered the human genome. We propose that our ML methods have predictive ability for identifying enhancers in non-model mammalian species. We have provided a list of high confidence enhancers at https://github.com/DaviesCentreInformatics/Cross-species-enhancer-prediction and believe these enhancers will be of great use to the community.
|
18
|
Liu Q, Hua K, Zhang X, Wong WH, Jiang R. DeepCAGE: Incorporating Transcription Factors in Genome-wide Prediction of Chromatin Accessibility. Genomics Proteomics Bioinformatics 2022; 20:496-507. [PMID: 35293310 PMCID: PMC9801045 DOI: 10.1016/j.gpb.2021.08.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/31/2021] [Revised: 05/31/2021] [Accepted: 09/27/2021] [Indexed: 01/26/2023]
Abstract
Although computational approaches have been complementing high-throughput biological experiments for the identification of functional regions in the human genome, it remains a great challenge to systematically decipher interactions between transcription factors (TFs) and regulatory elements to achieve interpretable annotations of chromatin accessibility across diverse cellular contexts. To solve this problem, we propose DeepCAGE, a deep learning framework that integrates sequence information and binding statuses of TFs, for the accurate prediction of chromatin accessible regions at a genome-wide scale in a variety of cell types. DeepCAGE takes advantage of a densely connected deep convolutional neural network architecture to automatically learn sequence signatures of known chromatin accessible regions and then incorporates such features with expression levels and binding activities of human core TFs to predict novel chromatin accessible regions. In a series of systematic comparisons with existing methods, DeepCAGE exhibits superior performance in not only the classification but also the regression of chromatin accessibility signals. In a detailed analysis of TF activities, DeepCAGE successfully extracts novel binding motifs and measures the contribution of a TF to the regulation with respect to a specific locus in a certain cell type. When applied to whole-genome sequencing data analysis, our method successfully prioritizes putative deleterious variants underlying a human complex trait and thus provides insights into the understanding of disease-associated genetic variants. DeepCAGE can be downloaded from https://github.com/kimmo1019/DeepCAGE.
Affiliation(s)
- Qiao Liu
  - Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China; Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Kui Hua
  - Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
  - Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Wing Hung Wong
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA; corresponding author
- Rui Jiang
  - Ministry of Education Key Laboratory of Bioinformatics; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China; corresponding author
|
19
|
Yang D, Chung T, Kim D. DeepLUCIA: predicting tissue-specific chromatin loops using Deep Learning-based Universal Chromatin Interaction Annotator. Bioinformatics 2022; 38:3501-3512. [PMID: 35640981 DOI: 10.1093/bioinformatics/btac373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/17/2022] [Accepted: 05/27/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The importance of chromatin loops in gene regulation is broadly accepted. There are two main approaches to predicting chromatin loops: the transcription factor (TF) binding-dependent approach and the genomic variation-based approach. However, neither of these approaches provides an adequate understanding of gene regulation in human tissues. To address this issue, we developed a deep learning-based chromatin loop prediction model called DeepLUCIA (Deep Learning-based Universal Chromatin Interaction Annotator). RESULTS Although DeepLUCIA does not use the TF binding profile data on which previous TF binding-dependent methods critically rely, its prediction accuracies are comparable to those of the previous TF binding-dependent methods. More importantly, DeepLUCIA enables tissue-specific chromatin loop predictions from tissue-specific epigenomes, which cannot be handled by the genomic variation-based approach. We demonstrated the utility of DeepLUCIA by predicting several novel target genes of SNPs identified in genome-wide association studies targeting Brugada syndrome, COVID-19 severity, and age-related macular degeneration. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dongchan Yang
  - Department of Bio and Brain Engineering, KAIST, Daejeon, 34141, Republic of Korea
- Taesu Chung
  - Biotechnology & Healthcare Examination Division, KIPO, Daejeon, 35208, Republic of Korea
- Dongsup Kim
  - Department of Bio and Brain Engineering, KAIST, Daejeon, 34141, Republic of Korea
|
20
|
Moon S, Lee H. MOMA: a multi-task attention learning algorithm for multi-omics data interpretation and classification. Bioinformatics 2022; 38:2287-2296. [PMID: 35157023 PMCID: PMC10060719 DOI: 10.1093/bioinformatics/btac080] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/29/2021] [Revised: 01/01/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Accurate diagnostic classification and biological interpretation are important in biology and medicine, which are data-rich sciences. Thus, integration of different data types is necessary for the high predictive accuracy of clinical phenotypes, and more comprehensive analyses for predicting the prognosis of complex diseases are required. RESULTS Here, we propose a novel multi-task attention learning algorithm for multi-omics data, termed MOMA, which captures important biological processes for high diagnostic performance and interpretability. MOMA vectorizes features and modules using a geometric approach and focuses on important modules in multi-omics data via an attention mechanism. Experiments using public data on Alzheimer's disease and cancer with various classification tasks demonstrated the superior performance of this approach. The utility of MOMA was also verified by biological analysis and by a comparison experiment in which the attention mechanism was turned on or off. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/dmcb-gist/MOMA. SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Sehwan Moon
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
- Hyunju Lee
  - School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
|
21
|
He N, Wang W, Fang C, Tan Y, Li L, Hou C. Integration of Count Difference and Curve Similarity in Negative Regulatory Element Detection. Front Genet 2022; 13:818344. [PMID: 35251128 PMCID: PMC8896116 DOI: 10.3389/fgene.2022.818344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Accepted: 01/20/2022] [Indexed: 12/05/2022] Open
Abstract
Negative regulatory elements (NREs) down-regulate gene expression by inhibiting the activities of promoters or enhancers. The repressing activity of NREs can be measured globally by massively parallel reporter assays (MPRAs). However, most existing algorithms are designed for the statistical detection of positively enriched signals in MPRA datasets. To identify reduced signals in MPRA experiments, we designed an NRE identification program, fast-NR, which integrates the count and graphic features of sequenced reads to detect NREs in datasets generated by self-transcribing active regulatory region sequencing (STARR-seq) experiments. Fast-NR identified hundreds of silencers in human K562 cells that can be validated by independent methods.
Affiliation(s)
- Na He
- Harbin Institute of Technology, Harbin, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Chunhui Hou; Na He
- Wenjing Wang
- School of Life Science and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong, Hong Kong SAR, China
- Chao Fang
- Cancer Centre, Faculty of Health Sciences, University of Macau, Macao, China
- Yongjian Tan
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- Li Li
- Department of Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
- Chunhui Hou
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen, China
- *Correspondence: Chunhui Hou; Na He
|
22
|
Chen X, Chen S, Song S, Gao Z, Hou L, Zhang X, Lv H, Jiang R. Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding. Nat Mach Intell 2022. [DOI: 10.1038/s42256-021-00432-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022]
|
23
|
Leveraging cell-type-specific regulatory networks to interpret genetic variants in abdominal aortic aneurysm. Proc Natl Acad Sci U S A 2022; 119:2115601119. [PMID: 34930827 PMCID: PMC8740683 DOI: 10.1073/pnas.2115601119] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Accepted: 11/22/2021] [Indexed: 12/17/2022] Open
Abstract
Abdominal aortic aneurysm (AAA) is a common and severe disease with major genetic risk factors. In this study we generated enhancer-promoter contact data to identify regulatory elements in AAA-relevant cell types and identified changes in their predicted chromatin accessibility between AAA patients and controls. We integrated this information with disease-associated variants in regulatory elements and gene bodies to further understand the etiology and pathogenetic mechanisms of AAA. Our study combined whole-genome sequencing data with gene regulatory relations in disease-relevant cell types to reveal the important roles of the interleukin 6 pathway and ERG and KLF regulation in AAA pathogenesis.
Abdominal aortic aneurysm (AAA) is a common degenerative cardiovascular disease whose pathobiology is not clearly understood. The cellular heterogeneity and cell-type-specific gene regulation of vascular cells in human AAA have not been well characterized. Here, we analyzed whole-genome sequencing data from AAA patients versus controls with the aim of detecting disease-associated variants that may affect gene regulation in human aortic smooth muscle cells (AoSMC) and human aortic endothelial cells (HAEC), two cell types of high relevance to AAA disease. To support this analysis, we generated H3K27ac HiChIP data for these cell types and inferred cell-type-specific gene regulatory networks. We observed that AAA-associated variants were most enriched in regulatory regions in AoSMC, compared with HAEC and CD4+ cells. The cell-type-specific regulation defined by these HiChIP data supported the importance of ERG and the KLF family of transcription factors in AAA disease. The analysis of regulatory elements that contain noncoding variants and are also differentially open between AAA patients and controls revealed the significance of the interleukin-6-mediated signaling pathway. This finding was further validated by including information on the deleteriousness of nonsynonymous single-nucleotide variants in AAA patients and additional control data from the Medical Genome Reference Bank dataset. These results provide important insights into AAA pathogenesis and a model for cell-type-specific analysis of disease-associated variants.
|
24
|
Gao T, Zheng Z, Pan Y, Zhu C, Wei F, Yuan J, Sun R, Fang S, Wang N, Zhou Y, Qian J. scEnhancer: a single-cell enhancer resource with annotation across hundreds of tissue/cell types in three species. Nucleic Acids Res 2021; 50:D371-D379. [PMID: 34761274 PMCID: PMC8728125 DOI: 10.1093/nar/gkab1032] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/15/2021] [Revised: 10/04/2021] [Accepted: 10/19/2021] [Indexed: 12/14/2022] Open
Abstract
Previous studies on enhancers and their target genes were largely based on bulk samples that represent ‘average’ regulatory activities from a large population of millions of cells, masking the heterogeneity and important effects from the sub-populations. In recent years, single-cell sequencing technology has enabled the profiling of open chromatin accessibility at the single-cell level (scATAC-seq), which can be used to annotate the enhancers and promoters in specific cell types. A comprehensive resource is highly desirable for exploring how the enhancers regulate the target genes at the single-cell level. Hence, we designed a single-cell database scEnhancer (http://enhanceratlas.net/scenhancer/), covering 14 527 776 enhancers and 63 658 600 enhancer-gene interactions from 1 196 906 single cells across 775 tissue/cell types in three species. An unsupervised learning method was employed to sort and combine tens or hundreds of single cells in each tissue/cell type to obtain the consensus enhancers. In addition, we utilized a cis-regulatory network algorithm to identify the enhancer-gene connections. Finally, we provided a user-friendly platform with seven useful modules to search, visualize, and browse the enhancers/genes. This database will facilitate the research community towards a functional analysis of enhancers at the single-cell level.
Affiliation(s)
- Tianshun Gao
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Zilong Zheng
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Yihang Pan
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Chengming Zhu
- Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Fuxin Wei
- Department of Orthopaedics, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Jinqiu Yuan
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Rui Sun
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Shuo Fang
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China; Department of Oncology, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Nan Wang
- Scientific Research Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Yang Zhou
- Big Data Center, The Seventh Affiliated Hospital of Sun Yat-sen University, Shenzhen 518107, P.R. China
- Jiang Qian
- The Wilmer Eye Institute, Johns Hopkins School of Medicine, Baltimore, MD 21231, USA; The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
|
25
|
Chen S, Liu Q, Cui X, Feng Z, Li C, Wang X, Zhang X, Wang Y, Jiang R. OpenAnnotate: a web server to annotate the chromatin accessibility of genomic regions. Nucleic Acids Res 2021; 49:W483-W490. [PMID: 33999180 PMCID: PMC8262705 DOI: 10.1093/nar/gkab337] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/03/2021] [Revised: 04/12/2021] [Accepted: 04/20/2021] [Indexed: 12/13/2022] Open
Abstract
Chromatin accessibility, as a powerful marker of active DNA regulatory elements, provides valuable information for understanding regulatory mechanisms. The revolution in high-throughput methods has accumulated massive chromatin accessibility profiles in public repositories. Nevertheless, utilization of these data is hampered by cumbersome collection, time-consuming processing, and manual chromatin accessibility (openness) annotation of genomic regions. To fill this gap, we developed OpenAnnotate (http://health.tsinghua.edu.cn/openannotate/) as the first web server for efficiently annotating openness of massive genomic regions across various biosample types, tissues, and biological systems. In addition to the annotation resource from 2729 comprehensive profiles of 614 biosample types of human and mouse, OpenAnnotate provides user-friendly functionalities, ultra-efficient calculation, real-time browsing, intuitive visualization, and elaborate application notebooks. We show its unique advantages compared to existing databases and toolkits by effectively revealing cell type-specificity, identifying regulatory elements and 3D chromatin contacts, deciphering gene functional relationships, inferring functions of transcription factors, and unprecedentedly promoting single-cell data analyses. We anticipate OpenAnnotate will provide a promising avenue for researchers to construct a more holistic perspective to understand regulatory mechanisms.
Affiliation(s)
- Shengquan Chen
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Qiao Liu
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuejian Cui
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Zhanying Feng
- CEMS, NCMIS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Chunquan Li
- School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Yong Wang
- CEMS, NCMIS, MDIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Bioinformatics Division at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
|
26
|
Transcriptional Silencers: Driving Gene Expression with the Brakes On. Trends Genet 2021; 37:514-527. [PMID: 33712326 DOI: 10.1016/j.tig.2021.02.002] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 11/25/2020] [Revised: 02/01/2021] [Accepted: 02/02/2021] [Indexed: 12/15/2022]
Abstract
Silencers are regulatory DNA elements that reduce transcription from their target promoters; they are the repressive counterparts of enhancers. Although discovered decades ago, and despite evidence of their importance in development and disease, silencers have been much less studied than enhancers. Recently, however, a series of papers have reported systematic studies of silencers in various model systems. Silencers are often bifunctional regulatory elements that can also act as enhancers, depending on cellular context, and are enriched for expression quantitative trait loci (eQTLs) and disease-associated variants. There is not yet evidence of a 'silencer chromatin signature', in the distribution of histone modifications or associated proteins, that is common to all silencers; instead, silencers may fall into various subclasses, acting by distinct (and possibly overlapping) mechanisms.
|
27
|
Chen S, Gan M, Lv H, Jiang R. DeepCAPE: A Deep Convolutional Neural Network for the Accurate Prediction of Enhancers. Genomics Proteomics & Bioinformatics 2021; 19:565-577. [PMID: 33581335 PMCID: PMC9040020 DOI: 10.1016/j.gpb.2019.04.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/24/2018] [Revised: 03/15/2019] [Accepted: 04/29/2019] [Indexed: 12/12/2022]
Abstract
The establishment of a landscape of enhancers across human cells is crucial to deciphering the mechanism of gene regulation, cell differentiation, and disease development. High-throughput experimental approaches, which contain successfully reported enhancers in typical cell lines, are still too costly and time-consuming to perform systematic identification of enhancers specific to different cell lines. Existing computational methods, capable of predicting regulatory elements purely relying on DNA sequences, lack the power of cell line-specific screening. Recent studies have suggested that chromatin accessibility of a DNA segment is closely related to its potential function in regulation, and thus may provide useful information in identifying regulatory elements. Motivated by the aforementioned understanding, we integrate DNA sequences and chromatin accessibility data to accurately predict enhancers in a cell line-specific manner. We proposed DeepCAPE, a deep convolutional neural network to predict enhancers via the integration of DNA sequences and DNase-seq data. Benefitting from the well-designed feature extraction mechanism and skip connection strategy, our model not only consistently outperforms existing methods in the imbalanced classification of cell line-specific enhancers against background sequences, but also has the ability to self-adapt to different sizes of datasets. Besides, with the adoption of auto-encoder, our model is capable of making cross-cell line predictions. We further visualize kernels of the first convolutional layer and show the match of identified sequence signatures and known motifs. We finally demonstrate the potential ability of our model to explain functional implications of putative disease-associated genetic variants and discriminate disease-related enhancers. The source code and detailed tutorial of DeepCAPE are freely available at https://github.com/ShengquanChen/DeepCAPE.
Affiliation(s)
- Shengquan Chen
- MOE Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Mingxin Gan
- Department of Management Science and Engineering, School of Economics and Management, University of Science and Technology Beijing, Beijing 100083, China
- Hairong Lv
- MOE Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
- MOE Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China.
|