1. Zhong J, O’Brien A, Patel M, Eiser D, Mobaraki M, Collins I, Wang L, Guo K, TruongVo T, Jermusyk A, O’Neill M, Dill CD, Wells AD, Leonard ME, Pippin JA, Grant SF, Zhang T, Andresson T, Connelly KE, Shi J, Arda HE, Hoskins JW, Amundadottir LT. Large-scale multi-omic analysis identifies noncoding somatic driver mutations and nominates ZFP36L2 as a driver gene for pancreatic ductal adenocarcinoma. medRxiv 2024:2024.09.22.24314165. [PMID: 39371173 PMCID: PMC11451821 DOI: 10.1101/2024.09.22.24314165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Identification of somatic driver mutations in the noncoding genome remains challenging. To comprehensively characterize noncoding driver mutations for pancreatic ductal adenocarcinoma (PDAC), we first created genome-scale maps of accessible chromatin regions (ACRs) and histone modification marks (HMMs) in pancreatic cell lines and purified pancreatic acinar and duct cells. Integration with whole-genome mutation calls from 506 PDACs revealed 314 ACRs/HMMs significantly enriched with 3,614 noncoding somatic mutations (NCSMs). Functional assessment using massively parallel reporter assays (MPRA) identified 178 NCSMs impacting reporter activity (19.45% of those tested). Focused luciferase validation confirmed negative effects on gene regulatory activity for NCSMs near CDKN2A and ZFP36L2. For the latter, CRISPR interference (CRISPRi) further identified ZFP36L2 as a target gene (16.0 - 24.0% reduced expression, P = 0.023-0.0047) with disrupted KLF9 binding likely mediating the effect. Our integrative approach provides a catalog of potentially functional noncoding driver mutations and nominates ZFP36L2 as a PDAC driver gene.
Affiliation(s)
- Jun Zhong
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Aidan O’Brien
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
  - The Patrick G Johnston Centre for Cancer Research, Queen’s University Belfast, Belfast, UK
- Minal Patel
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Daina Eiser
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael Mobaraki
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Irene Collins
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Li Wang
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Konnie Guo
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- ThucNhi TruongVo
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Ashley Jermusyk
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Maura O’Neill
  - Protein Characterization Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
- Courtney D. Dill
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Andrew D. Wells
  - Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Michelle E. Leonard
  - Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- James A. Pippin
  - Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Struan F.A. Grant
  - Division of Human Genetics and Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA; Department of Genetics, Department of Pediatrics, and Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, Philadelphia, PA, USA
- Tongwu Zhang
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Thorkell Andresson
  - Protein Characterization Laboratory, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD, USA
- Katelyn E. Connelly
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Jianxin Shi
  - Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- H. Efsun Arda
  - Laboratory of Receptor Biology and Gene Expression, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Jason W. Hoskins
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Laufey T. Amundadottir
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
2. Alicea B, Bastani S, Gordon NK, Crawford-Young S, Gordon R. The Molecular Basis of Differentiation Wave Activity in Embryogenesis. Biosystems 2024; 243:105272. [PMID: 39033973 DOI: 10.1016/j.biosystems.2024.105272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 07/10/2024] [Accepted: 07/11/2024] [Indexed: 07/23/2024]
Abstract
As development varies greatly across the tree of life, it may seem difficult to suggest a model that proposes a single mechanism for understanding collective cell behaviors and the coordination of tissue formation. Here we propose a mechanism called differentiation waves, which unify many disparate results involving developmental systems from across the tree of life. We demonstrate how a relatively simple model of differentiation proceeds not from function-related molecular mechanisms, but from so-called differentiation waves. A phenotypic model of differentiation waves is introduced, and its relation to molecular mechanisms is proposed. These waves contribute to a differentiation tree, which is an alternate way of viewing cell lineage and local action of the molecular factors. We construct a model of differentiation wave-related molecular mechanisms (genome, epigenome, and proteome) based on bioinformatic data from the nematode Caenorhabditis elegans. To validate this approach across different modes of development, we evaluate protein expression across different types of development by comparing Caenorhabditis elegans with several model organisms: fruit flies (Drosophila melanogaster), yeast (Saccharomyces cerevisiae), and mouse (Mus musculus). Inspired by gene regulatory networks, two Models of Interactive Contributions (fully-connected MICs and ordered MICs) are used to suggest potential genomic contributions to differentiation wave-related proteins. This, in turn, provides a framework for understanding differentiation and development.
Affiliation(s)
- Bradly Alicea
  - Orthogonal Research and Education Lab, Champaign-Urbana, IL, USA; OpenWorm Foundation, Boston, MA, USA; University of Illinois Urbana-Champaign, USA.
- Suroush Bastani
  - Orthogonal Research and Education Lab, Champaign-Urbana, IL, USA.
- Richard Gordon
  - Gulf Specimen Marine Laboratory & Aquarium, Panacea, FL, USA.
3. Perna S, Pinoli P, Ceri S, Wong L. A comparative analysis of ENCODE and Cistrome in the context of TF binding signal. BMC Genomics 2024; 25:817. [PMID: 39210256 PMCID: PMC11363379 DOI: 10.1186/s12864-024-10668-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 07/25/2024] [Indexed: 09/04/2024] [Open access]
Abstract
BACKGROUND: With the rise of publicly available genomic data repositories, it is now common for scientists to rely on computational models and preprocessed data, either as a control or to discover new knowledge. However, different repositories adhere to different principles and guidelines, and data processing plays a significant role in the quality of the resulting datasets. Two popular repositories for transcription factor binding site data, ENCODE and Cistrome, process the same biological samples in alternative ways, and their results are not always consistent. Moreover, the output format of the processing (BED narrowPeak) exposes a feature, the signalValue, which is seldom used in consistency checks but can offer valuable insight into the quality of the data. RESULTS: We provide evidence that data points with high signalValue(s) (top 25% of values) are more likely to be consistent between ENCODE and Cistrome in human cell lines K562, GM12878, and HepG2. In addition, we show that filtering according to said high values improves the quality of predictions for a machine learning algorithm that detects transcription factor interactions based only on positional information. Finally, we provide a set of practices and guidelines, based on the signalValue feature, for scientists who wish to compare and merge narrowPeaks from ENCODE and Cistrome. CONCLUSIONS: The signalValue feature is an informative feature that can be effectively used to highlight consistent areas of overlap between different sources of TF binding sites that expose it. Its applicability extends downstream to positional machine learning algorithms, making it a powerful tool for performance tweaking and data aggregation.
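The signalValue filter mentioned in the RESULTS can be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the function name `top_quartile_peaks`, the per-file inclusive 75th-percentile cutoff, and the synthetic sample lines are all assumptions. The one detail taken from the file format itself is that signalValue is the seventh tab-separated column of a BED narrowPeak line.

```python
import statistics


def top_quartile_peaks(lines):
    """Keep narrowPeak records whose signalValue is in the top 25% of the file.

    `lines` is any iterable of narrowPeak text lines. Returns a list of
    (chrom, start, end, signal_value) tuples. The cutoff rule is an
    assumption made for this sketch, not the published procedure.
    """
    peaks = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # narrowPeak columns: chrom, chromStart, chromEnd, name, score,
        # strand, signalValue, pValue, qValue, peak
        peaks.append((fields[0], int(fields[1]), int(fields[2]), float(fields[6])))

    if len(peaks) < 2:
        # A quartile is not meaningful for fewer than two records.
        return peaks

    values = [p[3] for p in peaks]
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    cutoff = statistics.quantiles(values, n=4, method="inclusive")[2]
    return [p for p in peaks if p[3] >= cutoff]


# Synthetic example: eight peaks with signalValue 1.0 .. 8.0.
sample = [
    f"chr1\t{i * 100}\t{i * 100 + 50}\tpeak{i}\t0\t.\t{float(i)}\t-1\t-1\t25"
    for i in range(1, 9)
]
kept = top_quartile_peaks(sample)
print([p[3] for p in kept])
```

Computing the cutoff per file rather than across repositories is a design choice of this sketch; the abstract does not say which the authors used.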
Affiliation(s)
- Stefano Perna
  - Lee Kong Chian School of Medicine, Nanyang Technological University, 9 Nanyang Drive, 636921, Singapore, Singapore.
- Pietro Pinoli
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, 32 Piazza Leonardo da Vinci, 20133, Milano, Italy
- Stefano Ceri
  - Department of Electronics, Information and Bioengineering, Politecnico di Milano, 32 Piazza Leonardo da Vinci, 20133, Milano, Italy
- Limsoon Wong
  - School of Computing, National University of Singapore, 13 Computing Drive, 117417, Singapore, Singapore
4. Hudaiberdiev S, Ovcharenko I. Functional characteristics and computational model of abundant hyperactive loci in the human genome. bioRxiv 2024:2023.02.05.527203. [PMID: 36945558 PMCID: PMC10028745 DOI: 10.1101/2023.02.05.527203] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
Enhancers and promoters are classically considered to be bound by a small set of TFs in a sequence-specific manner. This assumption has come under increasing skepticism as the datasets of ChIP-seq assays of TFs have expanded. In particular, high-occupancy target (HOT) loci attract hundreds of TFs with often no detectable correlation between ChIP-seq peaks and DNA-binding motif presence. Here, we used a set of 1,003 TF ChIP-seq datasets (HepG2, K562, H1) to analyze the patterns of ChIP-seq peak co-occurrence in combination with functional genomics datasets. We identified 43,891 HOT loci forming at the promoter (53%) and enhancer (47%) regions. HOT promoters regulate housekeeping genes, whereas HOT enhancers are involved in tissue-specific process regulation. HOT loci form the foundation of human super-enhancers and evolve under strong negative selection, with some of these loci being located in ultraconserved regions. Sequence-based classification analysis of HOT loci suggested that their formation is driven by the sequence features, and the density of mapped ChIP-seq peaks across TF-bound loci correlates with sequence features and the expression level of flanking genes. Based on the affinities to bind to promoters and enhancers we detected 5 distinct clusters of TFs that form the core of the HOT loci. We report an abundance of HOT loci in the human genome and a commitment of 51% of all TF ChIP-seq binding events to HOT locus formation thus challenging the classical model of enhancer activity and propose a model of HOT locus formation based on the existence of large transcriptional condensates.
Affiliation(s)
- Sanjarbek Hudaiberdiev
  - National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health. Bethesda, MD
- Ivan Ovcharenko
  - National Institute for Biotechnology and Information, National Library of Medicine, National Institutes of Health. Bethesda, MD
5. Li Y, Tan M, Akkari-Henić A, Zhang L, Kip M, Sun S, Sepers JJ, Xu N, Ariyurek Y, Kloet SL, Davis RP, Mikkers H, Gruber JJ, Snyder MP, Li X, Pang B. Genome-wide Cas9-mediated screening of essential non-coding regulatory elements via libraries of paired single-guide RNAs. Nat Biomed Eng 2024; 8:890-908. [PMID: 38778183 PMCID: PMC11310080 DOI: 10.1038/s41551-024-01204-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 03/27/2024] [Indexed: 05/25/2024]
Abstract
The functions of non-coding regulatory elements (NCREs), which constitute a major fraction of the human genome, have not been systematically studied. Here we report a method involving libraries of paired single-guide RNAs targeting both ends of an NCRE as a screening system for the Cas9-mediated deletion of thousands of NCREs genome-wide to study their functions in distinct biological contexts. By using K562 and 293T cell lines and human embryonic stem cells, we show that NCREs can have redundant functions, and that many ultra-conserved elements have silencer activity and play essential roles in cell growth and in cellular responses to drugs (notably, the ultra-conserved element PAX6_Tarzan may be critical for heart development, as removing it from human embryonic stem cells led to defects in cardiomyocyte differentiation). The high-throughput screen, which is compatible with single-cell sequencing, may allow for the identification of druggable NCREs.
Affiliation(s)
- Yufeng Li
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Minkang Tan
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Almira Akkari-Henić
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Limin Zhang
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Maarten Kip
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Shengnan Sun
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Jorian J Sepers
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Ningning Xu
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Yavuz Ariyurek
  - Leiden Genome Technology Center, Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Susan L Kloet
  - Leiden Genome Technology Center, Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands
- Richard P Davis
  - Department of Anatomy and Embryology, The Novo Nordisk Foundation Center for Stem Cell Medicine (reNEW), Leiden University Medical Center, Leiden, the Netherlands
- Harald Mikkers
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands
- Joshua J Gruber
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Xiao Li
  - Department of Biochemistry, The Center for RNA Science and Therapeutics, Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA.
- Baoxu Pang
  - Department of Cell and Chemical Biology, Leiden University Medical Center, Leiden, the Netherlands.
6. Paired CRISPR screening libraries for studying the function of the non-coding genome at scale. Nat Biomed Eng 2024; 8:806-807. [PMID: 38822173 DOI: 10.1038/s41551-024-01215-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2024]
7. Moyers BA, Partridge EC, Mackiewicz M, Betti MJ, Darji R, Meadows SK, Newberry KM, Brandsmeier LA, Wold BJ, Mendenhall EM, Myers RM. Characterization of human transcription factor function and patterns of gene regulation in HepG2 cells. Genome Res 2023; 33:1879-1892. [PMID: 37852782 PMCID: PMC10760452 DOI: 10.1101/gr.278205.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Accepted: 10/13/2023] [Indexed: 10/20/2023]
Abstract
Transcription factors (TFs) are trans-acting proteins that bind cis-regulatory elements (CREs) in DNA to control gene expression. Here, we analyzed the genomic localization profiles of 529 sequence-specific TFs and 151 cofactors and chromatin regulators in the human cancer cell line HepG2, for a total of 680 broadly termed DNA-associated proteins (DAPs). We used this deep collection to model each TF's impact on gene expression, and identified a cohort of 26 candidate transcriptional repressors. We examine high occupancy target (HOT) sites in the context of three-dimensional genome organization and show biased motif placement in distal-promoter connections involving HOT sites. We also found a substantial number of closed chromatin regions with multiple DAPs bound, and explored their properties, finding that a MAFF/MAFK TF pair correlates with transcriptional repression. Altogether, these analyses provide novel insights into the regulatory logic of the human cell line HepG2 genome and show the usefulness of large genomic analyses for elucidation of individual TF functions.
Affiliation(s)
- Belle A Moyers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Mark Mackiewicz
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Michael J Betti
  - Vanderbilt University Medical Center, Nashville, Tennessee 37232, USA
- Roshan Darji
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Sarah K Meadows
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Barbara J Wold
  - Merkin Institute for Translational Research, California Institute of Technology, Pasadena, California 91125, USA
- Eric M Mendenhall
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
- Richard M Myers
  - HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA
8. Cascianelli S, Ceddia G, Marchesi A, Masseroli M. Identification of transcription factor high accumulation DNA zones. BMC Bioinformatics 2023; 24:395. [PMID: 37864168 PMCID: PMC10590011 DOI: 10.1186/s12859-023-05528-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 10/10/2023] [Indexed: 10/22/2023] [Open access]
Abstract
BACKGROUND: Transcription factors (TF) play a crucial role in the regulation of gene transcription; alterations of their activity and binding to DNA areas are strongly involved in cancer and other disease onset and development. For proper biomedical investigation, it is hence essential to correctly trace TF dense DNA areas, having multiple bindings of distinct factors, and select DNA high occupancy target (HOT) zones, showing the highest accumulation of such bindings. Indeed, systematic and replicable analysis of HOT zones in a large variety of cells and tissues would allow further understanding of their characteristics and could clarify their functional role. RESULTS: Here, we propose, thoroughly explain and discuss a full computational procedure to study in-depth DNA dense areas of transcription factor accumulation and identify HOT zones. This methodology, developed as a computationally efficient parametric algorithm implemented in an R/Bioconductor package, uses a systematic approach with two alternative methods to examine transcription factor bindings and provide comparative and fully-reproducible assessments. It offers different resolutions by introducing three distinct types of accumulation, which can analyze DNA from single-base to region-oriented levels, and a moving window, which can estimate the influence of the neighborhood for each DNA base under exam. CONCLUSIONS: We quantitatively assessed the full procedure by using our implemented software package, named TFHAZ, in two example applications of biological interest, proving its full reliability and relevance.
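The idea of per-base accumulation with a moving window can be illustrated with a small sketch. This is not the TFHAZ implementation (which is an R/Bioconductor package); the function name `tf_accumulation`, the half-open coordinate convention, and the choice to count distinct TFs within `w` bases on either side are assumptions made here for illustration.

```python
def tf_accumulation(regions, length, w=0):
    """Count, for each base, how many distinct TFs bind within +/- w bases.

    `regions` is a list of (tf_name, start, end) tuples on one sequence of
    `length` bases, with half-open [start, end) coordinates. With w=0 this
    is a plain per-base count of distinct TFs. Hypothetical helper, written
    for clarity rather than speed.
    """
    acc = [0] * length

    # Group intervals by TF so each TF contributes at most 1 per base.
    by_tf = {}
    for tf, start, end in regions:
        by_tf.setdefault(tf, []).append((start, end))

    for intervals in by_tf.values():
        covered = [False] * length
        for start, end in intervals:
            # A base i sees this region in its window when
            # start - w <= i < end + w (clipped to the sequence).
            lo = max(0, start - w)
            hi = min(length, end + w)
            for i in range(lo, hi):
                covered[i] = True
        for i, is_covered in enumerate(covered):
            if is_covered:
                acc[i] += 1
    return acc


# Toy example: TF "A" has two overlapping regions, TF "B" has one.
toy = [("A", 2, 4), ("B", 3, 6), ("A", 3, 5)]
print(tf_accumulation(toy, length=8, w=0))
print(tf_accumulation(toy, length=8, w=1))
```

Bases whose count exceeds a chosen threshold would then be candidates for dense or HOT zones; how that threshold is set is left open here because the abstract does not specify it.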
Affiliation(s)
- Silvia Cascianelli
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy
- Gaia Ceddia
  - Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Alberto Marchesi
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy
- Marco Masseroli
  - Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133 Milan, Italy
9. Zhu I, Landsman D. Clustered and diverse transcription factor binding underlies cell type specificity of enhancers for housekeeping genes. Genome Res 2023; 33:1662-1672. [PMID: 37884340 PMCID: PMC10691539 DOI: 10.1101/gr.278130.123] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/26/2023] [Accepted: 09/12/2023] [Indexed: 10/28/2023]
Abstract
Housekeeping genes are considered to be regulated by common enhancers across different tissues. Here we report that most of the commonly expressed mouse or human genes across different cell types, including more than half of the previously identified housekeeping genes, are associated with cell type-specific enhancers. Furthermore, the binding of most transcription factors (TFs) is cell type-specific. We reason that these cell type specificities are causally related to the collective TF recruitment at regulatory sites, as TFs tend to bind to regions associated with many other TFs and each cell type has a unique repertoire of expressed TFs. Based on binding profiles of hundreds of TFs from HepG2, K562, and GM12878 cells, we show that 80% of all TF peaks overlapping H3K27ac signals are in the top 20,000-23,000 most TF-enriched H3K27ac peak regions, and approximately 12,000-15,000 of these peaks are enhancers (nonpromoters). Those enhancers are mainly cell type-specific and include those linked to the majority of commonly expressed genes. Moreover, we show that the top 15,000 most TF-enriched regulatory sites in HepG2 cells, associated with about 200 TFs, can be predicted largely from the binding profile of as few as 30 TFs. Through motif analysis, we show that major enhancers harbor diverse and clustered motifs from a combination of available TFs uniquely present in each cell type. We propose a mechanism that explains how the highly focused TF binding at regulatory sites results in cell type specificity of enhancers for housekeeping and commonly expressed genes.
Affiliation(s)
- Iris Zhu
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- David Landsman
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
10. Chen J, Higgins MJ, Hu Q, Khoury T, Liu S, Ambrosone CB, Gong Z. DNA methylation differences in noncoding regions in ER negative breast tumors between Black and White women. Front Oncol 2023; 13:1167815. [PMID: 37293596 PMCID: PMC10244512 DOI: 10.3389/fonc.2023.1167815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 05/09/2023] [Indexed: 06/10/2023] [Open access]
Abstract
Introduction: Incidence of estrogen receptor (ER)-negative breast cancer, an aggressive tumor subtype associated with worse prognosis, is higher among African American/Black women than other US racial and ethnic groups. The reasons for this disparity remain poorly understood but may be partially explained by differences in the epigenetic landscape. Methods: We previously conducted genome-wide DNA methylation profiling of ER- breast tumors from Black and White women and identified a large number of differentially methylated loci (DML) by race. Our initial analysis focused on DML mapping to protein-coding genes. In this study, motivated by increasing appreciation for the biological importance of the non-protein coding genome, we focused on 96 DMLs mapping to intergenic and noncoding RNA regions, using paired Illumina Infinium Human Methylation 450K array and RNA-seq data to assess the relationship between CpG methylation and RNA expression of genes located up to 1 Mb away from the CpG site. Results: Twenty-three (23) DMLs were significantly correlated with the expression of 36 genes (FDR<0.05), with some DMLs associated with the expression of a single gene and others associated with more than one gene. One DML (cg20401567), hypermethylated in ER- tumors from Black versus White women, mapped to a putative enhancer/super-enhancer element located 1.3 Kb downstream of HOXB2. Increased methylation at this CpG correlated with decreased expression of HOXB2 (Rho=-0.74, FDR<0.001) and other HOXB/HOXB-AS genes. Analysis of an independent set of 207 ER- breast cancers from TCGA similarly confirmed hypermethylation at cg20401567 and reduced HOXB2 expression in tumors from Black versus White women (Rho=-0.75, FDR<0.001). Discussion: Our findings indicate that epigenetic differences in ER- tumors between Black and White women are linked to altered gene expression and may hold functional significance in breast cancer pathogenesis.
Affiliation(s)
- Jianhong Chen
  - Department of Cancer Prevention and Control, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Michael J. Higgins
  - Department of Molecular and Cellular Biology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Qiang Hu
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Thaer Khoury
  - Department of Pathology & Laboratory Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Song Liu
  - Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Christine B. Ambrosone
  - Department of Cancer Prevention and Control, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
- Zhihong Gong
  - Department of Cancer Prevention and Control, Roswell Park Comprehensive Cancer Center, Buffalo, NY, United States
11. Alakuş TB. A Novel Repetition Frequency-Based DNA Encoding Scheme to Predict Human and Mouse DNA Enhancers with Deep Learning. Biomimetics (Basel) 2023; 8:218. [PMID: 37366813 DOI: 10.3390/biomimetics8020218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 05/18/2023] [Accepted: 05/22/2023] [Indexed: 06/28/2023] [Open access]
Abstract
Recent studies have shown that DNA enhancers have an important role in the regulation of gene expression. They are responsible for different important biological elements and processes such as development, homeostasis, and embryogenesis. However, experimental prediction of these DNA enhancers is time-consuming and costly as it requires laboratory work. Therefore, researchers started to look for alternative ways and started to apply computation-based deep learning algorithms to this field. Yet, the inconsistency and unsuccessful prediction performance of computational-based approaches among various cell lines led to the investigation of these approaches as well. Therefore, in this study, a novel DNA encoding scheme was proposed, and solutions were sought to the problems mentioned and DNA enhancers were predicted with BiLSTM. The study consisted of four different stages for two scenarios. In the first stage, DNA enhancer data were obtained. In the second stage, DNA sequences were converted to numerical representations by both the proposed encoding scheme and various DNA encoding schemes including EIIP, integer number, and atomic number. In the third stage, the BiLSTM model was designed, and the data were classified. In the final stage, the performance of DNA encoding schemes was determined by accuracy, precision, recall, F1-score, CSI, MCC, G-mean, Kappa coefficient, and AUC scores. In the first scenario, it was determined whether the DNA enhancers belonged to humans or mice. As a result of the prediction process, the highest performance was achieved with the proposed DNA encoding scheme, and an accuracy of 92.16% and an AUC score of 0.85 were calculated, respectively. The closest accuracy score to the proposed scheme was obtained with the EIIP DNA encoding scheme and the result was observed as 89.14%. The AUC score of this scheme was measured as 0.87. 
Among the remaining DNA encoding schemes, the atomic number showed an accuracy score of 86.61%, while this rate decreased to 76.96% with the integer scheme. The AUC values of these schemes were 0.84 and 0.82, respectively. In the second scenario, it was determined whether there was a DNA enhancer and, if so, it was decided to which species this enhancer belonged. In this scenario, the highest accuracy score was obtained with the proposed DNA encoding scheme and the result was 84.59%. Moreover, the AUC score of the proposed scheme was determined as 0.92. EIIP and integer DNA encoding schemes showed accuracy scores of 77.80% and 73.68%, respectively, while their AUC scores were close to 0.90. The most ineffective prediction was performed with the atomic number and the accuracy score of this scheme was calculated as 68.27%. Finally, the AUC score of this scheme was 0.81. At the end of the study, it was observed that the proposed DNA encoding scheme was successful and effective in predicting DNA enhancers.
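As a rough illustration of the second stage (converting DNA sequences to numerical representations), two of the baseline schemes named in the abstract can be sketched as lookup tables. The EIIP and atomic-number values below are the ones commonly cited in the genomic signal-processing literature and are assumed here; the abstract does not list the tables used in the paper. The proposed repetition frequency-based scheme is not reproduced, because the abstract does not define it. The function name `encode` and the handling of unknown bases are also assumptions.

```python
# Commonly cited electron-ion interaction pseudopotential (EIIP) values
# per nucleotide (assumed; not taken from the abstract).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# Commonly cited atomic-number mapping per nucleotide (assumed).
ATOMIC_NUMBER = {"A": 70, "C": 58, "G": 78, "T": 66}


def encode(sequence, table, unknown=0):
    """Map each base of `sequence` to its numeric value in `table`.

    Bases missing from the table (for example "N") are mapped to
    `unknown`; that fallback is a choice made for this sketch.
    """
    return [table.get(base, unknown) for base in sequence.upper()]


print(encode("acgt", EIIP))
print(encode("GATTN", ATOMIC_NUMBER))
```

The resulting numeric vectors are the kind of input a sequence model such as a BiLSTM would consume after padding or truncation to a fixed length.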
Affiliation(s)
- Talha Burak Alakuş
  - Department of Software Engineering, Faculty of Engineering, Kırklareli University, 39100 Kırklareli, Turkey
12
|
Omar M, Dinalankara W, Mulder L, Coady T, Zanettini C, Imada EL, Younes L, Geman D, Marchionni L. Using biological constraints to improve prediction in precision oncology. iScience 2023; 26:106108. [PMID: 36852282 PMCID: PMC9958363 DOI: 10.1016/j.isci.2023.106108] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/04/2022] [Revised: 12/20/2022] [Accepted: 01/28/2023] [Indexed: 02/05/2023] Open
Abstract
Many gene signatures have been developed by applying machine learning (ML) to omics profiles; however, their clinical utility is often hindered by limited interpretability and unstable performance. Here, we show the importance of embedding prior biological knowledge in the decision rules yielded by ML approaches to build robust classifiers. We tested this by applying different ML algorithms to gene expression data to predict three difficult cancer phenotypes: bladder cancer progression to muscle-invasive disease, response to neoadjuvant chemotherapy in triple-negative breast cancer, and prostate cancer metastatic progression. We developed two sets of classifiers: mechanistic, by restricting the training to features capturing specific biological mechanisms; and agnostic, in which the training did not use any a priori biological information. Mechanistic models had a similar or better testing performance than their agnostic counterparts, with enhanced interpretability. Our findings support the use of biological constraints to develop robust gene signatures with high translational potential.
Affiliation(s)
- Mohamed Omar
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Wikum Dinalankara
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Lotte Mulder
  - Technical University Delft, 2628 CD Delft, the Netherlands
- Tendai Coady
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Claudio Zanettini
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Eddie Luidy Imada
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
- Laurent Younes
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
- Donald Geman
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
- Luigi Marchionni
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, USA
|
13
|
Labani M, Beheshti A, Argha A, Alinejad-Rokny H. A Comprehensive Investigation of Genomic Variants in Prostate Cancer Reveals 30 Putative Regulatory Variants. Int J Mol Sci 2023; 24:2472. [PMID: 36768794 PMCID: PMC9916892 DOI: 10.3390/ijms24032472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 01/18/2023] [Accepted: 01/23/2023] [Indexed: 01/31/2023] Open
Abstract
Prostate cancer (PC) is the most frequently diagnosed non-skin cancer in the world. Previous studies have shown that genomic alterations represent the most common mechanism for molecular alterations responsible for the development and progression of PC. This highlights the importance of identifying functional genomic variants for early detection in high-risk PC individuals. Great efforts have been made to identify common protein-coding genetic variations; however, the impact of non-coding variations, including regulatory genetic variants, is not well understood. Identification of these variants and the underlying target genes will be a key step in improving the detection and treatment of PC. To gain an understanding of the functional impact of genetic variants, and in particular, regulatory variants in PC, we developed an integrative pipeline (AGV) that uses whole genome/exome sequences, GWAS SNPs, chromosome conformation capture data, and ChIP-Seq signals to investigate the potential impact of genomic variants on the underlying target genes in PC. We identified 646 putative regulatory variants, of which 30 significantly altered the expression of at least one protein-coding gene. Our analysis of chromatin interactions data (Hi-C) revealed that the 30 putative regulatory variants could affect 131 coding and non-coding genes. Interestingly, our study identified the 131 protein-coding genes that are involved in disease-related pathways, including Reactome and MSigDB, for most of which targeted treatment options are currently available. Notably, our analysis revealed several non-coding RNAs, including RP11-136K7.2 and RAMP2-AS1, as potential enhancer elements of the protein-coding genes CDH12 and EZH1, respectively. Our results provide a comprehensive map of genomic variants in PC and reveal their potential contribution to prostate cancer progression and development.
Affiliation(s)
- Mahdieh Labani
  - BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
  - Data Analytic Lab, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
- Amin Beheshti
  - Data Analytic Lab, Department of Computing, Macquarie University, Sydney, NSW 2109, Australia
- Ahmadreza Argha
  - The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
- Hamid Alinejad-Rokny
  - BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia
  - UNSW Data Science Hub, The University of New South Wales, Sydney, NSW 2052, Australia
  - Health Data Analytics Program, Centre for Applied AI, Macquarie University, Sydney, NSW 2109, Australia
|
14
|
Cappelletti L, Petrini A, Gliozzo J, Casiraghi E, Schubach M, Kircher M, Valentini G. Boosting tissue-specific prediction of active cis-regulatory regions through deep learning and Bayesian optimization techniques. BMC Bioinformatics 2022; 23:154. [PMID: 36510125 PMCID: PMC9743524 DOI: 10.1186/s12859-022-04582-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 01/20/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Cis-regulatory regions (CRRs) are non-coding regions of the DNA that fine control the spatio-temporal pattern of transcription; they are involved in a wide range of pivotal processes such as the development of specific cell-lines/tissues and the dynamic cell response to physiological stimuli. Recent studies showed that genetic variants occurring in CRRs are strongly correlated with pathogenicity or deleteriousness. Considering the central role of CRRs in the regulation of physiological and pathological conditions, the correct identification of CRRs and of their tissue-specific activity status through Machine Learning methods plays a major role in dissecting the impact of genetic variants on human diseases. Unfortunately, the problem is still open, though some promising results have been already reported by (deep) machine-learning based methods that predict active promoters and enhancers in specific tissues or cell lines by encoding epigenetic or spectral features directly extracted from DNA sequences.
RESULTS: We present the experiments we performed to compare two Deep Neural Networks, a Feed-Forward Neural Network model working on epigenomic features, and a Convolutional Neural Network model working only on genomic sequence, targeted to the identification of enhancer- and promoter-activity in specific cell lines. While performing experiments to understand how the experimental setup influences the prediction performance of the methods, we particularly focused on (1) automatic model selection performed by Bayesian optimization and (2) exploring different data rebalancing setups for reducing negative unbalancing effects.
CONCLUSIONS: Results show that (1) automatic model selection by Bayesian optimization improves the quality of the learner; (2) data rebalancing considerably impacts the prediction performance of the models; test set rebalancing may provide over-optimistic results, and should therefore be cautiously applied; (3) despite working on sequence data, convolutional models obtain performance close to those of feed forward models working on epigenomic information, which suggests that also sequence data carries informative content for CRR-activity prediction. We therefore suggest combining both models/data types in future works.
Affiliation(s)
- Luca Cappelletti
  - grid.4708.b 0000 0004 1757 2822, AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
- Alessandro Petrini
  - grid.4708.b 0000 0004 1757 2822, AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
- Jessica Gliozzo
  - grid.4708.b 0000 0004 1757 2822, AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
- Elena Casiraghi
  - grid.4708.b 0000 0004 1757 2822, AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
- Max Schubach
  - grid.6363.0 0000 0001 2218 4662, Berlin Institute of Health at Charité, Universitätsmedizin Berlin, Berlin, Germany
- Martin Kircher
  - grid.6363.0 0000 0001 2218 4662, Berlin Institute of Health at Charité, Universitätsmedizin Berlin, Berlin, Germany
- Giorgio Valentini
  - grid.4708.b 0000 0004 1757 2822, AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, Milan, Italy
  - European Laboratory for Learning and Intelligent Systems (ELLIS), Berlin, Germany
  - CINI National Laboratory of Artificial Intelligence and Intelligent Systems (AIIS), Rome, Italy
  - grid.4708.b 0000 0004 1757 2822, Data Science Research Center, Università degli Studi di Milano, Milan, Italy
|
15
|
Giacoman-Lozano M, Meléndez-Ramírez C, Martinez-Ledesma E, Cuevas-Diaz Duran R, Velasco I. Epigenetics of neural differentiation: Spotlight on enhancers. Front Cell Dev Biol 2022; 10:1001701. [PMID: 36313573 PMCID: PMC9606577 DOI: 10.3389/fcell.2022.1001701] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/23/2022] [Accepted: 10/03/2022] [Indexed: 11/28/2022] Open
Abstract
Neural induction, both in vivo and in vitro, includes cellular and molecular changes that result in phenotypic specialization related to specific transcriptional patterns. These changes are achieved through the implementation of complex gene regulatory networks. Furthermore, these regulatory networks are influenced by epigenetic mechanisms that drive cell heterogeneity and cell-type specificity, in a controlled and complex manner. Epigenetic marks, such as DNA methylation and histone residue modifications, are highly dynamic and stage-specific during neurogenesis. Genome-wide assessment of these modifications has allowed the identification of distinct non-coding regulatory regions involved in neural cell differentiation, maturation, and plasticity. Enhancers are short DNA regulatory regions that bind transcription factors (TFs) and interact with gene promoters to increase transcriptional activity. They are of special interest in neuroscience because they are enriched in neurons and underlie the cell-type-specificity and dynamic gene expression profiles. Classification of the full epigenomic landscape of neural subtypes is important to better understand gene regulation in brain health and during diseases. Advances in novel next-generation high-throughput sequencing technologies, genome editing, Genome-wide association studies (GWAS), stem cell differentiation, and brain organoids are allowing researchers to study brain development and neurodegenerative diseases with an unprecedented resolution. Herein, we describe important epigenetic mechanisms related to neurogenesis in mammals. We focus on the potential roles of neural enhancers in neurogenesis, cell-fate commitment, and neuronal plasticity. We review recent findings on epigenetic regulatory mechanisms involved in neurogenesis and discuss how sequence variations within enhancers may be associated with genetic risk for neurological and psychiatric disorders.
Affiliation(s)
- Mayela Giacoman-Lozano
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, NL, Mexico
- César Meléndez-Ramírez
  - Instituto de Fisiología Celular—Neurociencias, Universidad Nacional Autónoma de Mexico, Mexico City, Mexico
  - Laboratorio de Reprogramación Celular, Instituto Nacional de Neurología y Neurocirugía “Manuel Velasco Suárez”, Mexico City, Mexico
- Emmanuel Martinez-Ledesma
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, NL, Mexico
  - Tecnologico de Monterrey, The Institute for Obesity Research, Monterrey, NL, Mexico
- Raquel Cuevas-Diaz Duran
  - Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, NL, Mexico
  - *Correspondence: Raquel Cuevas-Diaz Duran; Iván Velasco
- Iván Velasco
  - Instituto de Fisiología Celular—Neurociencias, Universidad Nacional Autónoma de Mexico, Mexico City, Mexico
  - Laboratorio de Reprogramación Celular, Instituto Nacional de Neurología y Neurocirugía “Manuel Velasco Suárez”, Mexico City, Mexico
  - *Correspondence: Raquel Cuevas-Diaz Duran; Iván Velasco
|
16
|
Regulation associated modules reflect 3D genome modularity associated with chromatin activity. Nat Commun 2022; 13:5281. [PMID: 36075900 PMCID: PMC9458634 DOI: 10.1038/s41467-022-32911-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Accepted: 08/19/2022] [Indexed: 12/02/2022] Open
Abstract
The 3D genome has been shown to be organized into modules including topologically associating domains (TADs) and compartments that are primarily defined by spatial contacts from Hi-C. There exists a gap to investigate whether and how the spatial modularity of the chromatin is related to the functional modularity resulting from chromatin activity. Despite histone modifications reflecting chromatin activity, inferring spatial modularity of the genome directly from the histone modification patterns has not been well explored. Here, we report that histone modifications show a modular pattern (referred to as regulation associated modules, RAMs) that reflects spatial chromatin modularity. Enhancer-promoter interactions, loop anchors, super-enhancer clusters and extrachromosomal DNAs (ecDNAs) are found to occur more often within the same RAMs than within the same TADs. Consistently, compared to the TAD boundaries, deletions of RAM boundaries perturb the chromatin structure more severely (may even cause cell death) and somatic variants in cancer samples are more enriched in RAM boundaries. These observations suggest that RAMs reflect a modular organization of the 3D genome at a scale better aligned with chromatin activity, providing a bridge connecting the structural and functional modularity of the genome.
|
17
|
Marchal C, Defossez PA, Miotto B. Context-dependent CpG methylation directs cell-specific binding of transcription factor ZBTB38. Epigenetics 2022; 17:2122-2143. [PMID: 36000449 DOI: 10.1080/15592294.2022.2111135] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/03/2022] Open
Abstract
DNA methylation on CpGs regulates transcription in mammals, both by decreasing the binding of methylation-repelled factors and by increasing the binding of methylation-attracted factors. Among the latter, zinc finger proteins have the potential to bind methylated CpGs in a sequence-specific context. The protein ZBTB38 is unique in that it has two independent sets of zinc fingers, which recognize two different methylated consensus sequences in vitro. Here, we identify the binding sites of ZBTB38 in a human cell line, and show that they contain the two methylated consensus sequences identified in vitro. In addition, we show that the distribution of ZBTB38 sites is highly unusual: while 10% of the ZBTB38 sites are also bound by CTCF, the other 90% of sites reside in closed chromatin and are not bound by any of the other factors mapped in our model cell line. Finally, a third of ZBTB38 sites are found upstream of long and active CpG islands. Our work therefore validates ZBTB38 as a methyl-DNA binder in vivo and identifies its unique distribution in the genome.
Affiliation(s)
- Claire Marchal
  - Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
- Benoit Miotto
  - Université Paris Cité, Institut Cochin, INSERM, CNRS, Paris, France
|
18
|
Salama SR. The Complexity of the Mammalian Transcriptome. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1363:11-22. [PMID: 35220563 DOI: 10.1007/978-3-030-92034-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/16/2022]
Abstract
Draft genome assemblies for multiple mammalian species combined with new technologies to map transcripts from diverse RNA samples to these genomes developed in the early 2000s revealed that the mammalian transcriptome was vastly larger and more complex than previously anticipated. Efforts to comprehensively catalog the identity and features of transcripts present in a variety of species, tissues and cell lines revealed that a large fraction of the mammalian genome is transcribed in at least some settings. A large number of these transcripts encode long non-coding RNAs (lncRNAs). Many lncRNAs overlap or are anti-sense to protein coding genes and others overlap small RNAs. However, a large number are independent of any previously known mRNA or small RNA. While the functions of a majority of these lncRNAs are unknown, many appear to play roles in gene regulation. Many lncRNAs have species-specific and cell type specific expression patterns and their evolutionary origins are varied. While technological challenges have hindered getting a full picture of the diversity and transcript structure of all of the transcripts arising from lncRNA loci, new technologies including single molecule nanopore sequencing and single cell RNA sequencing promise to generate a comprehensive picture of the mammalian transcriptome.
Affiliation(s)
- Sofie R Salama
  - UC Santa Cruz Genomics Institute, Department of Biomolecular Engineering and Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
|
19
|
Ding B, Liu Y, Liu Z, Zheng L, Xu P, Chen Z, Wu P, Zhao Y, Pan Q, Guo Y, Wei W, Wang W. Noncoding loci without epigenomic signals can be essential for maintaining global chromatin organization and cell viability. SCIENCE ADVANCES 2021; 7:eabi6020. [PMID: 34731001 PMCID: PMC8565911 DOI: 10.1126/sciadv.abi6020] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/19/2021] [Accepted: 09/15/2021] [Indexed: 06/13/2023]
Abstract
Most noncoding regions of the human genome do not harbor any annotated element and are even not marked with any epigenomic or protein binding signal. However, an overlooked aspect of their possible role in stabilizing 3D chromatin organization has not been extensively studied. To illuminate their structural importance, we started with the noncoding regions forming many 3D contacts (referred to as hubs) and performed a CRISPR library screening to identify dozens of hubs essential for cell viability. Hi-C and single-cell transcriptomic analyses showed that their deletion could significantly alter chromatin organization and affect the expressions of distal genes. This study revealed the 3D structural importance of noncoding loci that are not associated with any functional element, providing a previously unknown mechanistic understanding of disease-associated genetic variations (GVs). Furthermore, our analyses also suggest a possible approach to develop therapeutics targeting disease-specific noncoding regions that are critical for disease cell survival.
Affiliation(s)
- Bo Ding
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093-0359, USA
- Ying Liu
  - Biomedical Pioneering Innovation Center, Beijing Advanced Innovation Center for Genomics, Peking-Tsinghua Center for Life Sciences, Peking University Genome Editing Research Center, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
- Zhiheng Liu
  - Biomedical Pioneering Innovation Center, Beijing Advanced Innovation Center for Genomics, Peking-Tsinghua Center for Life Sciences, Peking University Genome Editing Research Center, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
  - Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Lina Zheng
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093-0359, USA
- Ping Xu
  - Biomedical Pioneering Innovation Center, Beijing Advanced Innovation Center for Genomics, Peking-Tsinghua Center for Life Sciences, Peking University Genome Editing Research Center, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
- Zhao Chen
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093-0359, USA
- Peiyao Wu
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093-0359, USA
- Ying Zhao
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093-0359, USA
- Qian Pan
  - Biomedical Pioneering Innovation Center, Beijing Advanced Innovation Center for Genomics, Peking-Tsinghua Center for Life Sciences, Peking University Genome Editing Research Center, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
- Yu Guo
  - Biomedical Pioneering Innovation Center, Beijing Advanced Innovation Center for Genomics, Peking-Tsinghua Center for Life Sciences, Peking University Genome Editing Research Center, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
- Wensheng Wei
  - Biomedical Pioneering Innovation Center, Beijing Advanced Innovation Center for Genomics, Peking-Tsinghua Center for Life Sciences, Peking University Genome Editing Research Center, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
- Wei Wang
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093-0359, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093-0359, USA
  - Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA 92093-0359, USA
|
20
|
|
21
|
Ni P, Su Z. Accurate prediction of cis-regulatory modules reveals a prevalent regulatory genome of humans. NAR Genom Bioinform 2021; 3:lqab052. [PMID: 34159315 PMCID: PMC8210889 DOI: 10.1093/nargab/lqab052] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/26/2021] [Revised: 05/01/2021] [Accepted: 06/14/2021] [Indexed: 02/07/2023] Open
Abstract
cis-regulatory modules (CRMs) formed by clusters of transcription factor (TF) binding sites (TFBSs) are as important as coding sequences in specifying phenotypes of humans. It is essential to categorize all CRMs and constituent TFBSs in the genome. In contrast to most existing methods that predict CRMs in specific cell types using epigenetic marks, we predict a largely cell type agnostic but more comprehensive map of CRMs and constituent TFBSs in the genome by integrating all available TF ChIP-seq datasets. Our method is able to partition 77.47% of genome regions covered by the 6092 available datasets into a CRM candidate (CRMC) set (56.84%) and a non-CRMC set (43.16%). Intriguingly, the predicted CRMCs are under strong evolutionary constraints, while the non-CRMCs are largely selectively neutral, strongly suggesting that the CRMCs are likely cis-regulatory, while the non-CRMCs are not. Our predicted CRMs are under stronger evolutionary constraints than three state-of-the-art predictions (GeneHancer, EnhancerAtlas and ENCODE phase 3) and substantially outperform them for recalling VISTA enhancers and non-coding ClinVar variants. We estimated that the human genome might encode about 1.47M CRMs and 68M TFBSs, comprising about 55% and 22% of the genome, respectively; for both of which, we predicted 80%. Therefore, the cis-regulatory genome appears to be more prevalent than originally thought.
Affiliation(s)
- Pengyu Ni
  - Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
- Zhengchang Su
  - Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
|
22
|
Spiegel J, Cuesta SM, Adhikari S, Hänsel-Hertsch R, Tannahill D, Balasubramanian S. G-quadruplexes are transcription factor binding hubs in human chromatin. Genome Biol 2021; 22:117. [PMID: 33892767 PMCID: PMC8063395 DOI: 10.1186/s13059-021-02324-z] [Citation(s) in RCA: 120] [Impact Index Per Article: 40.0] [Received: 09/30/2020] [Accepted: 03/24/2021] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: The binding of transcription factors (TF) to genomic targets is critical in the regulation of gene expression. Short, double-stranded DNA sequence motifs are routinely implicated in TF recruitment, but many questions remain on how binding site specificity is governed.
RESULTS: Herein, we reveal a previously unappreciated role for DNA secondary structures as key features for TF recruitment. In a systematic, genome-wide study, we discover that endogenous G-quadruplex secondary structures (G4s) are prevalent TF binding sites in human chromatin. Certain TFs bind G4s with affinities comparable to double-stranded DNA targets. We demonstrate that, in a chromatin context, this binding interaction is competed out with a small molecule. Notably, endogenous G4s are prominent binding sites for a large number of TFs, particularly at promoters of highly expressed genes.
CONCLUSIONS: Our results reveal a novel non-canonical mechanism for TF binding whereby G4s operate as common binding hubs for many different TFs to promote increased transcription.
Affiliation(s)
- Jochen Spiegel
  - Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Sergio Martínez Cuesta
  - Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
  - Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
  - Present Address: Data Sciences and Quantitative Biology, Discovery Sciences, AstraZeneca, Cambridge, UK
- Santosh Adhikari
  - Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
- Robert Hänsel-Hertsch
  - Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
  - Present Address: Center for Molecular Medicine Cologne, University of Cologne, 50931, Cologne, Germany
- David Tannahill
  - Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
- Shankar Balasubramanian
  - Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
  - Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, UK
  - School of Clinical Medicine, University of Cambridge, Cambridge, CB2 0SP, UK
|
23
|
Seo J, Koçak DD, Bartelt LC, Williams CA, Barrera A, Gersbach CA, Reddy TE. AP-1 subunits converge promiscuously at enhancers to potentiate transcription. Genome Res 2021; 31:538-550. [PMID: 33674350 PMCID: PMC8015846 DOI: 10.1101/gr.267898.120] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/25/2020] [Accepted: 02/17/2021] [Indexed: 12/12/2022]
Abstract
The AP-1 transcription factor (TF) dimer contributes to many biological processes and environmental responses. AP-1 can be composed of many interchangeable subunits. Unambiguously determining the binding locations of these subunits in the human genome is challenging because of variable antibody specificity and affinity. Here, we definitively establish the genome-wide binding patterns of five AP-1 subunits by using CRISPR to introduce a common antibody tag on each subunit. We find limited evidence for strong dimerization preferences between subunits at steady state and find that, under a stimulus, dimerization patterns reflect changes in the transcriptome. Further, our analysis suggests that canonical AP-1 motifs indiscriminately recruit all AP-1 subunits to genomic sites, which we term AP-1 hotspots. We find that AP-1 hotspots are predictive of cell type–specific gene expression and of genomic responses to glucocorticoid signaling (more so than super-enhancers) and are significantly enriched in disease-associated genetic variants. Together, these results support a model where promiscuous binding of many AP-1 subunits to the same genomic location plays a key role in regulating cell type–specific gene expression and environmental responses.
Collapse
Affiliation(s)
- Jungkyun Seo
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical Center, Durham, North Carolina 27708, USA.,Computational Biology and Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA.,Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27708, USA
| | - D Dewran Koçak
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27708, USA.,Department of Biomedical Engineering, Duke University, Durham, North Carolina 27708, USA
| | - Luke C Bartelt
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA.,University Program in Genetics and Genomics, Duke University, Durham, North Carolina 27708, USA
| | - Courtney A Williams
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27708, USA
| | - Alejandro Barrera
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical Center, Durham, North Carolina 27708, USA.,Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27708, USA
| | - Charles A Gersbach
- Computational Biology and Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA.,Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27708, USA.,Department of Biomedical Engineering, Duke University, Durham, North Carolina 27708, USA.,University Program in Genetics and Genomics, Duke University, Durham, North Carolina 27708, USA.,Department of Surgery, Duke University Medical Center, Durham, North Carolina 27708, USA
| | - Timothy E Reddy
- Department of Biostatistics and Bioinformatics, Division of Integrative Genomics, Duke University Medical Center, Durham, North Carolina 27708, USA.,Computational Biology and Bioinformatics Graduate Program, Duke University, Durham, North Carolina 27708, USA.,Center for Genomic and Computational Biology, Duke University, Durham, North Carolina 27708, USA.,Center for Advanced Genomic Technologies, Duke University, Durham, North Carolina 27708, USA.,Department of Biomedical Engineering, Duke University, Durham, North Carolina 27708, USA.,University Program in Genetics and Genomics, Duke University, Durham, North Carolina 27708, USA.,Department of Molecular Genetics and Microbiology, Duke University, Durham, North Carolina 27708, USA
|
24
|
Ahmed MM, Tazyeen S, Alam A, Farooqui A, Ali R, Imam N, Tamkeen N, Ali S, Malik MZ, Ishrat R. Deciphering key genes in cardio-renal syndrome using network analysis. Bioinformation 2021; 17:86-100. [PMID: 34393423 PMCID: PMC8340714 DOI: 10.6026/97320630017086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/02/2020] [Revised: 12/31/2020] [Accepted: 01/26/2021] [Indexed: 12/23/2022]
Abstract
Cardio-renal syndrome (CRS) is a rapidly recognized clinical entity that refers to the inextricable connection between heart and renal impairment, whereby abnormality in one organ directly promotes deterioration of the other. Biological markers help to gain insight into the pathological processes for early diagnosis with higher accuracy of CRS using known clinical findings. Therefore, it is of interest to identify target genes in associated pathways linked to CRS. Hence, 119 CRS genes were extracted from the literature to construct the PPIN. We used the MCODE tool to generate modules from the network and selected the top 10 of the 23 available modules. The modules were further analyzed to identify 12 essential genes in the network. These biomarkers are potential emerging tools for understanding the pathophysiologic mechanisms for the early diagnosis of CRS. Ontological analysis shows that they are rich in the MF terms protease binding and endopeptidase inhibitor activity. Thus, these data help increase our knowledge of CRS to improve clinical management of the disease.
Affiliation(s)
- Mohd Murshad Ahmed
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
| | - Safia Tazyeen
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
| | - Aftab Alam
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
| | - Anam Farooqui
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
| | - Rafat Ali
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
| | - Nikhat Imam
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
| | - Naaila Tamkeen
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
| | - Shahnawaz Ali
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
| | - Md Zubbair Malik
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India
| | - Romana Ishrat
- Centre for Interdisciplinary Research in Basic Sciences, Jamia Millia Islamia, New Delhi-110025, India
|
25
|
Supervised enhancer prediction with epigenetic pattern recognition and targeted validation. Nat Methods 2020; 17:807-814. [PMID: 32737473 PMCID: PMC8073243 DOI: 10.1038/s41592-020-0907-8] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 09/25/2017] [Accepted: 06/18/2020] [Indexed: 12/20/2022]
Abstract
Enhancers are important noncoding elements, but they have been traditionally hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mouse and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription-factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model effectively discriminating between enhancers and promoters.
|
26
|
Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, Layer RM, Neale BM, Salerno WJ, Reeves C, Buyske S, Matise TC, Muzny DM, Zody MC, Lander ES, Dutcher SK, Stitziel NO, Hall IM. Mapping and characterization of structural variation in 17,795 human genomes. Nature 2020; 583:83-89. [PMID: 32460305 PMCID: PMC7547914 DOI: 10.1038/s41586-020-2371-0] [Citation(s) in RCA: 160] [Impact Index Per Article: 40.0] [Received: 12/29/2018] [Accepted: 05/18/2020] [Indexed: 12/18/2022]
Abstract
A key goal of whole-genome sequencing for studies of human genetics is to interrogate all forms of variation, including single-nucleotide variants, small insertion or deletion (indel) variants and structural variants. However, tools and resources for the study of structural variants have lagged behind those for smaller variants. Here we used a scalable pipeline to map and characterize structural variants in 17,795 deeply sequenced human genomes. We publicly release site-frequency data to create the largest, to our knowledge, whole-genome-sequencing-based structural variant resource so far. On average, individuals carry 2.9 rare structural variants that alter coding regions; these variants affect the dosage or structure of 4.2 genes and account for 4.0-11.2% of rare high-impact coding alleles. Using a computational model, we estimate that structural variants account for 17.2% of rare alleles genome-wide, with predicted deleterious effects that are equivalent to loss-of-function coding alleles; approximately 90% of such structural variants are noncoding deletions (mean 19.1 per genome). We report 158,991 ultra-rare structural variants and show that 2% of individuals carry ultra-rare megabase-scale structural variants, nearly half of which are balanced or complex rearrangements. Finally, we infer the dosage sensitivity of genes and noncoding elements, and reveal trends that relate to element class and conservation. This work will help to guide the analysis and interpretation of structural variants in the era of whole-genome sequencing.
Affiliation(s)
- Haley J Abel
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
| | - David E Larson
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
| | - Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
| | - Colby Chiang
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
| | - Indraniel Das
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
| | - Krishna L Kanchi
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
| | - Ryan M Layer
- BioFrontiers Institute, University of Colorado, Boulder, CO, USA
- Department of Computer Science, University of Colorado, Boulder, CO, USA
| | - Benjamin M Neale
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - William J Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Steven Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
| | - Tara C Matise
- Department of Genetics, Rutgers University, Piscataway, NJ, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Eric S Lander
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Susan K Dutcher
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
| | - Nathan O Stitziel
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA
- Department of Medicine, Washington University School of Medicine, St Louis, MO, USA
| | - Ira M Hall
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, MO, USA.
- Department of Genetics, Washington University School of Medicine, St Louis, MO, USA.
- Department of Medicine, Washington University School of Medicine, St Louis, MO, USA.
|
27
|
IGAP-integrative genome analysis pipeline reveals new gene regulatory model associated with nonspecific TF-DNA binding affinity. Comput Struct Biotechnol J 2020; 18:1270-1286. [PMID: 32612751 PMCID: PMC7303559 DOI: 10.1016/j.csbj.2020.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2020] [Revised: 05/17/2020] [Accepted: 05/19/2020] [Indexed: 11/23/2022]
Abstract
The human genome is regulated in a multi-dimensional way. While biophysical factors like Non-specific Transcription factor Binding Affinity (nTBA) act at the DNA sequence level, other factors, such as histone modifications and 3-D chromosomal interactions, act above the sequence level. This multidimensionality of regulation requires many of these factors for a proper understanding of the regulatory landscape of the human genome. Here, we propose a new biophysical model for estimating nTBA. Integration of nTBA with chromatin modifications and chromosomal interactions, using a new Integrative Genome Analysis Pipeline (IGAP), reveals additive effects of nTBA to regulatory DNA sequences and identifies three types of genomic zones in the human genome (Inactive Genomic Zones, Poised Genomic Zones, and Active Genomic Zones). It also unveils a novel long-distance gene regulatory model: chromosomal interactions reduce the physical distance between high occupancy target (HOT) regions, which results in high nTBA to DNA in the area, which in turn attracts TFs to such regions with higher binding potential. These findings will help to elucidate the three-dimensional diffusion process that TFs use during their search for the right targets.
|
29
|
Xu D, Gokcumen O, Khurana E. Loss-of-function tolerance of enhancers in the human genome. PLoS Genet 2020; 16:e1008663. [PMID: 32243438 PMCID: PMC7159235 DOI: 10.1371/journal.pgen.1008663] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/29/2019] [Revised: 04/15/2020] [Accepted: 02/12/2020] [Indexed: 12/21/2022]
Abstract
Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of human genomes to losing enhancers has not yet been evaluated. Here we present the catalog of LoF-tolerant enhancers using structural variants from whole-genome sequences. Using a conservative approach, we estimate that individual human genomes possess at least 28 LoF-tolerant enhancers on average. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers tend to be more tissue-specific and regulate fewer and more dispensable genes relative to other enhancers. They are enriched in immune-related cells while enhancers with low LoF-tolerance are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoF-tolerance of all enhancers, which achieved an area under the receiver operating characteristics curve (AUROC) of 98%. We predict that 3,519 more enhancers are likely to be LoF-tolerant and that 129 enhancers have low LoF-tolerance. Our predictions are supported by a known set of disease enhancers and novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.
Enhancers are elements where transcription factors bind and regulate the expression of protein-coding genes. Although multiple previous studies have focused on which genes can tolerate loss-of-function (LoF), none has systematically evaluated the tolerance of all enhancers in the human genome to LoF. Individual studies have shown a broad range of phenotypic effects of enhancer LoF. The phenotypic effects of enhancer LoF likely fall into a spectrum where deletion of LoF-tolerant enhancers would not elicit substantial phenotypic impact, while some enhancers are likely to cause fitness defects when deleted. Here we report a systematic computational approach that uses machine learning and properties of enhancers in a unified human regulatory network with tissue-specific annotations to predict the LoF-tolerance of all enhancers identified in the human genome. The LoF-tolerance scores of enhancers provided in this study can significantly facilitate the interpretation and prioritization of non-coding sequence variants for disease and functional studies.
Affiliation(s)
- Duo Xu
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, United States of America
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, United States of America
- Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, New York, United States of America
- Meyer Cancer Center, Weill Cornell Medicine, New York, New York, United States of America
| | - Omer Gokcumen
- Department of Biological Sciences, University at Buffalo, The State University of New York, Buffalo, New York, United States of America
| | - Ekta Khurana
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York, United States of America
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, New York, United States of America
- Englander Institute for Precision Medicine, New York Presbyterian Hospital-Weill Cornell Medicine, New York, New York, United States of America
- Meyer Cancer Center, Weill Cornell Medicine, New York, New York, United States of America
|
30
|
Kim HJ, Osteil P, Humphrey SJ, Cinghu S, Oldfield AJ, Patrick E, Wilkie EE, Peng G, Suo S, Jothi R, Tam PPL, Yang P. Transcriptional network dynamics during the progression of pluripotency revealed by integrative statistical learning. Nucleic Acids Res 2020; 48:1828-1842. [PMID: 31853542 PMCID: PMC7038952 DOI: 10.1093/nar/gkz1179] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/26/2019] [Revised: 12/02/2019] [Accepted: 12/09/2019] [Indexed: 12/12/2022]
Abstract
The developmental potential of cells, termed pluripotency, is highly dynamic and progresses through a continuum of naive, formative and primed states. Pluripotency progression of mouse embryonic stem cells (ESCs) from naive to formative and primed state is governed by transcription factors (TFs) and their target genes. Genomic techniques have uncovered a multitude of TF binding sites in ESCs, yet a major challenge lies in identifying target genes from functional binding sites and reconstructing dynamic transcriptional networks underlying pluripotency progression. Here, we integrated time-resolved ‘trans-omic’ datasets together with TF binding profiles and chromatin conformation data to identify target genes of a panel of TFs. Our analyses revealed that naive TF target genes are more likely to be TFs themselves than those of formative TFs, suggesting denser hierarchies among naive TFs. We also discovered that formative TF target genes are marked by permissive epigenomic signatures in the naive state, indicating that they are poised for expression prior to the initiation of pluripotency transition to the formative state. Finally, our reconstructed transcriptional networks pinpointed the precise timing from naive to formative pluripotency progression and enabled the spatiotemporal mapping of differentiating ESCs to their in vivo counterparts in developing embryos.
Affiliation(s)
- Hani Jieun Kim
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.,Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia.,School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
| | - Pierre Osteil
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia.,Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
| | - Sean J Humphrey
- Charles Perkins Centre, School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
| | - Senthilkumar Cinghu
- Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
| | - Andrew J Oldfield
- Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France
| | - Ellis Patrick
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.,School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia.,Westmead Institute for Medical Research, University of Sydney, Westmead, NSW 2145, Australia
| | - Emilie E Wilkie
- Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
| | - Guangdun Peng
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China, and Guangzhou Regenerative Medicine and Health Guangdong Laboratory (GRMH-GDL), Guangzhou 510005, China
| | - Shengbao Suo
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
| | - Raja Jothi
- Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
| | - Patrick P L Tam
- School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia.,Embryology Unit, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia
| | - Pengyi Yang
- Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.,Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Westmead, NSW 2145, Australia.,School of Medical Sciences, Faculty of Medicine and Health, University of Sydney, NSW 2006, Australia
|
31
|
Xu T, Zheng X, Li B, Jin P, Qin Z, Wu H. A comprehensive review of computational prediction of genome-wide features. Brief Bioinform 2020; 21:120-134. [PMID: 30462144 PMCID: PMC10233247 DOI: 10.1093/bib/bby110] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/10/2018] [Revised: 10/15/2018] [Accepted: 10/16/2018] [Indexed: 12/15/2022]
Abstract
There are significant correlations among different types of genetic, genomic and epigenomic features within the genome. These correlations make the in silico feature prediction possible through statistical or machine learning models. With the accumulation of a vast amount of high-throughput data, feature prediction has gained significant interest lately, and a plethora of papers have been published in the past few years. Here we provide a comprehensive review on these published works, categorized by the prediction targets, including protein binding site, enhancer, DNA methylation, chromatin structure and gene expression. We also provide discussions on some important points and possible future directions.
Affiliation(s)
- Tianlei Xu
- Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
| | - Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai, China
| | - Ben Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Peng Jin
- Department of Human Genetics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Zhaohui Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, USA
|
32
|
Piggin CL, Roden DL, Law AMK, Molloy MP, Krisp C, Swarbrick A, Naylor MJ, Kalyuga M, Kaplan W, Oakes SR, Gallego-Ortega D, Clark SJ, Carroll JS, Bartonicek N, Ormandy CJ. ELF5 modulates the estrogen receptor cistrome in breast cancer. PLoS Genet 2020; 16:e1008531. [PMID: 31895944 PMCID: PMC6959601 DOI: 10.1371/journal.pgen.1008531] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/08/2019] [Revised: 01/14/2020] [Accepted: 11/20/2019] [Indexed: 11/28/2022]
Abstract
Acquired resistance to endocrine therapy is responsible for half of the therapeutic failures in the treatment of breast cancer. Recent findings have implicated increased expression of the ETS transcription factor ELF5 as a potential modulator of estrogen action and driver of endocrine resistance, and here we provide the first insight into the mechanisms by which ELF5 modulates estrogen sensitivity. Using chromatin immunoprecipitation sequencing we found that ELF5 binding overlapped with FOXA1 and ER at super enhancers, enhancers and promoters, and when elevated, caused FOXA1 and ER to bind to new regions of the genome, in a pattern that replicated the alterations to the ER/FOXA1 cistrome caused by the acquisition of resistance to endocrine therapy. RNA sequencing demonstrated that these changes altered estrogen-driven patterns of gene expression, the expression of ER transcription-complex members, and 6 genes known to be involved in driving the acquisition of endocrine resistance. Using rapid immunoprecipitation mass spectrometry of endogenous proteins, and proximity ligation assays, we found that ELF5 interacted physically with members of the ER transcription complex, such as DNA-PKcs. We found 2 cases of endocrine-resistant brain metastases where ELF5 levels were greatly increased and ELF5 patterns of gene expression were enriched, compared to the matched primary tumour. Thus ELF5 alters ER-driven gene expression by modulating the ER/FOXA1 cistrome, by interacting with it, and by modulating the expression of members of the ER transcriptional complex, providing multiple mechanisms by which ELF5 can drive endocrine resistance.
Affiliation(s)
- Catherine L. Piggin
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Daniel L. Roden
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Andrew M. K. Law
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Mark P. Molloy
- Australian Proteome Analysis Facility, Macquarie University, Sydney, Australia
| | - Christoph Krisp
- Australian Proteome Analysis Facility, Macquarie University, Sydney, Australia
| | - Alexander Swarbrick
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Matthew J. Naylor
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
- School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia
| | - Maria Kalyuga
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Warren Kaplan
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Samantha R. Oakes
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - David Gallego-Ortega
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Susan J. Clark
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Jason S. Carroll
- Cancer Research UK Cambridge Research Institute, Li Ka Shing Centre Robinson Way, Cambridge, United Kingdom
| | - Nenad Bartonicek
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
| | - Christopher J. Ormandy
- Garvan Institute of Medical Research and The Kinghorn Cancer Centre, Victoria Street Darlinghurst Sydney, NSW, Australia
- St Vincent’s Clinical School, Faculty of Medicine, UNSW Sydney, Australia
|
33
|
Nguyen QH, Nguyen-Vo TH, Le NQK, Do TTT, Rahardja S, Nguyen BP. iEnhancer-ECNN: identifying enhancers and their strength using ensembles of convolutional neural networks. BMC Genomics 2019; 20:951. [PMID: 31874637 PMCID: PMC6929481 DOI: 10.1186/s12864-019-6336-3] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 01/16/2023]
Abstract
BACKGROUND: Enhancers are non-coding DNA fragments which are crucial in gene regulation (e.g. transcription and translation). Because enhancers vary widely in location and are scattered freely across the 98% of the genome that is non-coding, identifying them is more complicated than identifying other genetic factors. To address this biological issue, several in silico studies have been done to identify and classify enhancer sequences among a myriad of DNA sequences using computational advances. Although recent studies have come up with improved performance, shortfalls in these learning models still remain. To overcome limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework using one-hot encoding and k-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.'s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was done to fairly assess the model performance.
RESULTS: Our experimental results demonstrate that iEnhancer-ECNN has better performance compared to other state-of-the-art methods using the same dataset. The accuracies of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared to other related studies, improvements in the Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and Matthews correlation coefficient (MCC) of our models are remarkable, especially for the model of layer 2 with about 11.0%, 46.5%, and 65.0%, respectively.
CONCLUSIONS: iEnhancer-ECNN outperforms other previously proposed methods with significant improvement in most of the evaluation metrics. Strong gains in the MCC of both layers are highly meaningful in assuring the stability of our models.
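The data transformations and the MCC metric named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the A/C/G/T column order, and the choice to map unknown bases to all-zero rows are assumptions made for illustration only.

```python
from math import sqrt

BASES = "ACGT"  # assumed column order for the one-hot rows


def one_hot(seq):
    """Return one 4-element row per base; unknown bases (e.g. N) become all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In a setup like the one described, the one-hot rows or k-mer tokens would be the network input, and MCC would be computed from the predicted and true labels of the held-out set.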
Affiliation(s)
- Quang H Nguyen
- School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Thanh-Hoang Nguyen-Vo
- School of Mathematics and Statistics, Victoria University of Wellington, Gate 7, Kelburn Parade, Wellington, 6142, New Zealand
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Keelung Road, Da'an District, Taipei City, 106, Taiwan (R.O.C.)
- Trang T T Do
- Institute of Research and Development, Duy Tan University, Danang 550000, Vietnam
- Susanto Rahardja
- School of Marine Science and Technology, Northwestern Polytechnical University, 127 West Youyi Road, Xi'an 710072, China
- Binh P Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Gate 7, Kelburn Parade, Wellington, 6142, New Zealand
|
34
|
Wreczycka K, Franke V, Uyar B, Wurmus R, Bulut S, Tursun B, Akalin A. HOT or not: examining the basis of high-occupancy target regions. Nucleic Acids Res 2019; 47:5735-5745. [PMID: 31114922 PMCID: PMC6582337 DOI: 10.1093/nar/gkz460] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 11/12/2018] [Revised: 05/02/2019] [Accepted: 05/13/2019] [Indexed: 01/16/2023] Open
Abstract
High-occupancy target (HOT) regions are segments of the genome with an unusually high number of transcription factor binding sites. These regions are observed in multiple species and thought to have biological importance due to high transcription factor occupancy. Furthermore, they coincide with house-keeping gene promoters and consequently associated genes are stably expressed across multiple cell types. Despite these features, HOT regions are solely defined using ChIP-seq experiments and shown to lack canonical motifs for transcription factors that are thought to be bound there. Although ChIP-seq experiments are the gold standard for finding genome-wide binding sites of a protein, they are not noise free. Here, we show that HOT regions are likely to be ChIP-seq artifacts and they are similar to previously proposed ‘hyper-ChIPable’ regions. Using ChIP-seq data sets for knocked-out transcription factors, we demonstrate the presence of false positive signals on HOT regions. We observe sequence characteristics and genomic features that are discriminatory of HOT regions, such as GC/CpG-rich k-mers, enrichment of RNA–DNA hybrids (R-loops) and DNA tertiary structures (G-quadruplex DNA). The artificial ChIP-seq enrichment on HOT regions could be associated with these discriminatory features. Furthermore, we propose strategies to deal with such artifacts in future ChIP-seq studies.
Affiliation(s)
- Katarzyna Wreczycka
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, Germany
- Vedran Franke
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, Germany
- Bora Uyar
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, Germany
- Ricardo Wurmus
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, Germany
- Selman Bulut
- Gene Regulation and Cell Fate Decision in C. elegans, Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, Germany
- Baris Tursun
- Gene Regulation and Cell Fate Decision in C. elegans, Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, Germany
- Altuna Akalin
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max-Delbrück Center for Molecular Medicine, Berlin, Germany
|
35
|
Hong C, Yip KY. Flexible k-mers with variable-length indels for identifying binding sequences of protein dimers. Brief Bioinform 2019. [DOI: 10.1093/bib/bbz101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/24/2023] Open
Abstract
Many DNA-binding proteins interact with partner proteins. Recently, based on the high-throughput consecutive affinity-purification systematic evolution of ligands by exponential enrichment (CAP-SELEX) method, many such protein pairs have been found to bind DNA with flexible spacing between their individual binding motifs. Most existing motif representations were not designed to capture such flexibly spaced regions. In order to computationally discover more co-binding events without prior knowledge about the identities of the co-binding proteins, a new representation is needed. We propose a new class of sequence patterns that flexibly model such variable regions and corresponding algorithms that identify co-bound sequences using these patterns. Based on both simulated and CAP-SELEX data, features derived from our sequence patterns lead to better classification performance than patterns that do not explicitly model the variable regions. We also show that even for standard ChIP-seq data, this new class of sequence patterns can help discover co-bound events in a subset of sequences in an unsupervised manner. The open-source software is available at https://github.com/kevingroup/glk-SVM.
Affiliation(s)
- Chenyang Hong
- Department of Computer Science and Engineering at The Chinese University of Hong Kong
- Kevin Y Yip
- Department of Computer Science and Engineering at The Chinese University of Hong Kong
|
36
|
Hou Y, Zhang R, Sun X. Enhancer LncRNAs Influence Chromatin Interactions in Different Ways. Front Genet 2019; 10:936. [PMID: 31681405 PMCID: PMC6807612 DOI: 10.3389/fgene.2019.00936] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 07/12/2019] [Accepted: 09/05/2019] [Indexed: 12/14/2022] Open
Abstract
More than 98% of the human genome does not encode proteins, and the vast majority of the noncoding regions have not been well studied. Some of these regions contain enhancers and functional non-coding RNAs. Previous research suggested that enhancer transcripts could be potent independent indicators of enhancer activity, and some enhancer lncRNAs (elncRNAs) have been proven to play critical roles in gene regulation. Here, we identified enhancer–promoter interactions from high-throughput chromosome conformation capture (Hi-C) data. We found that elncRNAs were highly enriched surrounding chromatin loop anchors. Additionally, the interaction frequency of elncRNA-associated enhancer–promoter pairs was significantly higher than the interaction frequency of other enhancer–promoter pairs, suggesting that elncRNAs may reinforce the interactions between enhancers and promoters. We also found that elncRNA expression levels were positively correlated with the interaction frequency of enhancer–promoter pairs. The promoters interacting with elncRNA-associated enhancers were rich in RNA polymerase II and YY1 transcription factor binding sites. We clustered enhancer–promoter pairs into different groups to reflect the different ways in which elncRNAs could influence enhancer–promoter pairs. Interestingly, G-quadruplexes were found to potentially mediate some enhancer–promoter interaction pairs, and the interaction frequency of these pairs was significantly higher than that of other enhancer–promoter pairs. We also found that the G-quadruplexes on enhancers were highly related to the expression of elncRNAs. G-quadruplexes located in the promoters of elncRNAs led to high expression of elncRNAs, whereas G-quadruplexes located in the gene bodies of elncRNAs generally resulted in low expression of elncRNAs.
Affiliation(s)
- Yue Hou
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Rongxin Zhang
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
|
37
|
Hou Y, Li F, Zhang R, Li S, Liu H, Qin ZS, Sun X. Integrative characterization of G-Quadruplexes in the three-dimensional chromatin structure. Epigenetics 2019; 14:894-911. [PMID: 31177910 PMCID: PMC6691997 DOI: 10.1080/15592294.2019.1621140] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 01/06/2019] [Revised: 05/05/2019] [Accepted: 05/14/2019] [Indexed: 12/14/2022] Open
Abstract
DNA molecules are highly compacted in the eukaryotic nucleus where distal regulatory elements reach their targets through three-dimensional chromosomal interactions. G-quadruplexes, stable four-stranded non-canonical DNA structures, can change local chromatin organization through the exclusion of nucleosomes. However, the relationship between G-quadruplexes and higher-order genome organization remains unknown. Here, we found that G-quadruplexes are significantly enriched at boundaries of topological associated domains (TADs). Architectural protein occupancy, which plays critical roles in the formation of TADs, was highly correlated with the content of G-quadruplexes at TAD boundaries. Moreover, adjacent boundaries containing G-quadruplexes frequently interacted with each other because of the high enrichment of architectural protein binding sites. Similar to CCCTC-binding factor (CTCF) binding sites, G-quadruplexes also showed strong insulation ability in the separation of adjacent regions. Additionally, the insulation ability of CTCF binding sites and TAD boundaries was significantly reinforced by G-quadruplexes. Furthermore, G-quadruplex motifs on different strands were associated with the orientation of CTCF binding sites. These findings suggest a potential role for G-quadruplexes in loop extrusion. The enrichment of transcription factor binding sites (TFBSs) around regulatory elements containing G-quadruplexes led to frequent interactions between regulatory elements containing G-quadruplexes. Intriguingly, more than 99% of G-quadruplexes overlapped with TFBSs. The binding sites of CTCF and cohesin proteins were preferentially located surrounding G-quadruplexes. Accordingly, we proposed a new mechanism of long-distance gene regulation in which G-quadruplexes are involved in distal interactions between enhancers and promoters.
Affiliation(s)
- Yue Hou
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Fuyu Li
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Rongxin Zhang
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Sheng Li
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Hongde Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Zhaohui S. Qin
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
|
38
|
Gheorghe M, Sandve GK, Khan A, Chèneby J, Ballester B, Mathelier A. A map of direct TF-DNA interactions in the human genome. Nucleic Acids Res 2019; 47:e21. [PMID: 30517703 PMCID: PMC6393237 DOI: 10.1093/nar/gky1210] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 08/18/2018] [Revised: 10/31/2018] [Accepted: 11/20/2018] [Indexed: 12/11/2022] Open
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most popular assay to identify genomic regions, called ChIP-seq peaks, that are bound in vivo by transcription factors (TFs). These regions are derived from direct TF-DNA interactions, indirect binding of the TF to the DNA (through a co-binding partner), nonspecific binding to the DNA, and noise/bias/artifacts. Delineating the bona fide direct TF-DNA interactions within the ChIP-seq peaks remains challenging. We developed a dedicated software, ChIP-eat, that combines computational TF binding models and ChIP-seq peaks to automatically predict direct TF-DNA interactions. Our work culminated with predicted interactions covering >4% of the human genome, obtained by uniformly processing 1983 ChIP-seq peak data sets from the ReMap database for 232 unique TFs. The predictions were a posteriori assessed using protein binding microarray and ChIP-exo data, and were predominantly found in high quality ChIP-seq peaks. The set of predicted direct TF-DNA interactions suggested that high-occupancy target regions are likely not derived from direct binding of the TFs to the DNA. Our predictions derived co-binding TFs supported by protein-protein interaction data and defined cis-regulatory modules enriched for disease- and trait-associated SNPs. We provide this collection of direct TF-DNA interactions and cis-regulatory modules through the UniBind web-interface (http://unibind.uio.no).
Affiliation(s)
- Marius Gheorghe
- Centre for Molecular Medicine Norway (NCMM), University of Oslo, Oslo, Norway
- Aziz Khan
- Centre for Molecular Medicine Norway (NCMM), University of Oslo, Oslo, Norway
- Jeanne Chèneby
- Aix Marseille Université, INSERM, TAGC, Marseille, France
- Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), University of Oslo, Oslo, Norway; Department of Cancer Genetics, Institute for Cancer Research, Radiumhospitalet, Oslo, Norway
|
39
|
Lewis MW, Li S, Franco HL. Transcriptional control by enhancers and enhancer RNAs. Transcription 2019; 10:171-186. [PMID: 31791217 PMCID: PMC6948965 DOI: 10.1080/21541264.2019.1695492] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 07/31/2019] [Revised: 11/14/2019] [Accepted: 11/15/2019] [Indexed: 11/02/2022] Open
Abstract
The regulation of gene expression is a fundamental cellular process and its misregulation is a key component of disease. Enhancers are one of the most salient regulatory elements in the genome and help orchestrate proper spatiotemporal gene expression during development, in homeostasis, and in response to signaling. Notably, molecular aberrations at enhancers, such as translocations and single nucleotide polymorphisms, are emerging as an important source of human variation and susceptibility to disease. Herein we discuss emerging paradigms addressing how genes are regulated by enhancers, common features of active enhancers, and how non-coding enhancer RNAs (eRNAs) can direct gene expression programs that underlie cellular phenotypes. We survey the current evidence, which suggests that eRNAs can bind to transcription factors, mediate enhancer-promoter interactions, influence RNA Pol II elongation, and act as decoys for repressive cofactors. Furthermore, we discuss current methodologies for the identification of eRNAs and novel approaches to elucidate their functions.
Affiliation(s)
- Michael W. Lewis
- The Lineberger Comprehensive Cancer Center, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Shen Li
- The Lineberger Comprehensive Cancer Center, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Hector L. Franco
- The Lineberger Comprehensive Cancer Center, Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
|
40
|
Benton ML, Talipineni SC, Kostka D, Capra JA. Genome-wide enhancer annotations differ significantly in genomic distribution, evolution, and function. BMC Genomics 2019; 20:511. [PMID: 31221079 PMCID: PMC6585034 DOI: 10.1186/s12864-019-5779-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 01/04/2019] [Accepted: 05/07/2019] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND: Non-coding gene regulatory enhancers are essential to transcription in mammalian cells. As a result, a large variety of experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. Given the differences in the biological signals assayed, some variation in the enhancers identified by different methods is expected; however, the concordance of enhancers identified by different methods has not been comprehensively evaluated. This is critically needed, since in practice, most studies consider enhancers identified by only a single method. Here, we compare enhancer sets from eleven representative strategies in four biological contexts. RESULTS: All sets we evaluated overlap significantly more than expected by chance; however, there is significant dissimilarity in their genomic, evolutionary, and functional characteristics, both at the element and base-pair level, within each context. The disagreement is sufficient to influence interpretation of candidate SNPs from GWAS studies, and to lead to disparate conclusions about enhancer and disease mechanisms. Most regions identified as enhancers are supported by only one method, and we find limited evidence that regions identified by multiple methods are better candidates than those identified by a single method. As a result, we cannot recommend the use of any single enhancer identification strategy in all settings. CONCLUSIONS: Our results highlight the inherent complexity of enhancer biology and identify an important challenge to mapping the genetic architecture of complex disease. Greater appreciation of how the diverse enhancer identification strategies in use today relate to the dynamic activity of gene regulatory regions is needed to enable robust and reproducible results.
Affiliation(s)
- Mary Lauren Benton
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, 37235, USA
- Sai Charan Talipineni
- Department of Developmental Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15201, USA
- Dennis Kostka
- Department of Developmental Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15201, USA; Department of Computational & Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15201, USA
- John A Capra
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, 37235, USA; Departments of Biological Sciences and Computer Science, Vanderbilt Genetics Institute, Center for Structural Biology, Vanderbilt University, Nashville, TN, 37235, USA
|
41
|
Hariprakash JM, Ferrari F. Computational Biology Solutions to Identify Enhancers-target Gene Pairs. Comput Struct Biotechnol J 2019; 17:821-831. [PMID: 31316726 PMCID: PMC6611831 DOI: 10.1016/j.csbj.2019.06.012] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/15/2019] [Revised: 06/04/2019] [Accepted: 06/11/2019] [Indexed: 12/12/2022] Open
Abstract
Enhancers are non-coding regulatory elements that are distant from their target gene. Their characterization remains elusive, especially because of the challenges in achieving a comprehensive pairing of enhancers and target genes. A number of computational biology solutions have been proposed to address this problem, leveraging the increasing availability of functional genomics data and the improved mechanistic understanding of enhancer action. In this review we focus on computational methods for genome-wide definition of enhancer-target gene pairs. We outline the different classes of methods, as well as their main advantages and limitations. The types of information integrated by each method, along with details on their applicability, are presented and discussed. We especially highlight the technical challenges that are still unresolved and hamper the achievement of a satisfactory and comprehensive solution. We expect this field will keep evolving in the coming years due to the ever-growing availability of data and increasing insights into enhancers' crucial role in regulating genome functionality.
Affiliation(s)
- Francesco Ferrari
- IFOM, The FIRC Institute of Molecular Oncology, Milan, Italy
- Institute of Molecular Genetics, National Research Council, Pavia, Italy
|
42
|
Liu EM, Martinez-Fundichely A, Diaz BJ, Aronson B, Cuykendall T, MacKay M, Dhingra P, Wong EWP, Chi P, Apostolou E, Sanjana NE, Khurana E. Identification of Cancer Drivers at CTCF Insulators in 1,962 Whole Genomes. Cell Syst 2019; 8:446-455.e8. [PMID: 31078526 PMCID: PMC6917527 DOI: 10.1016/j.cels.2019.04.001] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 11/22/2017] [Revised: 11/20/2018] [Accepted: 04/02/2019] [Indexed: 12/15/2022]
Abstract
Recent studies have shown that mutations at non-coding elements, such as promoters and enhancers, can act as cancer drivers. However, an important class of non-coding elements, namely CTCF insulators, has been overlooked in the previous driver analyses. We used insulator annotations from CTCF and cohesin ChIA-PET and analyzed somatic mutations in 1,962 whole genomes from 21 cancer types. Using the heterogeneous patterns of transcription-factor-motif disruption, functional impact, and recurrence of mutations, we developed a computational method that revealed 21 insulators showing signals of positive selection. In particular, mutations in an insulator in multiple cancer types, including 16% of melanoma samples, are associated with TGFB1 up-regulation. Using CRISPR-Cas9, we find that alterations at two of the most frequently mutated regions in this insulator increase cell growth by 40%-50%, supporting the role of this boundary element as a cancer driver. Thus, our study reveals several CTCF insulators as putative cancer drivers.
Affiliation(s)
- Eric Minwei Liu
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Alexander Martinez-Fundichely
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Bianca Jay Diaz
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA
- Boaz Aronson
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Tawny Cuykendall
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Matthew MacKay
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Priyanka Dhingra
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
- Elissa W P Wong
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Ping Chi
- Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Effie Apostolou
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Neville E Sanjana
- New York Genome Center, New York, NY 10013, USA; Department of Biology, New York University, New York, NY 10003, USA
- Ekta Khurana
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10065, USA; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10065, USA; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA; Caryl and Israel Englander Institute for Precision Medicine, New York Presbyterian Hospital, Weill Cornell Medicine, New York, NY 10065, USA
|
43
|
Zhao Y, Schaafsma E, Cheng C. Applications of ENCODE data to Systematic Analyses via Data Integration. Curr Opin Syst Biol 2019; 11:57-64. [PMID: 31011690 DOI: 10.1016/j.coisb.2018.08.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/26/2022]
Abstract
Large-scale genomic data have been utilized to generate unprecedented biological findings and new hypotheses. To delineate functional elements in the human genome, the Encyclopedia of DNA Elements (ENCODE) project has generated an enormous amount of genomic data, yielding around 7,000 data profiles in different cell and tissue types. In this article, we reviewed the systematic analyses that have integrated ENCODE data with other data sources to reveal new biological insights, ranging from human genome annotation to the identification of new candidate drugs. These analyses demonstrate the critical impact of ENCODE data on basic biology and translational research.
Affiliation(s)
- Yanding Zhao
- Department of Biomedical Data Science, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756; Department of Molecular and Systems Biology, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756
- Evelien Schaafsma
- Department of Biomedical Data Science, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756; Department of Molecular and Systems Biology, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756
- Chao Cheng
- Department of Biomedical Data Science, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756; Department of Molecular and Systems Biology, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756; Norris Cotton Cancer Center, The Geisel School of Medicine at Dartmouth College, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, NH, United States, 03756
|
44
|
Chen JL, Zhang ZH, Li BX, Cai Z, Zhou QH. Bioinformatic and functional analysis of promoter region of human SLC25A13 gene. Gene 2019; 693:69-75. [PMID: 30708027 DOI: 10.1016/j.gene.2019.01.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/27/2018] [Revised: 11/26/2018] [Accepted: 01/11/2019] [Indexed: 02/07/2023]
Abstract
The human SLC25A13 gene encodes the liver-type aspartate/glutamate carrier isoform 2 (AGC2, commonly known as citrin), which plays a key role in the main NADH shuttle of human hepatocytes. Biallelic SLC25A13 mutations result in citrin deficiency (CD). To identify the important regulatory region of the SLC25A13 gene and to elucidate how potential promoter mutations affect citrin expression, we performed promoter deletion analysis and built luciferase reporter constructs carrying the SLC25A13 promoter with several mutations located in putative transcription factor-binding sites. The luciferase activities of all promoter constructs were measured using a Dual-Luciferase Reporter Assay System. Bioinformatic analysis showed that the promoter of the SLC25A13 gene lacks a TATA box and an obvious typical initiator element, but contains a CCAAT box and two GC boxes. Promoter deletion analysis confirmed that the region from -221 to -1 upstream of the ATG was essential for maintaining SLC25A13 promoter activity. We used the dual-luciferase reporter system as a functional analysis model to tentatively assess the effect of artificially constructed promoter mutations on citrin expression, and our analysis revealed that mutating the putative CCAAT box and GC box could significantly affect citrin expression. Our study confirmed the important SLC25A13 promoter regions that influence citrin expression in HL7702 cells and established a functional analysis model. This work may help to further identify pathogenic promoter-region mutations leading to CD.
Affiliation(s)
- Jun-Lin Chen
- First Affiliated Hospital, Biomedical Translational Research Institute, Jinan University, Guangzhou, China
- Zhan-Hui Zhang
- Clinical Medicine Research Institute, The First Affiliated Hospital, Jinan University, Guangzhou 510630, China
- Bing-Xiao Li
- Department of Pediatrics, The First Affiliated Hospital, Jinan University, Guangzhou, Guangdong 510630, China
- Zhen Cai
- Biomedical Translational Research Institute, Jinan University, Guangzhou, Guangdong 510632, China
- Qing-Hua Zhou
- First Affiliated Hospital, Biomedical Translational Research Institute, Jinan University, Guangzhou, China
|
45
|
Ho EYK, Cao Q, Gu M, Chan RWL, Wu Q, Gerstein M, Yip KY. Shaping the nebulous enhancer in the era of high-throughput assays and genome editing. Brief Bioinform 2019; 21:836-850. [PMID: 30895290 DOI: 10.1093/bib/bbz030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/12/2018] [Revised: 02/15/2019] [Accepted: 02/26/2019] [Indexed: 01/22/2023] Open
Abstract
Since the first discovery of transcriptional enhancers in 1981, their textbook definition has remained largely unchanged for 37 years. With the emergence of high-throughput assays and genome editing, which are shifting the paradigm from bottom-up discovery and testing of individual enhancers to top-down, genome-wide profiling of enhancer activities, it has become increasingly evident that this classical definition leaves substantial gray areas. Here we survey a representative set of recent research articles and report the definitions of enhancers they have adopted. The results reveal that a wide spectrum of definitions is in use, usually without being stated explicitly, which could lead to difficulties in data interpretation and downstream analyses. Based on these findings, we discuss the practical implications and suggestions for future studies.
Affiliation(s)
- Qin Cao
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
- Mengting Gu
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA
- Ricky Wai-Lun Chan
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
- Qiong Wu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.,School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong
- Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA.,Program in Computational Biology and Bioinformatics.,Department of Computer Science, Yale University, New Haven, Connecticut, USA
- Kevin Y Yip
- Department of Biomedical Engineering.,Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.,Hong Kong Bioinformatics Centre.,CUHK-BGI Innovation Institute of Trans-omics.,Hong Kong Institute of Diabetes and Obesity, The Chinese University of Hong Kong, Hong Kong
|
46
|
Samee MAH, Bruneau BG, Pollard KS. A De Novo Shape Motif Discovery Algorithm Reveals Preferences of Transcription Factors for DNA Shape Beyond Sequence Motifs. Cell Syst 2019; 8:27-42.e6. [PMID: 30660610 PMCID: PMC6368855 DOI: 10.1016/j.cels.2018.12.001] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 03/09/2018] [Revised: 08/18/2018] [Accepted: 12/03/2018] [Indexed: 12/17/2022]
Abstract
DNA shape adds specificity to sequence motifs but has not been explored systematically outside this context. We hypothesized that DNA-binding proteins (DBPs) preferentially occupy DNA with specific structures ("shape motifs") regardless of whether or not these correspond to high information content sequence motifs. We present ShapeMF, a Gibbs sampling algorithm that identifies de novo shape motifs. Using binding data from hundreds of in vivo and in vitro experiments, we show that most DBPs have shape motifs and can occupy these in the absence of sequence motifs. This "shape-only binding" is common for many DBPs and in regions co-bound by multiple DBPs. When shape and sequence motifs co-occur, they can be overlapping, flanking, or separated by consistent spacing. Finally, DBPs within the same protein family have different shape motifs, explaining their distinct genome-wide occupancy despite having similar sequence motifs. These results suggest that shape motifs not only complement sequence motifs but also facilitate recognition of DNA beyond conventionally defined sequence motifs.
Affiliation(s)
- Benoit G Bruneau
- Gladstone Institutes, San Francisco, CA 94158, USA; Department of Pediatrics and Cardiovascular Research Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Katherine S Pollard
- Gladstone Institutes, San Francisco, CA 94158, USA; Department of Epidemiology & Biostatistics, Institute for Human Genetics, Quantitative Biology Institute, and Institute for Computational Health Sciences, University of California, San Francisco, San Francisco, CA 94158, USA; Chan-Zuckerberg Biohub, San Francisco, CA 94158, USA.
|
47
|
Lochovsky L, Zhang J, Gerstein M. MOAT: efficient detection of highly mutated regions with the Mutations Overburdening Annotations Tool. Bioinformatics 2019; 34:1031-1033. [PMID: 29121169 PMCID: PMC5860157 DOI: 10.1093/bioinformatics/btx700] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 05/08/2017] [Accepted: 11/06/2017] [Indexed: 02/05/2023] Open
Abstract
Summary: Identifying genomic regions with higher than expected mutation count is useful for cancer driver detection. Previous parametric approaches require numerous cell-type-matched covariates for accurate background mutation rate (BMR) estimation, which is not practical for many situations. Non-parametric, permutation-based approaches avoid this issue but usually suffer from considerable compute-time cost. Hence, we introduce Mutations Overburdening Annotations Tool (MOAT), a non-parametric scheme that makes no assumptions about the mutation process except requiring that the BMR changes smoothly with genomic features. MOAT randomly permutes single-nucleotide variants, or target regions, on a relatively large scale to provide robust burden analysis. Furthermore, we show how we can do permutations in an efficient manner using graphics processing unit acceleration, speeding up the calculation by a factor of ∼250. Availability and implementation: MOAT is available at moat.gersteinlab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
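The general idea of a permutation-based burden test can be sketched in a few lines. The example below is not MOAT's implementation or interface. It is a minimal single-chromosome illustration in which a target region is repeatedly relocated to random positions within a local flanking window, and the observed variant count is compared with the counts at those relocated positions. The function names, the `flank` and `n_perm` parameters, and the use of an empirical p-value with a +1 correction are all assumptions made for illustration.

```python
import bisect
import random


def count_in_interval(sorted_positions, start, end):
    """Count positions p with start <= p < end using binary search."""
    left = bisect.bisect_left(sorted_positions, start)
    right = bisect.bisect_left(sorted_positions, end)
    return right - left


def burden_pvalue(variant_positions, region_start, region_end,
                  flank=50_000, n_perm=1000, seed=0):
    """Empirical p-value for the variant burden of one region.

    The region is relocated uniformly at random within +/- `flank` bp of its
    true start, so the null reflects the local mutation rate only.
    Chromosome ends beyond position 0 are not checked (simplifying assumption).
    """
    rng = random.Random(seed)
    positions = sorted(variant_positions)
    length = region_end - region_start
    observed = count_in_interval(positions, region_start, region_end)

    lowest_start = max(0, region_start - flank)
    highest_start = region_start + flank
    at_least_as_extreme = 0
    for _ in range(n_perm):
        start = rng.randint(lowest_start, highest_start)
        if count_in_interval(positions, start, start + length) >= observed:
            at_least_as_extreme += 1

    # The +1 correction keeps the empirical p-value strictly above zero.
    return observed, (1 + at_least_as_extreme) / (1 + n_perm)
```

Restricting relocation to a local window is one simple way to respect the abstract's requirement that the background mutation rate varies smoothly along the genome. The GPU acceleration described in the abstract is outside the scope of this sketch.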
Affiliation(s)
- Lucas Lochovsky
- Program in Computational Biology and Bioinformatics.,Department of Molecular Biophysics and Biochemistry
- Jing Zhang
- Program in Computational Biology and Bioinformatics.,Department of Molecular Biophysics and Biochemistry
- Mark Gerstein
- Program in Computational Biology and Bioinformatics.,Department of Molecular Biophysics and Biochemistry.,Department of Computer Science, Yale University, New Haven, CT 06520, USA
|
48
|
Völkel S, Stielow B, Finkernagel F, Berger D, Stiewe T, Nist A, Suske G. Transcription factor Sp2 potentiates binding of the TALE homeoproteins Pbx1:Prep1 and the histone-fold domain protein Nf-y to composite genomic sites. J Biol Chem 2018; 293:19250-19262. [PMID: 30337366 DOI: 10.1074/jbc.ra118.005341] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/13/2018] [Revised: 10/17/2018] [Indexed: 11/06/2022] Open
Abstract
Different transcription factors operate together at promoters and enhancers to regulate gene expression. Transcription factors either bind directly to their target DNA or are tethered to it by other proteins. The transcription factor Sp2 serves as a paradigm for indirect genomic binding. It does not require its DNA-binding domain for genomic DNA binding and occupies target promoters independently of whether they contain a cognate DNA-binding motif. Hence, Sp2 is strikingly different from its closely related paralogs Sp1 and Sp3, but how Sp2 recognizes its targets is unknown. Here, we sought to gain more detailed insights into the genomic targeting mechanism of Sp2. ChIP-exo sequencing in mouse embryonic fibroblasts revealed genomic binding of Sp2 to a composite motif where a recognition sequence for TALE homeoproteins and a recognition sequence for the trimeric histone-fold domain protein nuclear transcription factor Y (Nf-y) are separated by 11 bp. We identified a complex consisting of the TALE homeobox protein Prep1, its partner PBX homeobox 1 (Pbx1), and Nf-y as the major partners in Sp2-promoter interactions. We found that the Pbx1:Prep1 complex together with Nf-y recruits Sp2 to co-occupied regulatory elements. In turn, Sp2 potentiates binding of Pbx1:Prep1 and Nf-y. We also found that the Sp-box, a short sequence motif close to the Sp2 N terminus, is crucial for Sp2's cofactor function. Our findings reveal a mechanism by which the DNA binding-independent activity of Sp2 potentiates genomic loading of Pbx1:Prep1 and Nf-y to composite motifs present in many promoters of highly expressed genes.
Affiliation(s)
- Sara Völkel
- From the Institute of Molecular Biology and Tumor Research (IMT) and
- Bastian Stielow
- From the Institute of Molecular Biology and Tumor Research (IMT) and
- Dana Berger
- From the Institute of Molecular Biology and Tumor Research (IMT) and
- Thorsten Stiewe
- the Genomics Core Facility, Center for Tumor Biology and Immunology (ZTI), Philipps-University of Marburg, 35043 Marburg, Germany
- Andrea Nist
- the Genomics Core Facility, Center for Tumor Biology and Immunology (ZTI), Philipps-University of Marburg, 35043 Marburg, Germany
- Guntram Suske
- From the Institute of Molecular Biology and Tumor Research (IMT) and
|
49
|
Devailly G, Joshi A. Insights into mammalian transcription control by systematic analysis of ChIP sequencing data. BMC Bioinformatics 2018; 19:409. [PMID: 30453943 PMCID: PMC6245581 DOI: 10.1186/s12859-018-2377-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022] Open
Abstract
Background: Transcription regulation is a major controller of gene expression dynamics during development and disease, where transcription factors (TFs) modulate expression of genes through direct or indirect DNA interaction. ChIP sequencing has become the most widely used technique to obtain a genome-wide view of TF occupancy in a cell type of interest, mainly due to established standard protocols and a rapid decrease in the cost of sequencing. The number of ChIP sequencing data sets available in the public domain is therefore ever increasing, including data generated by individual labs as well as consortia such as the ENCODE project. Results: A total of 1735 ChIP-sequencing datasets from mouse and human cell types and tissues were used to perform bioinformatic analyses to unravel diverse features of transcription control. (1) We used the Heat*seq webtool to investigate global relations across the ChIP-seq samples. (2) We demonstrated that factors have specific genomic location preferences that are, for most factors, conserved across species. (3) Promoter-proximal binding of factors was more conserved across cell types, while distal binding sites were more cell type specific. (4) We identified combinations of factors preferentially acting together in a cellular context. (5) Finally, by integrating the data with disease-associated gene loci from GWAS studies, we highlight the value of these data for associating novel regulators with disease. Conclusion: In summary, we demonstrate that ChIP sequencing data integration and analysis is a powerful way to gain new insights into mammalian transcription control, and we demonstrate the utility of various bioinformatic tools for generating novel testable hypotheses from this public resource.
Affiliation(s)
- Guillaume Devailly
- Division of Developmental Biology, the Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Anagha Joshi
- Division of Developmental Biology, the Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK.
|
50
|
Dong X, Liao Z, Gritsch D, Hadzhiev Y, Bai Y, Locascio JJ, Guennewig B, Liu G, Blauwendraat C, Wang T, Adler CH, Hedreen JC, Faull RLM, Frosch MP, Nelson PT, Rizzu P, Cooper AA, Heutink P, Beach TG, Mattick JS, Müller F, Scherzer CR. Enhancers active in dopamine neurons are a primary link between genetic variation and neuropsychiatric disease. Nat Neurosci 2018; 21:1482-1492. [PMID: 30224808 PMCID: PMC6334654 DOI: 10.1038/s41593-018-0223-0] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 09/11/2017] [Accepted: 07/23/2018] [Indexed: 01/07/2023]
Abstract
Enhancers function as DNA logic gates and may control specialized functions of billions of neurons. Here we show a tailored program of noncoding genome elements active in situ in physiologically distinct dopamine neurons of the human brain. We found 71,022 transcribed noncoding elements, many of which were consistent with active enhancers and with regulatory mechanisms in zebrafish and mouse brains. Genetic variants associated with schizophrenia, addiction, and Parkinson's disease were enriched in these elements. Expression quantitative trait locus analysis revealed that Parkinson's disease-associated variants on chromosome 17q21 cis-regulate the expression of an enhancer RNA in dopamine neurons. This study shows that enhancers in dopamine neurons link genetic variation to neuropsychiatric traits.
Affiliation(s)
- Xianjun Dong
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Zhixiang Liao
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- David Gritsch
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Yavor Hadzhiev
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Yunfei Bai
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Joseph J Locascio
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital, Boston, MA, USA
- Boris Guennewig
- Sydney Medical School, Brain and Mind Centre, The University of Sydney, Sydney, New South Wales, Australia
- Division of Neuroscience, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- St Vincent's Clinical School, UNSW Sydney, Sydney, New South Wales, Australia
- Ganqiang Liu
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Tao Wang
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA
- John C Hedreen
- Harvard Brain Tissue Resource Center, McLean Hospital, Harvard Medical School, Boston, MA, USA
- Richard L M Faull
- Centre for Brain Research, University of Auckland, Auckland, New Zealand
- Matthew P Frosch
- C.S. Kubik Laboratory for Neuropathology, Massachusetts General Hospital, Boston, MA, USA
- Peter T Nelson
- Sanders-Brown Center on Aging, University of Kentucky, Lexington, KY, USA
- Patrizia Rizzu
- German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Antony A Cooper
- Division of Neuroscience, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- St Vincent's Clinical School, UNSW Sydney, Sydney, New South Wales, Australia
- Peter Heutink
- German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- John S Mattick
- Division of Neuroscience, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- St Vincent's Clinical School, UNSW Sydney, Sydney, New South Wales, Australia
- Ferenc Müller
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Clemens R Scherzer
- Precision Neurology Program, Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA.
- Center for Advanced Parkinson's Disease Research of Harvard Medical School and Brigham & Women's Hospital, Boston, MA, USA.
- Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.
- Ann Romney Center for Neurologic Diseases, Brigham and Women's Hospital, Boston, MA, USA.
- Program in Neuroscience, Harvard Medical School, Boston, MA, USA.
|