1
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025; 2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequencing information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P G Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- J Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
2
Zhang W, Zhang M, Zhu M. RAEPI: Predicting Enhancer-Promoter Interactions Based on Restricted Attention Mechanism. Interdiscip Sci 2024:10.1007/s12539-024-00669-0. [PMID: 39546160 DOI: 10.1007/s12539-024-00669-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 10/02/2024] [Accepted: 10/09/2024] [Indexed: 11/17/2024]
Abstract
Enhancer-promoter interactions (EPIs) are crucial in gene transcription regulation and cell differentiation. Traditional biological experiments are costly and time-consuming, motivating the development of computational prediction methods. However, existing EPI prediction methods inadequately capture the intricate direct interactions between enhancer and promoter sequences, which limits their prediction performance to some extent. In this work, we propose an innovative attention-based approach RAEPI, which uses convolutional neural networks to extract initial features of enhancers and promoters, combined with a specially designed Restricted Attention mechanism with Query-Key-Value constrained to simulate the interactions between them for further feature extraction. To improve cross-cell line prediction, we employ a transfer learning strategy for pre-training. Furthermore, we extracted sequence motifs to evaluate the RAEPI's effectiveness from a visualization perspective. Experimental results show that RAEPI achieves competitive prediction performance to existing methods on the benchmark dataset.
Affiliation(s)
- Wanjing Zhang
  - College of Computer Science, Sichuan University, Chengdu, 610065, China
- Mingyang Zhang
  - College of Computer Science, Sichuan University, Chengdu, 610065, China
- Min Zhu
  - College of Computer Science, Sichuan University, Chengdu, 610065, China
3
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing next-generation sequencing-based computational methods for predicting transcriptional regulators with query gene sets. Brief Bioinform 2024; 25:bbae366. [PMID: 39082650 PMCID: PMC11289684 DOI: 10.1093/bib/bbae366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 06/21/2024] [Accepted: 07/18/2024] [Indexed: 08/03/2024] Open
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators (TRs) with query gene sets. Identification of TRs is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Affiliation(s)
- Zeyu Lu
  - Department of Statistics and Data Science, Moody School of Graduate and Advanced Studies, Southern Methodist University, 3225 Daniel Ave., P.O. Box 750332, Dallas, TX, United States
- Xue Xiao
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
- Qiang Zheng
  - Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
- Xinlei Wang
  - Division of Data Science, College of Science, University of Texas at Arlington, 501 S. Nedderman Dr., Arlington, TX 76019, United States
  - Department of Mathematics, University of Texas at Arlington, 411 S. Nedderman Dr., Arlington, TX 76019, United States
- Lin Xu
  - Quantitative Biomedical Research Center, Peter O’Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX, United States
  - Department of Pediatrics, Division of Hematology/Oncology, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, United States
4
Reyna J, Fetter K, Ignacio R, Marandi CCA, Rao N, Jiang Z, Figueroa DS, Bhattacharyya S, Ay F. Loop Catalog: a comprehensive HiChIP database of human and mouse samples. bioRxiv 2024:2024.04.26.591349. [PMID: 38746164 PMCID: PMC11092438 DOI: 10.1101/2024.04.26.591349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
HiChIP enables cost-effective and high-resolution profiling of regulatory and structural loops. To leverage the increasing number of publicly available HiChIP datasets from diverse cell lines and primary cells, we developed the Loop Catalog (https://loopcatalog.lji.org), a web-based database featuring HiChIP loop calls for 1319 samples across 133 studies and 44 high-resolution Hi-C loop calls. We demonstrate its utility in interpreting fine-mapped GWAS variants (SNP-to-gene linking), in identifying enriched sequence motifs and motif pairs at loop anchors, and in network-level analysis of loops connecting regulatory elements (community detection). Our comprehensive catalog, spanning over 4M unique 5kb loops, along with the accompanying analysis modalities constitutes an important resource for studies in gene regulation and genome organization.
Affiliation(s)
- Joaquin Reyna
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093 USA
- Kyra Fetter
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
  - Department of Bioengineering, University of California San Diego, La Jolla, CA 92093 USA
- Romeo Ignacio
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Cemil Can Ali Marandi
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093 USA
- Nikhil Rao
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093 USA
- Zichen Jiang
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
  - Department of Mathematics, University of California San Diego, La Jolla, CA 92093 USA
- Daniela Salgado Figueroa
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093 USA
- Sourya Bhattacharyya
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
- Ferhat Ay
  - Centers for Cancer Immunotherapy and Autoimmunity, La Jolla Institute for Immunology, La Jolla, CA 92037 USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093 USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA 92093 USA
5
Lu Z, Xiao X, Zheng Q, Wang X, Xu L. Assessing NGS-based computational methods for predicting transcriptional regulators with query gene sets. bioRxiv 2024:2024.02.01.578316. [PMID: 38562775 PMCID: PMC10983863 DOI: 10.1101/2024.02.01.578316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
This article provides an in-depth review of computational methods for predicting transcriptional regulators with query gene sets. Identification of transcriptional regulators is of utmost importance in many biological applications, including but not limited to elucidating biological development mechanisms, identifying key disease genes, and predicting therapeutic targets. Various computational methods based on next-generation sequencing (NGS) data have been developed in the past decade, yet no systematic evaluation of NGS-based methods has been offered. We classified these methods into two categories based on shared characteristics, namely library-based and region-based methods. We further conducted benchmark studies to evaluate the accuracy, sensitivity, coverage, and usability of NGS-based methods with molecular experimental datasets. Results show that BART, ChIP-Atlas, and Lisa have relatively better performance. Besides, we point out the limitations of NGS-based methods and explore potential directions for further improvement.
Key points
- An introduction to available computational methods for predicting functional TRs from a query gene set.
- A detailed walk-through along with practical concerns and limitations.
- A systematic benchmark of NGS-based methods in terms of accuracy, sensitivity, coverage, and usability, using 570 TR perturbation-derived gene sets.
- NGS-based methods outperform motif-based methods. Among NGS methods, those utilizing larger databases and adopting region-centric approaches demonstrate favorable performance. BART, ChIP-Atlas, and Lisa are recommended as these methods have overall better performance in evaluated scenarios.
6
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Three-Dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, Topologically Associating Domains (TADs), and A/B compartments play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequencing information (k-mers, Transcription Factor Binding Site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
7
Sigauke RF, Sanford L, Maas ZL, Jones T, Stanley JT, Townsend HA, Allen MA, Dowell RD. Atlas of nascent RNA transcripts reveals enhancer to gene linkages. bioRxiv 2023:2023.12.07.570626. [PMID: 38105978 PMCID: PMC10723487 DOI: 10.1101/2023.12.07.570626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]
Abstract
Gene transcription is controlled and modulated by regulatory regions, including enhancers and promoters. These regions are abundant in unstable, non-coding bidirectional transcription. Using nascent RNA transcription data across hundreds of human samples, we identified over 800,000 regions containing bidirectional transcription. We then identify highly correlated transcription between bidirectional and gene regions. The identified correlated pairs, a bidirectional region and a gene, are enriched for disease associated SNPs and often supported by independent 3D data. We present these resources as an SQL database which serves as a resource for future studies into gene regulation, enhancer associated RNAs, and transcription factors.
Affiliation(s)
- Rutendo F. Sigauke
  - BioFrontiers Institute, University of Colorado Boulder, 3415 Colorado Ave., UCB 596, Boulder, 80309, CO, USA
- Lynn Sanford
  - BioFrontiers Institute, University of Colorado Boulder, 3415 Colorado Ave., UCB 596, Boulder, 80309, CO, USA
- Zachary L. Maas
  - BioFrontiers Institute, University of Colorado Boulder, 3415 Colorado Ave., UCB 596, Boulder, 80309, CO, USA
  - Computer Science, University of Colorado Boulder, 1111 Engineering Drive, UCB 430, Boulder, 80309, CO, USA
- Taylor Jones
  - BioFrontiers Institute, University of Colorado Boulder, 3415 Colorado Ave., UCB 596, Boulder, 80309, CO, USA
- Jacob T. Stanley
  - BioFrontiers Institute, University of Colorado Boulder, 3415 Colorado Ave., UCB 596, Boulder, 80309, CO, USA
- Hope A. Townsend
  - BioFrontiers Institute, University of Colorado Boulder, 3415 Colorado Ave., UCB 596, Boulder, 80309, CO, USA
  - Molecular, Cellular and Developmental Biology, University of Colorado Boulder, 1945 Colorado Ave, UCB 347, Boulder, 80309, CO, USA
- Mary A. Allen
  - BioFrontiers Institute, University of Colorado Boulder, 3415 Colorado Ave., UCB 596, Boulder, 80309, CO, USA
- Robin D. Dowell
  - BioFrontiers Institute, University of Colorado Boulder, 3415 Colorado Ave., UCB 596, Boulder, 80309, CO, USA
  - Computer Science, University of Colorado Boulder, 1111 Engineering Drive, UCB 430, Boulder, 80309, CO, USA
  - Molecular, Cellular and Developmental Biology, University of Colorado Boulder, 1945 Colorado Ave, UCB 347, Boulder, 80309, CO, USA
8
Ng JWY, Felix JF, Olson DM. A novel approach to risk exposure and epigenetics-the use of multidimensional context to gain insights into the early origins of cardiometabolic and neurocognitive health. BMC Med 2023; 21:466. [PMID: 38012757 PMCID: PMC10683259 DOI: 10.1186/s12916-023-03168-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND: Each mother-child dyad represents a unique combination of genetic and environmental factors. This constellation of variables impacts the expression of countless genes. Numerous studies have uncovered changes in DNA methylation (DNAm), a form of epigenetic regulation, in offspring related to maternal risk factors. How these changes work together to link maternal-child risks to childhood cardiometabolic and neurocognitive traits remains unknown. This question is a key research priority as such traits predispose to future non-communicable diseases (NCDs). We propose viewing risk and the genome through a multidimensional lens to identify common DNAm patterns shared among diverse risk profiles.
METHODS: We identified multifactorial Maternal Risk Profiles (MRPs) generated from population-based data (n = 15,454, Avon Longitudinal Study of Parents and Children (ALSPAC)). Using cord blood HumanMethylation450 BeadChip data, we identified genome-wide patterns of DNAm that co-vary with these MRPs. We tested the prospective relation of these DNAm patterns (n = 914) to future outcomes using decision tree analysis. We then tested the reproducibility of these patterns in (1) DNAm data at age 7 and 17 years within the same cohort (n = 973 and 974, respectively) and (2) cord DNAm in an independent cohort, the Generation R Study (n = 686).
RESULTS: We identified twenty MRP-related DNAm patterns at birth in ALSPAC. Four were prospectively related to cardiometabolic and/or neurocognitive childhood outcomes. These patterns were replicated in DNAm data from blood collected at later ages. Three of these patterns were externally validated in cord DNAm data in Generation R. Compared to previous literature, DNAm patterns exhibited novel spatial distribution across the genome that intersects with chromatin functional and tissue-specific signatures.
CONCLUSIONS: To our knowledge, we are the first to leverage multifactorial population-wide data to detect patterns of variability in DNAm. This context-based approach decreases biases stemming from overreliance on specific samples or variables. We discovered molecular patterns demonstrating prospective and replicable relations to complex traits. Moreover, results suggest that patterns harbour a genome-wide organisation specific to chromatin regulation and target tissues. These preliminary findings warrant further investigation to better reflect the reality of human context in molecular studies of NCDs.
Affiliation(s)
- Jane W Y Ng
  - Department of Pediatrics, Cumming School of Medicine, University of Calgary, 28 Oki Drive NW, Calgary, AB, T3B 6A8, Canada
- Janine F Felix
  - The Generation R Study Group, Erasmus MC University Medical Center Rotterdam, Postbus 2040, 3000 CA, Rotterdam, The Netherlands
  - Department of Pediatrics, Erasmus MC University Medical Center Rotterdam, Rotterdam, The Netherlands
- David M Olson
  - Departments of Obstetrics and Gynecology, Physiology, and Pediatrics, Faculty of Medicine and Dentistry, University of Alberta, 220 HMRC, Edmonton, AB, T6G2S2, Canada
9
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023] Open
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Junyou Zhang
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Zhaoshuo Liu
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Yingying Duan
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Chunyan Li
  - School of Engineering Medicine, Beihang University, Beijing 100191, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
  - Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
10
Rapakoulia T, Lopez Ruiz De Vargas S, Omgba PA, Laupert V, Ulitsky I, Vingron M. CENTRE: a gradient boosting algorithm for Cell-type-specific ENhancer-Target pREdiction. Bioinformatics 2023; 39:btad687. [PMID: 37982748 PMCID: PMC10666202 DOI: 10.1093/bioinformatics/btad687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 10/11/2023] [Accepted: 11/17/2023] [Indexed: 11/21/2023] Open
Abstract
MOTIVATION: Identifying target promoters of active enhancers is a crucial step for realizing gene regulation and deciphering phenotypes and diseases. Up to now, several computational methods were developed to predict enhancer gene interactions, but they require either many epigenomic and transcriptomic experimental assays to generate cell-type (CT)-specific predictions or a single experiment applied to a large cohort of CTs to extract correlations between activities of regulatory elements. Thus, inferring CT-specific enhancer gene interactions in unstudied or poorly annotated CTs becomes a laborious and costly task.
RESULTS: Here, we aim to infer CT-specific enhancer target interactions, using minimal experimental input. We introduce Cell-specific ENhancer Target pREdiction (CENTRE), a machine learning framework that predicts enhancer target interactions in a CT-specific manner, using only gene expression and ChIP-seq data for three histone modifications for the CT of interest. CENTRE exploits the wealth of available datasets and extracts cell-type agnostic statistics to complement the CT-specific information. CENTRE is thoroughly tested across many datasets and CTs and achieves equivalent or superior performance than existing algorithms that require massive experimental data.
AVAILABILITY AND IMPLEMENTATION: CENTRE's open-source code is available at GitHub via https://github.com/slrvv/CENTRE.
Affiliation(s)
- Verena Laupert
  - Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
- Igor Ulitsky
  - Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
  - Department of Immunology and Regenerative Biology, Weizmann Institute of Science, Rehovot 76100, Israel
  - Department of Molecular Neuroscience, Weizmann Institute of Science, Rehovot 76100, Israel
- Martin Vingron
  - Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany
11
Umarov R, Hon CC. Enhancer target prediction: state-of-the-art approaches and future prospects. Biochem Soc Trans 2023; 51:1975-1988. [PMID: 37830459 DOI: 10.1042/bst20230917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 10/02/2023] [Accepted: 10/02/2023] [Indexed: 10/14/2023]
Abstract
Enhancers are genomic regions that regulate gene transcription and are located far away from the transcription start sites of their target genes. Enhancers are highly enriched in disease-associated variants and thus deciphering the interactions between enhancers and genes is crucial to understanding the molecular basis of genetic predispositions to diseases. Experimental validations of enhancer targets can be laborious. Computational methods have thus emerged as a valuable alternative for studying enhancer-gene interactions. A variety of computational methods have been developed to predict enhancer targets by incorporating genomic features (e.g. conservation, distance, and sequence), epigenomic features (e.g. histone marks and chromatin contacts) and activity measurements (e.g. covariations of enhancer activity and gene expression). With the recent advances in genome perturbation and chromatin conformation capture technologies, data on experimentally validated enhancer targets are becoming available for supervised training of these methods and evaluation of their performance. In this review, we categorize enhancer target prediction methods based on their rationales and approaches. Then we discuss their merits and limitations and highlight the future directions for enhancer targets prediction.
Affiliation(s)
- Ramzan Umarov
  - RIKEN Centre for Integrative Medical Sciences, Yokohama RIKEN Institute, Yokohama, Japan
- Chung-Chau Hon
  - RIKEN Centre for Integrative Medical Sciences, Yokohama RIKEN Institute, Yokohama, Japan
12
Li Z, Portillo-Ledesma S, Schlick T. Techniques for and challenges in reconstructing 3D genome structures from 2D chromosome conformation capture data. Curr Opin Cell Biol 2023; 83:102209. [PMID: 37506571 PMCID: PMC10529954 DOI: 10.1016/j.ceb.2023.102209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/07/2023] [Accepted: 06/26/2023] [Indexed: 07/30/2023]
Abstract
Chromosome conformation capture technologies that provide frequency information for contacts between genomic regions have been crucial for increasing our understanding of genome folding and regulation. However, such data do not provide direct evidence of the spatial 3D organization of chromatin. In this opinion article, we discuss the development and application of computational methods to reconstruct chromatin 3D structures from experimental 2D contact data, highlighting how such modeling provides biological insights and can suggest mechanisms anchored to experimental data. By applying different reconstruction methods to the same contact data, we illustrate some state-of-the-art of these techniques and discuss our gene resolution approach based on Brownian dynamics and Monte Carlo sampling.
Affiliation(s)
- Zilong Li
  - Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA
  - Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
- Stephanie Portillo-Ledesma
  - Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA
  - Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
- Tamar Schlick
  - Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA
  - Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, 10012, NY, USA
  - New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Room 340, Geography Building, 3663 North Zhongshan Road, Shanghai, 200122, China
  - Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
13
Rossini R, Kumar V, Mathelier A, Rognes T, Paulsen J. MoDLE: high-performance stochastic modeling of DNA loop extrusion interactions. Genome Biol 2022; 23:247. [PMID: 36451166 PMCID: PMC9710047 DOI: 10.1186/s13059-022-02815-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/13/2022] [Accepted: 11/17/2022] [Indexed: 12/03/2022] Open
Abstract
DNA loop extrusion emerges as a key process establishing genome structure and function. We introduce MoDLE, a computational tool for fast, stochastic modeling of molecular contacts from DNA loop extrusion, capable of simulating realistic contact patterns genome-wide in a few minutes. MoDLE accurately simulates contact maps in concordance with existing molecular dynamics approaches and with Micro-C data, and does so orders of magnitude faster than existing approaches. MoDLE runs efficiently on machines ranging from laptops to high performance computing clusters and opens up exploratory and predictive modeling of 3D genome structure in a wide range of settings.
Affiliation(s)
- Roberto Rossini
- Department of Biosciences, University of Oslo, 0316, Oslo, Norway
| | - Vipin Kumar
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318, Oslo, Norway
| | - Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318, Oslo, Norway
| | - Torbjørn Rognes
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316, Oslo, Norway
- Department of Microbiology, Oslo University Hospital, Rikshospitalet, 0424, Oslo, Norway
| | - Jonas Paulsen
- Department of Biosciences, University of Oslo, 0316, Oslo, Norway.
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316, Oslo, Norway.
| |
|
14
|
Lohia R, Fox N, Gillis J. A global high-density chromatin interaction network reveals functional long-range and trans-chromosomal relationships. Genome Biol 2022; 23:238. [PMID: 36352464 PMCID: PMC9647974 DOI: 10.1186/s13059-022-02790-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/21/2022] [Accepted: 10/10/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Chromatin contacts are essential for gene-expression regulation; however, obtaining a high-resolution genome-wide chromatin contact map is still prohibitively expensive owing to large genome sizes and the quadratic scale of pairwise data. Chromosome conformation capture (3C)-based methods such as Hi-C have been extensively used to obtain chromatin contacts. However, since the sparsity of these maps increases with an increase in genomic distance between contacts, long-range or trans-chromatin contacts are especially challenging to sample. RESULTS Here, we create a high-density reference genome-wide chromatin contact map using a meta-analytic approach. We integrate 3600 human, 6700 mouse, and 500 fly Hi-C experiments to create species-specific meta-Hi-C chromatin contact maps with 304 billion, 193 billion, and 19 billion contacts in respective species. We validate that meta-Hi-C contact maps are uniquely powered to capture functional chromatin contacts in both cis and trans. We find that while individual dataset Hi-C networks are largely unable to predict any long-range coexpression (median 0.54 AUC), meta-Hi-C networks perform comparably in both cis and trans (0.65 AUC vs 0.64 AUC). Similarly, for long-range expression quantitative trait loci (eQTL), meta-Hi-C contacts outperform all individual Hi-C experiments, providing an improvement over the conventionally used linear genomic distance-based association. Assessing between species, we find patterns of chromatin contact conservation in both cis and trans and strong associations with coexpression even in species for which Hi-C data is lacking. CONCLUSIONS We have generated an integrated chromatin interaction network which complements a large number of methodological and analytic approaches focused on improved specificity or interpretation. 
This high-depth "super-experiment" is surprisingly powerful in capturing long-range functional relationships of chromatin interactions, which are now able to predict coexpression, eQTLs, and cross-species relationships. The meta-Hi-C networks are available at https://labshare.cshl.edu/shares/gillislab/resource/HiC/ .
Affiliation(s)
- Ruchi Lohia
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, USA
| | - Nathan Fox
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, USA
| | - Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, USA
- Department of Physiology and Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
| |
|
15
|
Giacoman-Lozano M, Meléndez-Ramírez C, Martinez-Ledesma E, Cuevas-Diaz Duran R, Velasco I. Epigenetics of neural differentiation: Spotlight on enhancers. Front Cell Dev Biol 2022; 10:1001701. [PMID: 36313573 PMCID: PMC9606577 DOI: 10.3389/fcell.2022.1001701] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/23/2022] [Accepted: 10/03/2022] [Indexed: 11/28/2022] Open
Abstract
Neural induction, both in vivo and in vitro, includes cellular and molecular changes that result in phenotypic specialization related to specific transcriptional patterns. These changes are achieved through the implementation of complex gene regulatory networks. Furthermore, these regulatory networks are influenced by epigenetic mechanisms that drive cell heterogeneity and cell-type specificity, in a controlled and complex manner. Epigenetic marks, such as DNA methylation and histone residue modifications, are highly dynamic and stage-specific during neurogenesis. Genome-wide assessment of these modifications has allowed the identification of distinct non-coding regulatory regions involved in neural cell differentiation, maturation, and plasticity. Enhancers are short DNA regulatory regions that bind transcription factors (TFs) and interact with gene promoters to increase transcriptional activity. They are of special interest in neuroscience because they are enriched in neurons and underlie the cell-type-specificity and dynamic gene expression profiles. Classification of the full epigenomic landscape of neural subtypes is important to better understand gene regulation in brain health and during diseases. Advances in novel next-generation high-throughput sequencing technologies, genome editing, Genome-wide association studies (GWAS), stem cell differentiation, and brain organoids are allowing researchers to study brain development and neurodegenerative diseases with an unprecedented resolution. Herein, we describe important epigenetic mechanisms related to neurogenesis in mammals. We focus on the potential roles of neural enhancers in neurogenesis, cell-fate commitment, and neuronal plasticity. We review recent findings on epigenetic regulatory mechanisms involved in neurogenesis and discuss how sequence variations within enhancers may be associated with genetic risk for neurological and psychiatric disorders.
Affiliation(s)
- Mayela Giacoman-Lozano
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, NL, Mexico
| | - César Meléndez-Ramírez
- Instituto de Fisiología Celular—Neurociencias, Universidad Nacional Autónoma de Mexico, Mexico City, Mexico
- Laboratorio de Reprogramación Celular, Instituto Nacional de Neurología y Neurocirugía “Manuel Velasco Suárez”, Mexico City, Mexico
| | - Emmanuel Martinez-Ledesma
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, NL, Mexico
- Tecnologico de Monterrey, The Institute for Obesity Research, Monterrey, NL, Mexico
| | - Raquel Cuevas-Diaz Duran
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Monterrey, NL, Mexico
- *Correspondence: Raquel Cuevas-Diaz Duran; Iván Velasco
| | - Iván Velasco
- Instituto de Fisiología Celular—Neurociencias, Universidad Nacional Autónoma de Mexico, Mexico City, Mexico
- Laboratorio de Reprogramación Celular, Instituto Nacional de Neurología y Neurocirugía “Manuel Velasco Suárez”, Mexico City, Mexico
- *Correspondence: Raquel Cuevas-Diaz Duran; Iván Velasco
| |
|
16
|
Biomedical Application of Identified Biomarkers Gene Expression Based Early Diagnosis and Detection in Cervical Cancer with Modified Probabilistic Neural Network. CONTRAST MEDIA & MOLECULAR IMAGING 2022; 2022:4946154. [PMID: 36134120 PMCID: PMC9482500 DOI: 10.1155/2022/4946154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2022] [Revised: 04/15/2022] [Accepted: 05/13/2022] [Indexed: 11/17/2022]
Abstract
Cervical squamous cell carcinoma (CSC) is expected to rise to become the fourth most prevalent cancer in women globally and to replace breast cancer as the top cause of death in women in the future years, according to the World Health Organization. According to the World Health Organization, developing countries are responsible for 86 percent of all cervical cancer cases globally in women aged 15 to 44 (WHO). Cancer mortality is associated with the largest amount of monotonous antecedent in low- and middle-income nations, while cancer mortality is associated with the least amount of monotonous antecedent in high-income countries. Cervical cancer is thought to be caused by aberrant proliferation of cells in the cervix that is capable of stealing or invading other human organs, according to current thinking. Cancer of the cerebral cell is the most prevalent kind of cancer in women. It is expected that cervical squamous cell carcinoma (CSC) will be the fourth most frequent cancer in the world and the main cause of death in women by the year 2050. Despite the fact that technology has improved tremendously since then, this is still the case. When compared to high-income countries, low- and middle-income countries have the highest consistent antecedent for cancer mortality, according to the World Cancer Research Fund. Cancerous growths of cells in the cervix, such as cervical cancer, are caused by cells that have the ability to steal from or invade auxiliary organs of the body, as is the case with cervical cancer. Although technological advances have been made in recent years, gene expression profiling continues to be a prominent approach in the investigation of cervical cancer. Since then, researchers have had the opportunity to examine a gene coexpression network, which has evolved into an exceptionally comprehensive technique for microarray research. This has helped them to get a better understanding of the human genome. 
When a specific biological issue is addressed, gene coexpression networks retain a considerable percentage of their once vast component of physiognomy, which was previously immense. When comparing the properties of genes in a population, it is well known that feature selection may be used to choose genes that outperform the rest of the genes in the population. There are several benefits to feature selection, and this is only one of them. Typically used gene selection approaches have been shown to be insufficient in acquiring the best potential sequence of genes for training purposes, and as a result, the accuracy of the classifier has likely suffered as a result of this. Recently, a considerable number of scientists have advocated for the use of optimization approaches in the process of gene selection, and this trend is expected to continue. A metaheuristic algorithm may be used to choose a suitable subset of genes, according to the preceding assertion, which is also consistent with the metaheuristic approach. A Modified Probabilistic Neural Network differs from other networks in that the underlying gene expression associated with DEGs and standard data in a Modified Probabilistic Neural Network is not uniformly distributed as it is in other networks (MPN). As previously said, selecting the most relevant genes or repeating genes is a vital step in the prediction process. It was this technique that was used in the research of cervical cancer. Since then, researchers have had the opportunity to examine a gene coexpression network, which has evolved into an exceptionally comprehensive technique for microarray research. This has helped them to get a better understanding of the human genome. When a specific biological issue is addressed, gene coexpression networks are able to preserve a previously major section of the face that had been lost. 
When comparing the properties of genes in a population, it is well known that feature selection may be used to choose genes that outperform the rest of the genes in the population. There are several benefits to feature selection, and this is only one of them. Typically used gene selection approaches have been shown to be insufficient in acquiring the best potential sequence of genes for training purposes, and as a result, the accuracy of the classifier has likely suffered as a result of this. In the field of gene selection, several scholars have argued in favor of the employment of optimization approaches. A metaheuristic algorithm may be used to choose a suitable subset of genes, according to the preceding assertion, which is also consistent with the metaheuristic approach. It was discovered that Modified Probabilistic Neural Networks (MPNs) had a different distribution of gene expression linked with DEGs and normal data than other networks, which had not been previously seen. This was previously unknown. Following what has been said before, selecting the most appropriate or repeated genes is a critical task throughout the prediction process.
|
17
|
Fan Y, Peng B. StackEPI: identification of cell line-specific enhancer-promoter interactions based on stacking ensemble learning. BMC Bioinformatics 2022; 23:272. [PMID: 35820811 PMCID: PMC9277947 DOI: 10.1186/s12859-022-04821-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 07/01/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Understanding the regulatory role of enhancer–promoter interactions (EPIs) on specific gene expression in cells contributes to the understanding of gene regulation, cell differentiation, etc., and identifying EPIs has been a challenging task. On the one hand, traditional wet-lab experimental methods for identifying EPIs are labor-intensive and time-consuming. On the other hand, although currently proposed computational methods recognize EPIs well, they generally require a long training time. Results: In this study, we studied the EPIs of six human cell lines and designed StackEPI, a cell line-specific EPI prediction method based on a stacking ensemble learning strategy, which has better prediction performance and faster training speed. Specifically, by combining different encoding schemes and machine learning methods, our prediction method comprehensively extracts the cell line-specific information of enhancer and promoter sequences and accurately recognizes cell line-specific EPIs. The source code to implement StackEPI and the experimental data are available at https://github.com/20032303092/StackEPI.git. Conclusions: The comparison results show that our model delivers better performance in identifying cell line-specific EPIs and outperforms other state-of-the-art models. In addition, our model is more computationally efficient. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04821-9.
Affiliation(s)
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China.
| | - Binchao Peng
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China
| |
|
18
|
Tang L, Zhong Z, Lin Y, Yang Y, Wang J, Martin J, Li M. EPIXplorer: A web server for prediction, analysis and visualization of enhancer-promoter interactions. Nucleic Acids Res 2022; 50:W290-W297. [PMID: 35639508 PMCID: PMC9252822 DOI: 10.1093/nar/gkac397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2022] [Revised: 05/01/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022] Open
Abstract
Long distance enhancers can physically interact with promoters to regulate gene expression through formation of enhancer-promoter (E-P) interactions. Identification of E-P interactions is also important for a deeper understanding of normal development and disease-associated risk variants. Although state-of-the-art predictive computational methods facilitate the identification of E-P interactions to a certain extent, currently there is no efficient method that can meet various requirements of usage. Here we developed EPIXplorer, a user-friendly web server for efficient prediction, analysis and visualization of E-P interactions. EPIXplorer integrates 9 robust predictive algorithms and supports multiple types of 3D contact data and multi-omics data as input. The output from EPIXplorer is scored and fully annotated with regulatory elements and risk single-nucleotide polymorphisms (SNPs). In addition, the Visualization and Downstream modules provide further functional analysis, and all the output files and high-quality images are available for download. Together, EPIXplorer provides a user-friendly interface to predict E-P interactions in an acceptable time, as well as to understand how genome-wide association study (GWAS) variants influence disease pathology by altering DNA looping between enhancers and their target gene promoters. EPIXplorer is available at https://www.csuligroup.com/EPIXplorer.
Affiliation(s)
- Li Tang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Zhizhou Zhong
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yisheng Lin
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yifei Yang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jun Wang
- Department of Pediatrics, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - James F Martin
- Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX 77030, USA
- Cardiovascular Research Institute, Baylor College of Medicine, Houston, TX 77030, USA
- Texas Heart Institute, Houston, TX 77030, USA
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
19
|
Piecyk RS, Schlegel L, Johannes F. Predicting 3D chromatin interactions from DNA sequence using Deep Learning. Comput Struct Biotechnol J 2022; 20:3439-3448. [PMID: 35832620 PMCID: PMC9271978 DOI: 10.1016/j.csbj.2022.06.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/13/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 11/22/2022] Open
Abstract
Gene regulation in eukaryotes is profoundly shaped by the 3D organization of chromatin within the cell nucleus. Distal regulatory interactions between enhancers and their target genes are widespread and many causal loci underlying heritable agricultural or clinical traits have been mapped to distal cis-regulatory elements. Dissecting the sequence features that mediate such distal interactions is key to understanding their underlying biology. Deep Learning (DL) models coupled with genome-wide 3C-based sequencing data have emerged as powerful tools to infer the DNA sequence grammar underlying such distal interactions. In this review we show that most DL models have remarkably high prediction accuracy, which indicates that DNA sequence features are important determinants of chromatin looping. However, DL model training has so far been limited to a small set of human cell lines, raising questions about the generalization of these predictions to other tissue types and species. Furthermore, we find that the model architecture seems less relevant for model performance than the training strategy and the data preparation step. Transfer learning, coupled with functionally curated interactions, appears to be the most promising approach to learn cell-type-specific and possibly species-specific sequence features in future applications.
Affiliation(s)
- Robert S. Piecyk
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
| | - Luca Schlegel
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
| | - Frank Johannes
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- TUM Institute for Advanced Study, Garching, Germany
| |
|
20
|
Mulero Hernández J, Fernández-Breis JT. Analysis of the landscape of human enhancer sequences in biological databases. Comput Struct Biotechnol J 2022; 20:2728-2744. [PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 12/01/2022] Open
Abstract
The process of gene regulation extends as a network in which both genetic sequences and proteins are involved. The levels of regulation and the mechanisms involved are multiple. Transcription is the main control mechanism for most genes, with the downstream steps responsible for refining the transcription patterns. In turn, gene transcription is mainly controlled by regulatory events that occur at promoters and enhancers. Several studies are focused on analyzing the contribution of enhancers in the development of diseases and their possible use as therapeutic targets. The study of regulatory elements has advanced rapidly in recent years with the development and use of next generation sequencing techniques. This work has generated a large volume of information that has been transferred to a growing number of public repositories. In this article, we analyze the content of those public repositories that contain information about human enhancers with the aim of detecting whether the knowledge generated by scientific research is contained in those databases in a way that could be computationally exploited. The analysis will be based on three main aspects identified in the literature: types of enhancers, type of evidence about the enhancers, and methods for detecting enhancer-promoter interactions. Our results show that no single database facilitates the optimal exploitation of enhancer data, most types of enhancers are not represented in the databases, and there is a need for a standardized model for enhancers. We have identified major gaps and challenges for the computational exploitation of enhancer data.
Affiliation(s)
- Juan Mulero Hernández
- Dept. Informática y Sistemas, Universidad de Murcia, CEIR Campus Mare Nostrum, IMIB-Arrixaca, Spain
| | | |
|
21
|
Nollmann M, Bennabi I, Götz M, Gregor T. The Impact of Space and Time on the Functional Output of the Genome. Cold Spring Harb Perspect Biol 2022; 14:a040378. [PMID: 34230036 PMCID: PMC8733053 DOI: 10.1101/cshperspect.a040378] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]
Abstract
Over the past two decades, it has become clear that the multiscale spatial and temporal organization of the genome has important implications for nuclear function. This review centers on insights gained from recent advances in light microscopy on our understanding of transcription. We discuss spatial and temporal aspects that shape nuclear order and their consequences on regulatory components, focusing on genomic scales most relevant to function. The emerging picture is that spatiotemporal constraints increase the complexity in transcriptional regulation, highlighting new challenges, such as uncertainty about how information travels from molecular factors through the genome and space to generate a functional output.
Affiliation(s)
- Marcelo Nollmann
- Centre de Biologie Structurale, CNRS UMR5048, INSERM U1054, Univ Montpellier, 34090 Montpellier, France
| | - Isma Bennabi
- Department of Stem Cell and Developmental Biology, CNRS UMR3738, Institut Pasteur, 75015 Paris, France
| | - Markus Götz
- Centre de Biologie Structurale, CNRS UMR5048, INSERM U1054, Univ Montpellier, 34090 Montpellier, France
| | - Thomas Gregor
- Department of Stem Cell and Developmental Biology, CNRS UMR3738, Institut Pasteur, 75015 Paris, France
- Joseph Henry Laboratory of Physics & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
| |
|
22
|
Deep-Learning-Based Cancer Profiles Classification Using Gene Expression Data Profile. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:4715998. [PMID: 35035840 PMCID: PMC8759849 DOI: 10.1155/2022/4715998] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/20/2021] [Accepted: 11/17/2021] [Indexed: 12/14/2022]
Abstract
The quantity of data required to give a valid analysis grows exponentially as machine learning dimensionality increases. In a single experiment, microarrays or gene expression profiling assesses and determines gene expression levels and patterns in various cell types or tissues. The advent of DNA microarray technology has enabled simultaneous intensive care of hundreds of gene expressions on a single chip, advancing cancer categorization. The most challenging aspect of categorization is working out many information points from many sources. The proposed approach uses microarray data to train deep learning algorithms on extracted features and then uses the Latent Feature Selection Technique to reduce classification time and increase accuracy. The feature-selection-based techniques will pick the important genes before classifying microarray data for cancer prediction and diagnosis. These methods improve classification accuracy by removing duplicate and superfluous information. The Artificial Bee Colony (ABC) technique of feature selection was proposed in this research using bone marrow PC gene expression data. The ABC algorithm, based on swarm intelligence, has been proposed for gene identification. The ABC has been used here for feature selection that generates a subset of features and every feature produced by the spectators, making this a wrapper-based feature selection system. This method's main goal is to choose the fewest genes that are critical to PC performance while also increasing prediction accuracy. Convolutional Neural Networks were used to classify tumors without labelling them. Lung, kidney, and brain cancer datasets were used in the procedure's training and testing stages. Using the cross-validation technique of k-fold methodology, the Convolutional Neural Network has an accuracy rate of 96.43%. The suggested research includes techniques for preprocessing and modifying gene expression data to enhance future cancer detection accuracy.
|
23
|
Deng S, Feng Y, Pauklin S. 3D chromatin architecture and transcription regulation in cancer. J Hematol Oncol 2022; 15:49. [PMID: 35509102 PMCID: PMC9069733 DOI: 10.1186/s13045-022-01271-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 01/11/2022] [Accepted: 04/21/2022] [Indexed: 12/18/2022] Open
Abstract
Chromatin has distinct three-dimensional (3D) architectures important in key biological processes, such as cell cycle, replication, differentiation, and transcription regulation. In turn, aberrant 3D structures play a vital role in developing abnormalities and diseases such as cancer. This review discusses key 3D chromatin structures (topologically associating domain, lamina-associated domain, and enhancer-promoter interactions) and corresponding structural protein elements mediating 3D chromatin interactions [CCCTC-binding factor, polycomb group protein, cohesin, and Brother of the Regulator of Imprinted Sites (BORIS) protein] with a highlight of their associations with cancer. We also summarise the recent development of technologies and bioinformatics approaches to study the 3D chromatin interactions in gene expression regulation, including crosslinking and proximity ligation methods in the bulk cell population (ChIA-PET and HiChIP) or single-molecule resolution (ChIA-drop), and methods other than proximity ligation, such as GAM, SPRITE, and super-resolution microscopy techniques.
Affiliation(s)
- Siwei Deng
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK
| | - Yuliang Feng
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK
| | - Siim Pauklin
- Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK.
| |
|
24
|
Comparative characterization of 3D chromatin organization in triple-negative breast cancers. Exp Mol Med 2022; 54:585-600. [PMID: 35513575 PMCID: PMC9166756 DOI: 10.1038/s12276-022-00768-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/16/2021] [Revised: 01/18/2022] [Accepted: 02/09/2022] [Indexed: 12/02/2022] Open
Abstract
Triple-negative breast cancer (TNBC) is a malignant cancer subtype with a high risk of recurrence and an aggressive phenotype compared to other breast cancer subtypes. Although many breast cancer studies conducted to date have investigated genetic variations and differential target gene expression, how 3D chromatin architectures are reorganized in TNBC has been poorly elucidated. Here, using in situ Hi-C technology, we characterized the 3D chromatin organization in cells representing five distinct subtypes of breast cancer (including TNBC) compared to that in normal cells. We found that the global and local 3D architectures were severely disrupted in breast cancer. TNBC cell lines (especially BT549 cells) showed the most dramatic changes relative to normal cells. Importantly, we detected CTCF-dependent TNBC-susceptible losses/gains of 3D chromatin organization and found that these changes were strongly associated with perturbed chromatin accessibility and transcriptional dysregulation. In TNBC tissue, 3D chromatin disorganization was also observed relative to the 3D chromatin organization in normal tissues. We observed that the perturbed local 3D architectures found in TNBC cells were partially conserved in TNBC tissues. Finally, we discovered distinct tissue-specific chromatin loops by comparing normal and TNBC tissues. In this study, we elucidated the characteristics of the 3D chromatin organization in breast cancer relative to normal cells/tissues at multiple scales and identified associations between disrupted structures and various epigenetic features and transcriptomes. Collectively, our findings reveal important 3D chromatin structural features for future diagnostic and therapeutic studies of TNBC. The 3D architecture of the genome is dramatically altered in an aggressive form of breast cancer, leading to changes in the regulation of gene expression that can fuel tumor growth. 
A team from South Korea, led by Hyeong-Gon Moon of Seoul National University College of Medicine and Daeyoup Lee of the Korea Advanced Institute of Science and Technology, Daejeon, detailed how chromosomes are positioned and folded within the nucleus of cell lines from five different subtypes of breast cancer. They found that triple-negative breast cancers displayed the most extreme reorganization of their genomes, a pattern also observed in biopsy tissues taken from patients with this subtype of cancer. Knowledge of these conformational changes could inform future efforts to develop therapies and diagnostics for patients with triple-negative breast tumors.
|
25
|
Kumar S, Kaur S, Seem K, Kumar S, Mohapatra T. Understanding 3D Genome Organization and Its Effect on Transcriptional Gene Regulation Under Environmental Stress in Plant: A Chromatin Perspective. Front Cell Dev Biol 2021; 9:774719. [PMID: 34957106 PMCID: PMC8692796 DOI: 10.3389/fcell.2021.774719] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 09/12/2021] [Accepted: 11/23/2021] [Indexed: 01/17/2023]
Abstract
The genome of a eukaryotic organism is comprised of a supra-molecular complex of chromatin fibers and intricately folded three-dimensional (3D) structures. Chromosomal interactions and topological changes in response to developmental and/or environmental stimuli affect gene expression. Chromatin architecture plays important roles in DNA replication, gene expression, and genome integrity. Higher-order chromatin organizations like chromosome territories (CTs), A/B compartments, topologically associating domains (TADs), and chromatin loops vary among cells, tissues, and species depending on the developmental stage and/or environmental conditions (4D genomics). Every chromosome occupies a separate territory in the interphase nucleus and forms the top layer of the hierarchical structure (CTs) in most eukaryotes. While the A and B compartments are associated with active (euchromatic) and inactive (heterochromatic) chromatin, respectively, having well-defined genomic/epigenomic features, TADs are the structural units of chromatin. Chromatin architecture like TADs, as well as the local interactions between promoter and regulatory elements, correlates with chromatin activity, which alters during environmental stresses due to relocalization of the architectural proteins. Moreover, chromatin looping brings the gene and regulatory elements into close proximity for interactions. The intricate relationship between nucleotide sequence and chromatin architecture requires a more comprehensive understanding to unravel genome organization and genetic plasticity. During the last decade, advances in chromatin conformation capture techniques for unravelling 3D genome organization have improved our understanding of genome biology. Recent advances, such as Hi-C and ChIA-PET, have substantially increased the resolution and throughput, as well as our interest in analysing genome organization.
The present review provides an overview of the historical and contemporary perspectives of chromosome conformation capture technologies, their applications in functional genomics, and the constraints in predicting 3D genome organization. We also discuss the future perspectives of understanding high-order chromatin organizations in deciphering transcriptional regulation of gene expression under environmental stress (4D genomics). These might help design the climate-smart crop to meet the ever-growing demands of food, feed, and fodder.
Affiliation(s)
- Suresh Kumar
- Division of Biochemistry, ICAR-Indian Agricultural Research Institute, New Delhi, India
| | - Simardeep Kaur
- Division of Biochemistry, ICAR-Indian Agricultural Research Institute, New Delhi, India
| | - Karishma Seem
- Division of Biochemistry, ICAR-Indian Agricultural Research Institute, New Delhi, India
|
26
|
Yi X, Zheng Z, Xu H, Zhou Y, Huang D, Wang J, Feng X, Zhao K, Fan X, Zhang S, Dong X, Wang Z, Shen Y, Cheng H, Shi L, Li MJ. Interrogating cell type-specific cooperation of transcriptional regulators in 3D chromatin. iScience 2021; 24:103468. [PMID: 34888502 PMCID: PMC8634045 DOI: 10.1016/j.isci.2021.103468] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/02/2021] [Revised: 09/23/2021] [Accepted: 11/12/2021] [Indexed: 12/14/2022]
Abstract
Context-specific activities of transcription regulators (TRs) in the nucleus modulate spatiotemporal gene expression precisely. Using the largest ChIP-seq data and chromatin loops in the human K562 cell line, we initially interrogated TR cooperation in 3D chromatin via a graphical model and revealed many known and novel TRs manipulating context-specific pathways. To explore TR cooperation across broad tissue/cell types, we systematically leveraged large-scale open chromatin profiles, computational footprinting, and high-resolution chromatin interactions to investigate tissue/cell type-specific TR cooperation. We first delineated a landscape of TR cooperation across 40 human tissue/cell types. Network modularity analyses uncovered the commonality and specificity of TR cooperation in different conditions. We also demonstrated that TR cooperation information can better interpret the disease-causal variants identified by genome-wide association studies and recapitulate cell states during neural development. Our study characterizes shared and unique patterns of TR cooperation associated with the cell type specificity of gene regulation in 3D chromatin.
- Computational inference of transcriptional regulator (TR) cooperation in 3D chromatin
- A landscape of 3D TR cooperation across 40 human tissue/cell types
- TR cooperation can better interpret the disease-causal variants identified by GWAS
- Cooperation of certain TRs shapes context-specific gene regulation in cell development
Affiliation(s)
- Xianfu Yi
- School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin 300070, China.,Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
| | - Zhanye Zheng
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Hang Xu
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
| | - Yao Zhou
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Dandan Huang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Jianhua Wang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xiangling Feng
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Ke Zhao
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xutong Fan
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Shijie Zhang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xiaobao Dong
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Zhao Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Yujun Shen
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Hui Cheng
- State Key Laboratory of Experimental Hematology, Chinese Academy of Medical Sciences, Tianjin 300070, China
| | - Lei Shi
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Mulin Jun Li
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China.,Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
|
27
|
Salviato E, Djordjilović V, Hariprakash JM, Tagliaferri I, Pal K, Ferrari F. Leveraging three-dimensional chromatin architecture for effective reconstruction of enhancer-target gene regulatory interactions. Nucleic Acids Res 2021; 49:e97. [PMID: 34197622 PMCID: PMC8464068 DOI: 10.1093/nar/gkab547] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/16/2021] [Revised: 06/07/2021] [Accepted: 06/17/2021] [Indexed: 12/23/2022]
Abstract
A growing amount of evidence in literature suggests that germline sequence variants and somatic mutations in non-coding distal regulatory elements may be crucial for defining disease risk and prognostic stratification of patients, in genetic disorders as well as in cancer. Their functional interpretation is challenging because genome-wide enhancer-target gene (ETG) pairing is an open problem in genomics. The solutions proposed so far do not account for the hierarchy of structural domains which define chromatin three-dimensional (3D) architecture. Here we introduce a change of perspective based on the definition of multi-scale structural chromatin domains, integrated in a statistical framework to define ETG pairs. In this work (i) we develop a computational and statistical framework to reconstruct a comprehensive map of ETG pairs leveraging functional genomics data; (ii) we demonstrate that the incorporation of chromatin 3D architecture information improves ETG pairing accuracy and (iii) we use multiple experimental datasets to extensively benchmark our method against previous solutions for the genome-wide reconstruction of ETG pairs. This solution will facilitate the annotation and interpretation of sequence variants in distal non-coding regulatory elements. We expect this to be especially helpful in clinically oriented applications of whole genome sequencing in cancer and undiagnosed genetic diseases research.
Affiliation(s)
- Elisa Salviato
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
| | - Vera Djordjilović
- Department of Economics, Ca’ Foscari University of Venice, Venice 30100, Italy
| | | | | | - Koustav Pal
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
| | - Francesco Ferrari
- IFOM, the FIRC Institute of Molecular Oncology, Milan 20139, Italy
- Institute of Molecular Genetics “Luigi Luca Cavalli-Sforza”, National Research Council, Pavia 27100, Italy
|
28
|
Zhang M, Hu Y, Zhu M. EPIsHilbert: Prediction of Enhancer-Promoter Interactions via Hilbert Curve Encoding and Transfer Learning. Genes (Basel) 2021; 12:1385. [PMID: 34573367 PMCID: PMC8472018 DOI: 10.3390/genes12091385] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Revised: 08/31/2021] [Accepted: 09/01/2021] [Indexed: 12/19/2022]
Abstract
Enhancer-promoter interactions (EPIs) play a significant role in the regulation of gene transcription. However, enhancers may not necessarily interact with the closest promoters, but with distant promoters via chromatin looping. Considering the spatial position relationship between enhancers and their target promoters is important for predicting EPIs. Most existing methods only consider sequence information regardless of spatial information. On the other hand, recent computational methods lack generalization capability across different cell line datasets. In this paper, we propose EPIsHilbert, which uses Hilbert curve encoding and two transfer learning approaches. Hilbert curve encoding can preserve the spatial position information between enhancers and promoters. Additionally, we use visualization techniques to explore important sequence fragments that have a high impact on EPIs and the spatial relationships between them. Transfer learning can improve prediction performance across cell lines. In order to further prove the effectiveness of transfer learning, we analyze the sequence coincidence of different cell lines. Experimental results demonstrate that EPIsHilbert is a state-of-the-art model that is superior to most of the existing methods both in specific cell lines and cross cell lines.
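To illustrate why a Hilbert curve can preserve positional relationships, the following is a minimal sketch of the general technique, not the EPIsHilbert authors' implementation, whose details the abstract does not give. The function names `hilbert_d2xy` and `encode_sequence`, the one-hot channel order (A, C, G, T), and the grid layout are all assumptions made for illustration. The sketch places each base of a sequence on a square grid in Hilbert-curve order, so bases that are adjacent in the sequence always land in adjacent grid cells.

```python
# Hypothetical illustration of Hilbert curve encoding for a DNA sequence.
# Names and layout are assumptions, not taken from the EPIsHilbert paper.

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed one-hot channel order


def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the current quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def encode_sequence(seq, order):
    """Return a grid[y][x] of 4-element one-hot lists, filled in Hilbert order.

    Unknown bases (for example N) and unused cells stay all-zero.
    """
    n = 1 << order
    if len(seq) > n * n:
        raise ValueError("sequence does not fit on a grid of this order")
    grid = [[[0, 0, 0, 0] for _ in range(n)] for _ in range(n)]
    for d, base in enumerate(seq.upper()):
        x, y = hilbert_d2xy(order, d)
        channel = BASES.get(base)
        if channel is not None:
            grid[y][x][channel] = 1
    return grid


if __name__ == "__main__":
    # Consecutive sequence positions map to neighbouring grid cells.
    print([hilbert_d2xy(2, d) for d in range(16)])
```

A grid produced this way can be fed to a 2D convolutional network as a four-channel image. Whether the original method uses single-base one-hot channels or another per-position encoding is not stated in the abstract.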
|
29
|
Boltsis I, Grosveld F, Giraud G, Kolovos P. Chromatin Conformation in Development and Disease. Front Cell Dev Biol 2021; 9:723859. [PMID: 34422840 PMCID: PMC8371409 DOI: 10.3389/fcell.2021.723859] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/11/2021] [Accepted: 07/16/2021] [Indexed: 01/23/2023]
Abstract
Chromatin domains and loops are important elements of chromatin structure and dynamics, but much remains to be learned about their exact biological role and nature. Topologically associating domains and functional loops are key to gene expression and hold the answer to many questions regarding developmental decisions and diseases. Here, we discuss new findings that link chromatin conformation with development, differentiation and disease, and we hypothesize on various models while integrating recent findings on how chromatin architecture affects gene expression during development, evolution and disease.
Affiliation(s)
- Ilias Boltsis
- Department of Cell Biology, Erasmus Medical Centre, Rotterdam, Netherlands
| | - Frank Grosveld
- Department of Cell Biology, Erasmus Medical Centre, Rotterdam, Netherlands
| | - Guillaume Giraud
- Department of Cell Biology, Erasmus Medical Centre, Rotterdam, Netherlands
- Cancer Research Center of Lyon – INSERM U1052, Lyon, France
| | - Petros Kolovos
- Department of Molecular Biology and Genetics, Democritus University of Thrace, Alexandroupolis, Greece
|
30
|
Chyr J, Zhang Z, Chen X, Zhou X. PredTAD: A machine learning framework that models 3D chromatin organization alterations leading to oncogene dysregulation in breast cancer cell lines. Comput Struct Biotechnol J 2021; 19:2870-2880. [PMID: 34093998 PMCID: PMC8142020 DOI: 10.1016/j.csbj.2021.05.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2021] [Revised: 05/04/2021] [Accepted: 05/04/2021] [Indexed: 10/26/2022]
Abstract
Topologically associating domains, or TADs, play important roles in genome organization and gene regulation; however, they are often altered in diseases. High-throughput chromatin conformation capturing assays, such as Hi-C, can capture domains of increased interactions, and TADs and boundaries can be identified using well-established analytical tools. However, generating Hi-C data is expensive. In our study, we addressed the relationship between multi-omics data and higher-order chromatin structures using a newly developed machine-learning model called PredTAD. Our tool uses already-available and cost-effective datatypes such as transcription factor and histone modification ChIPseq data. Specifically, PredTAD utilizes both epigenetic and genetic features as well as neighboring information to classify the entire human genome as boundary or non-boundary regions. Our tool can predict boundary changes between normal and breast cancer genomes. Among the most important features for predicting boundary alterations were CTCF, subunits of cohesin (RAD21 and SMC3), and chromosome number, suggesting their roles in conserved and dynamic boundaries formation. Upon further analysis, we observed that genes near altered TAD boundaries were found to be involved in several important breast cancer signaling pathways such as Ras, Jak-STAT, and estrogen signaling pathways. We also discovered a TAD boundary alteration that contributes to RET oncogene overexpression. PredTAD can also successfully predict TAD boundary changes in other conditions and diseases. In conclusion, our newly developed machine learning tool allowed for a more complete understanding of the dynamic 3D chromatin structures involved in signaling pathway activation, altered gene expression, and disease state in breast cancer cells.
Affiliation(s)
- Jacqueline Chyr
- School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77054, USA
| | - Zhigang Zhang
- School of Information Management and Statistics, Hubei University of Economics, Wuhan, Hubei 430205 China
| | - Xi Chen
- School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77054, USA
| | - Xiaobo Zhou
- School of Biomedical Informatics, University of Texas Health Science Center, Houston, TX 77054, USA
|
31
|
Abstract
Single-cell sequencing-based methods for profiling gene transcript levels have revealed substantial heterogeneity in expression levels among morphologically indistinguishable cells. This variability has important functional implications for tissue biology and disease states such as cancer. Mapping of epigenomic information such as chromatin accessibility, nucleosome positioning, histone tail modifications and enhancer-promoter interactions in both bulk-cell and single-cell samples has shown that these characteristics of chromatin state contribute to expression or repression of associated genes. Advances in single-cell epigenomic profiling methods are enabling high-resolution mapping of chromatin states in individual cells. Recent studies using these techniques provide evidence that variations in different aspects of chromatin organization collectively define gene expression heterogeneity among otherwise highly similar cells.
Affiliation(s)
- Benjamin Carter
- Laboratory of Epigenome Biology, Systems Biology Center, NHLBI, NIH, Bethesda, MD, USA.
| | - Keji Zhao
- Laboratory of Epigenome Biology, Systems Biology Center, NHLBI, NIH, Bethesda, MD, USA.
|
32
|
Belokopytova P, Fishman V. Predicting Genome Architecture: Challenges and Solutions. Front Genet 2021; 11:617202. [PMID: 33552135 PMCID: PMC7862721 DOI: 10.3389/fgene.2020.617202] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/14/2020] [Accepted: 12/15/2020] [Indexed: 12/22/2022]
Abstract
Genome architecture plays a pivotal role in gene regulation. The use of high-throughput methods for chromatin profiling and 3D interaction mapping provides rich experimental data sets describing genome organization and dynamics. These data call for the development of new models and algorithms connecting genome architecture with epigenetic marks. In this review, we describe how chromatin architecture could be reconstructed from epigenetic data using biophysical or statistical approaches. We discuss the applicability and limitations of these methods for understanding the mechanisms of chromatin organization. We also highlight the emergence of new predictive approaches for scoring effects of structural variations in human cells.
Affiliation(s)
- Polina Belokopytova
- Natural Sciences Department, Novosibirsk State University, Novosibirsk, Russia
- Institute of Cytology and Genetics Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
| | - Veniamin Fishman
- Natural Sciences Department, Novosibirsk State University, Novosibirsk, Russia
- Institute of Cytology and Genetics Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
|
33
|
Tao H, Li H, Xu K, Hong H, Jiang S, Du G, Wang J, Sun Y, Huang X, Ding Y, Li F, Zheng X, Chen H, Bo X. Computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles. Brief Bioinform 2021; 22:6102668. [PMID: 33454752 PMCID: PMC8424394 DOI: 10.1093/bib/bbaa405] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/21/2020] [Revised: 11/26/2020] [Accepted: 12/10/2020] [Indexed: 12/14/2022]
Abstract
The exploration of three-dimensional chromatin interaction and organization provides insight into mechanisms underlying gene regulation, cell differentiation and disease development. Advances in chromosome conformation capture technologies, such as high-throughput chromosome conformation capture (Hi-C) and chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the exploration of chromatin interaction and organization. However, high-resolution Hi-C and ChIA-PET data are only available for a limited number of cell lines, and their acquisition is costly, time consuming, laborious and affected by theoretical limitations. Increasing evidence shows that DNA sequence and epigenomic features are informative predictors of regulatory interaction and chromatin architecture. Based on these features, numerous computational methods have been developed for the prediction of chromatin interaction and organization, but they are not yet extensively applied in biomedical studies. A systematic study to summarize and evaluate such methods is still needed to facilitate their application. Here, we summarize 48 computational methods for the prediction of chromatin interaction and organization using sequence and epigenomic profiles, categorize them and compare their performance. We also provide a comprehensive guideline for the selection of suitable methods to predict chromatin interaction and organization based on the available data and the biological question of interest.
Affiliation(s)
- Huan Tao
- Beijing Institute of Radiation Medicine
| | - Hao Li
- Beijing Institute of Radiation Medicine
| | - Kang Xu
- Beijing Institute of Radiation Medicine
| | - Hao Hong
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Shuai Jiang
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Guifang Du
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | | | - Yu Sun
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Xin Huang
- Beijing Institute of Radiation Medicine, Department of Biotechnology
| | - Yang Ding
- Beijing Institute of Radiation Medicine
| | - Fei Li
- Chinese Academy of Sciences, Department of Computer Network Information Center
|
34
|
Baur B, Shin J, Zhang S, Roy S. Data integration for inferring context-specific gene regulatory networks. Curr Opin Syst Biol 2020; 23:38-46. [PMID: 33225112 PMCID: PMC7676633 DOI: 10.1016/j.coisb.2020.09.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
Transcriptional regulatory networks control context-specific gene expression patterns and play important roles in normal and disease processes. Advances in genomics are rapidly increasing our ability to measure different components of the regulation machinery at the single-cell and bulk population level. An important challenge is to combine different types of regulatory genomic measurements to construct a more complete picture of gene regulatory networks across different disease, environmental, and developmental contexts. In this review, we focus on recent computational methods that integrate regulatory genomic data sets to infer context specificity and dynamics in regulatory networks.
Affiliation(s)
- Brittany Baur
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
| | - Junha Shin
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
| | - Shilu Zhang
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
| | - Sushmita Roy
- Wisconsin Institute for Discovery, University of Wisconsin-Madison, Madison, WI, 53715, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53715, USA
|