1
|
Agarwal V, Inoue F, Schubach M, Penzar D, Martin BK, Dash PM, Keukeleire P, Zhang Z, Sohota A, Zhao J, Georgakopoulos-Soares I, Noble WS, Yardımcı GG, Kulakovskiy IV, Kircher M, Shendure J, Ahituv N. Massively parallel characterization of transcriptional regulatory elements. Nature 2025:10.1038/s41586-024-08430-9. [PMID: 39814889 DOI: 10.1038/s41586-024-08430-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Accepted: 11/20/2024] [Indexed: 01/18/2025]
Abstract
The human genome contains millions of candidate cis-regulatory elements (cCREs) with cell-type-specific activities that shape both health and many disease states [1]. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these cCREs. Here we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of more than 680,000 sequences, representing an extensive set of annotated cCREs among three cell types (HepG2, K562 and WTC11), and found that 41.7% of these sequences were active. By testing sequences in both orientations, we find promoters to have strand-orientation biases and their 200-nucleotide cores to function as non-cell-type-specific 'on switches' that provide similar expression levels to their associated gene. By contrast, enhancers have weaker orientation biases, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict cCRE function and variant effects with high accuracy, delineate regulatory motifs and model their combinatorial effects. Testing a lentiMPRA library encompassing 60,000 cCREs in all three cell types further identified factors that determine cell-type specificity. Collectively, our work provides an extensive catalogue of functional CREs in three widely used cell lines and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
Affiliation(s)
- Vikram Agarwal
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- mRNA Center of Excellence, Sanofi, Waltham, MA, USA.
| | - Fumitaka Inoue
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Max Schubach
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Dmitry Penzar
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Institute of Translational Medicine, Pirogov Russian National Research Medical University, Moscow, Russia
| | - Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Pyaree Mohan Dash
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Pia Keukeleire
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
| | - Zicong Zhang
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Ajuni Sohota
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Jingjing Zhao
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
| | - Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Galip Gürkan Yardımcı
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
- Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, USA
| | - Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
- Life Improvement by Future Technologies (LIFT) Center, Moscow, Russia
| | - Martin Kircher
- Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Institute of Human Genetics, University Medical Center Schleswig-Holstein, University of Lübeck, Lübeck, Germany
| | - Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, USA.
- Seattle Hub for Synthetic Biology, Seattle, Washington, USA.
| | - Nadav Ahituv
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA.
| |
|
2
|
Chen B, Ren C, Ouyang Z, Xu J, Xu K, Li Y, Guo H, Bai X, Tian M, Xu X, Wang Y, Li H, Bo X, Chen H. Stratifying TAD boundaries pinpoints focal genomic regions of regulation, damage, and repair. Brief Bioinform 2024; 25:bbae306. [PMID: 38935071 PMCID: PMC11210073 DOI: 10.1093/bib/bbae306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 06/01/2024] [Accepted: 06/13/2024] [Indexed: 06/28/2024] Open
Abstract
Advances in chromatin mapping have exposed the complex chromatin hierarchical organization in mammals, including topologically associating domains (TADs) and their substructures, yet the functional implications of this hierarchy in gene regulation and disease progression are not fully elucidated. Our study delves into the phenomenon of shared TAD boundaries, which are pivotal in maintaining the hierarchical chromatin structure and regulating gene activity. By integrating high-resolution Hi-C data, chromatin accessibility, and DNA double-strand breaks (DSBs) data from various cell lines, we systematically explore the complex regulatory landscape at high-level TAD boundaries. Our findings indicate that these boundaries are not only key architectural elements but also vibrant hubs, enriched with functionally crucial genes and complex transcription factor binding site-clustered regions. Moreover, they exhibit a pronounced enrichment of DSBs, suggesting a nuanced interplay between transcriptional regulation and genomic stability. Our research provides novel insights into the intricate relationship between the 3D genome structure, gene regulation, and DNA repair mechanisms, highlighting the role of shared TAD boundaries in maintaining genomic integrity and resilience against perturbations. The implications of our findings extend to understanding the complexities of genomic diseases and open new avenues for therapeutic interventions targeting the structural and functional integrity of TAD boundaries.
Affiliation(s)
- Bijia Chen
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Chao Ren
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Zhangyi Ouyang
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Jingxuan Xu
- Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Gastrointestinal Surgery, Peking University Cancer Hospital & Institute, Beijing 100142, China
| | - Kang Xu
- School of Software, Shandong University, Jinan 250101, China
| | - Yaru Li
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Hejiang Guo
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Xuemei Bai
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Mengge Tian
- The First Affiliated Hospital of Harbin Medical University, Harbin 150001, China
| | - Xiang Xu
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Yuyang Wang
- College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
| | - Hao Li
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Xiaochen Bo
- Academy of Military Medical Sciences, Beijing 100850, China
| | - Hebing Chen
- Academy of Military Medical Sciences, Beijing 100850, China
| |
|
3
|
Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023; 25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023] Open
Abstract
With the development of sequencing technology and the dramatic drop in sequencing cost, the functions of noncoding genes are being characterized in a wide variety of fields (e.g. biomedicine). Enhancers are noncoding DNA elements with vital transcription regulation functions. Tens of thousands of enhancers have been identified in the human genome; however, the location, function, target genes and regulatory mechanisms of most enhancers have not been elucidated thus far. As high-throughput sequencing techniques have leapt forwards, omics approaches have been extensively employed in enhancer research. Multidimensional genomic data integration enables the full exploration of the data and provides novel perspectives for screening, identification and characterization of the function and regulatory mechanisms of unknown enhancers. However, multidimensional genomic data are still difficult to integrate genome wide due to complex varieties, massive amounts, high rarity, etc. To facilitate the appropriate methods for studying enhancers with high efficacy, we delineate the principles, data processing modes and progress of various omics approaches to study enhancers and summarize the applications of traditional machine learning and deep learning in multi-omics integration in the enhancer field. In addition, the challenges encountered during the integration of multiple omics data are addressed. Overall, this review provides a comprehensive foundation for enhancer analysis.
Affiliation(s)
- Qilin Wang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Junyou Zhang
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Zhaoshuo Liu
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Yingying Duan
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
| | - Chunyan Li
- School of Engineering Medicine, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Key Laboratory of Big Data-Based Precision Medicine (Ministry of Industry and Information Technology), Beihang University, Beijing 100191, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University, Beijing 100191, China
| |
|
4
|
Link V, Zavaleta YJA, Reyes RJ, Ding L, Wang J, Rohlfs RV, Edge MD. Microsatellites used in forensics are in regions enriched for trait-associated variants. iScience 2023; 26:107992. [PMID: 37841589 PMCID: PMC10570123 DOI: 10.1016/j.isci.2023.107992] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/08/2023] [Revised: 08/10/2023] [Accepted: 09/18/2023] [Indexed: 10/17/2023] Open
Abstract
The 20 short tandem repeat (STR) loci of the combined DNA index system (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS loci are thought to contain little information about ancestry or traits. However, in the past 20 years, a growing field has identified hundreds of thousands of genotype-trait associations. Here, we conduct a survey of the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. Although this study cannot establish or quantify associations between CODIS genotypes and phenotypes, we find that the regions around the CODIS loci are enriched for both known pathogenic variants (> 90th percentile) and for trait-associated SNPs identified in genome-wide association studies (GWAS) (≥ 95th percentile in 10kb and 100kb flanking regions), compared with other random sets of autosomal tetranucleotide-repeat STRs.
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | | | - Rochelle-Jan Reyes
- Department of Biology, San Francisco State University, San Francisco, CA, USA
| | - Linda Ding
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Judy Wang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Rori V. Rohlfs
- Department of Biology, San Francisco State University, San Francisco, CA, USA
- Department of Data Science and Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
| | - Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| |
|
5
|
Wang X, Liu D, Luo J, Kong D, Zhang Y. Exploring the Role of Enhancer-Mediated Transcriptional Regulation in Precision Biology. Int J Mol Sci 2023; 24:10843. [PMID: 37446021 PMCID: PMC10342031 DOI: 10.3390/ijms241310843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/10/2023] [Revised: 06/18/2023] [Accepted: 06/25/2023] [Indexed: 07/15/2023] Open
Abstract
The emergence of precision biology has been driven by the development of advanced technologies and techniques in high-resolution biological research systems. Enhancer-mediated transcriptional regulation, a complex network of gene expression and regulation in eukaryotes, has attracted significant attention as a promising avenue for investigating the underlying mechanisms of biological processes and diseases. To address biological problems with precision, large amounts of data, functional information, and research on the mechanisms of action of biological molecules are required. Enhancers, including typical enhancers and super enhancers, play a crucial role in gene expression and regulation within this network. The identification and targeting of disease-associated enhancers hold the potential to advance precision medicine. In this review, we present the concepts, progress, importance, and challenges in precision biology, transcription regulation, and enhancers. Furthermore, we propose a model of transcriptional regulation for multi-enhancers and provide examples of their mechanisms in mammalian cells, thereby enhancing our understanding of how enhancers achieve precise regulation of gene expression in life processes. Precision biology holds promise in providing new tools and platforms for discovering insights into gene expression and disease occurrence, ultimately benefiting individuals and society as a whole.
Affiliation(s)
- Xueyan Wang
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China;
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; (D.L.); (J.L.); (D.K.)
| | - Danli Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; (D.L.); (J.L.); (D.K.)
| | - Jing Luo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; (D.L.); (J.L.); (D.K.)
| | - Dashuai Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; (D.L.); (J.L.); (D.K.)
| | - Yubo Zhang
- College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China;
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China; (D.L.); (J.L.); (D.K.)
| |
|
6
|
Smirnov A, Melino G, Candi E. Gene expression in organoids: an expanding horizon. Biol Direct 2023; 18:11. [PMID: 36964575 PMCID: PMC10038780 DOI: 10.1186/s13062-023-00360-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 02/20/2023] [Indexed: 03/26/2023] Open
Abstract
Recent development of human three-dimensional organoid cultures has opened new doors and opportunities ranging from modelling human development in vitro to personalised cancer therapies. These new in vitro systems are opening new horizons to the classic understanding of human development and disease. However, the complexity and heterogeneity of these models requires cutting-edge techniques to capture and trace global changes in gene expression to enable identification of key players and uncover the underlying molecular mechanisms. Rapid development of sequencing approaches made possible global transcriptome analyses and epigenetic profiling. Despite challenges in organoid culture and handling, these techniques are now being adapted to embrace organoids derived from a wide range of human tissues. Here, we review current state-of-the-art multi-omics technologies, such as single-cell transcriptomics and chromatin accessibility assays, employed to study organoids as a model for development and a platform for precision medicine.
Affiliation(s)
- Artem Smirnov
- Department of Experimental Medicine, Torvergata Oncoscience Research, University of Rome "Tor Vergata", Via Montpellier 1, 00133, Rome, Italy
| | - Gerry Melino
- Department of Experimental Medicine, Torvergata Oncoscience Research, University of Rome "Tor Vergata", Via Montpellier 1, 00133, Rome, Italy
| | - Eleonora Candi
- Department of Experimental Medicine, Torvergata Oncoscience Research, University of Rome "Tor Vergata", Via Montpellier 1, 00133, Rome, Italy.
- Biochemistry Laboratory, Istituto Dermopatico Immacolata (IDI-IRCCS), 00166, Rome, Italy.
| |
|
7
|
Link V, Zavaleta YJA, Reyes RJ, Ding L, Wang J, Rohlfs RV, Edge MD. Microsatellites used in forensics are located in regions unusually rich in trait-associated variants. bioRxiv 2023:2023.03.07.531629. [PMID: 36945578 PMCID: PMC10028909 DOI: 10.1101/2023.03.07.531629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
The 20 short tandem repeat (STR) markers of the combined DNA index system (CODIS) are the basis of the vast majority of forensic genetics in the United States. One argument for permissive rules about the collection of CODIS genotypes is that the CODIS markers are thought to contain information relevant to identification only (such as a human fingerprint would), with little information about ancestry or traits. However, in the past 20 years, a quickly growing field has identified hundreds of thousands of genotype-trait associations. Here we conduct a survey of the landscape of such associations surrounding the CODIS loci as compared with non-CODIS STRs. We find that the regions around the CODIS markers are enriched for both known pathogenic variants (>90th percentile) and for SNPs identified as trait-associated in genome-wide association studies (GWAS) (≥95th percentile in 10kb and 100kb flanking regions), compared with other random sets of autosomal tetranucleotide-repeat STRs. Although it is not obvious how much phenotypic information CODIS would need to convey to strain the "DNA fingerprint" analogy, the CODIS markers, considered as a set, are in regions unusually dense with variants with known phenotypic associations.
Affiliation(s)
- Vivian Link
- Department of Quantitative and Computational Biology, University of Southern California
| | | | | | - Linda Ding
- Department of Quantitative and Computational Biology, University of Southern California
| | - Judy Wang
- Department of Quantitative and Computational Biology, University of Southern California
| | - Rori V. Rohlfs
- Department of Biology, San Francisco State University
- Department of Computer Science and Institute of Ecology and Evolution, University of Oregon
| | - Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California
| |
|
8
|
Yang TH, Yu YH, Wu SH, Zhang FY. CFA: An explainable deep learning model for annotating the transcriptional roles of cis-regulatory modules based on epigenetic codes. Comput Biol Med 2023; 152:106375. [PMID: 36502693 DOI: 10.1016/j.compbiomed.2022.106375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/07/2022] [Accepted: 11/27/2022] [Indexed: 11/30/2022]
Abstract
Metazoa gene expression is controlled by modular DNA segments called cis-regulatory modules (CRMs). CRMs can convey promoter/enhancer/insulator roles, generating additional regulation layers in transcription. Experiments for understanding CRM roles are low-throughput and costly. Large-scale CRM function investigation still depends on computational methods. However, existing in silico tools only recognize enhancers or promoters exclusively, thus accumulating errors when considering CRM promoter/enhancer/insulator roles altogether. Currently, no algorithm can concurrently consider these CRM roles. In this research, we developed the CRM Function Annotator (CFA) model. CFA provides complete CRM transcriptional role labeling based on epigenetic profiling interpretation. We demonstrated that CFA achieves high performance (test macro auROC/auPRC = 94.1%/90.3%) and outperforms existing tools in promoter/enhancer/insulator identification. CFA is also inspected to recognize explainable epigenetic codes consistent with previous findings when labeling CRM roles. By considering the higher-order combinations of the epigenetic codes, CFA significantly reduces false-positive rates in CRM transcriptional role annotation. CFA is available at https://github.com/cobisLab/CFA/.
Affiliation(s)
- Tzu-Hsien Yang
- Department of Biomedical Engineering, National Cheng Kung University, No. 1, University Road, Tainan 701, Taiwan.
| | - Yu-Huai Yu
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
| | - Sheng-Hang Wu
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
| | - Fang-Yuan Zhang
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan.
| |
|
9
|
A novel oncogenic enhancer of estrogen receptor-positive breast cancer. Mol Ther Nucleic Acids 2022; 29:836-851. [PMID: 36159594 PMCID: PMC9463563 DOI: 10.1016/j.omtn.2022.08.029] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/20/2022] [Accepted: 08/17/2022] [Indexed: 11/22/2022]
|
10
|
Yang TH, Yang YC, Tu KC. regCNN: identifying Drosophila genome-wide cis-regulatory modules via integrating the local patterns in epigenetic marks and transcription factor binding motifs. Comput Struct Biotechnol J 2022; 20:296-308. [PMID: 35035784 PMCID: PMC8724954 DOI: 10.1016/j.csbj.2021.12.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/15/2021] [Revised: 12/10/2021] [Accepted: 12/10/2021] [Indexed: 11/20/2022] Open
Abstract
Transcription regulation in metazoa is controlled by the binding events of transcription factors (TFs) or regulatory proteins on specific modular DNA regulatory sequences called cis-regulatory modules (CRMs). Understanding the distributions of CRMs on a genomic scale is essential for constructing the metazoan transcriptional regulatory networks that help diagnose genetic disorders. While traditional reporter-assay CRM identification approaches can provide an in-depth understanding of the functions of some CRMs, these methods are usually cost-inefficient and low-throughput. It is generally believed that by integrating diverse genomic data, reliable CRM predictions can be made. Hence, researchers often first resort to computational algorithms for genome-wide CRM screening before specific experiments. However, existing in silico methods for searching potential CRMs are restricted by low sensitivity, poor prediction accuracy, or high computation time from TFBS composition combinatorial complexity. To overcome these obstacles, we designed a novel CRM identification pipeline called regCNN by considering the base-by-base local patterns in TF binding motifs and epigenetic profiles. On the test set, regCNN shows an accuracy/auROC of 84.5%/92.5% in CRM identification. By further considering local patterns in epigenetic profiles and TF binding motifs, it achieves a 4.7% (92.5%–87.8%) improvement in the auROC value over the average value-based pure multi-layer perceptron model. We also demonstrated that regCNN outperforms all currently available tools by at least 11.3% in auROC values. Finally, regCNN is verified to be robust against its resizing window hyperparameter in dealing with the variable lengths of CRMs. The model of regCNN can be downloaded at http://cobisHSS0.im.nuk.edu.tw/regCNN/.
Affiliation(s)
- Tzu-Hsien Yang
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
| | - Ya-Chiao Yang
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
| | - Kai-Chi Tu
- Department of Information Management, National University of Kaohsiung, Kaohsiung University Rd, 811 Kaohsiung, Taiwan
| |
|
11
|
Morrow A, Hughes J, Singh J, Joseph A, Yosef N. Epitome: predicting epigenetic events in novel cell types with multi-cell deep ensemble learning. Nucleic Acids Res 2021; 49:e110. [PMID: 34379786 PMCID: PMC8565335 DOI: 10.1093/nar/gkab676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2021] [Revised: 07/19/2021] [Accepted: 07/25/2021] [Indexed: 01/04/2023] Open
Abstract
The accumulation of large epigenomics data consortiums provides us with the opportunity to extrapolate existing knowledge to new cell types and conditions. We propose Epitome, a deep neural network that learns similarities of chromatin accessibility between well characterized reference cell types and a query cellular context, and copies over signal of transcription factor binding and modification of histones from reference cell types when chromatin profiles are similar to the query. Epitome achieves state-of-the-art accuracy when predicting transcription factor binding sites on novel cellular contexts and can further improve predictions as more epigenetic signals are collected from both reference cell types and the query cellular context of interest.
Affiliation(s)
- Alyssa Kramer Morrow
- Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
| | - John Weston Hughes
- Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
- Computer Science Department, Stanford University, 353 Serra Mall, Stanford, CA 94305, USA
| | - Jahnavi Singh
- Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
| | - Anthony Douglas Joseph
- Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
- Center for Computational Biology, University of California-Berkeley 108 Stanley Hall, Berkeley, CA 94720-3220, USA
- Unite Genomics, Inc., 1301 Marina Village Pkwy, Suite 320, Alameda, CA 94501, USA
| | - Nir Yosef
- Electrical Engineering and Computer Science Department, University of California-Berkeley 465 Soda Hall, Berkeley, CA 94720-1776, USA
- Center for Computational Biology, University of California-Berkeley 108 Stanley Hall, Berkeley, CA 94720-3220, USA
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University, Boston, MA, 02139, USA
- Chan Zuckerberg Biohub, San Francisco, CA, 94158, USA
| |
|
12
|
Kumar S. SWI/SNF (BAF) complexes: From framework to a functional role in endothelial mechanotransduction. Curr Top Membr 2021; 87:171-198. [PMID: 34696885 DOI: 10.1016/bs.ctm.2021.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
Endothelial cells (ECs) are constantly subjected to an array of mechanical cues, especially shear stress, due to their luminal placement in the blood vessels. Blood flow can regulate various aspects of endothelial biology and pathophysiology by regulating the endothelial processes at the transcriptomic, proteomic, miRNomic, metabolomics, and epigenomic levels. ECs sense, respond, and adapt to altered blood flow patterns and shear profiles by specialized mechanisms of mechanosensing and mechanotransduction, resulting in qualitative and quantitative differences in their gene expression. Chromatin-regulatory proteins can regulate transcriptional activation by modifying the organization of nucleosomes at promoters, enhancers, silencers, insulators, and locus control regions. Recent research efforts have illustrated that SWI/SNF (SWItch/Sucrose Non-Fermentable) or BRG1/BRM-associated factor (BAF) complex regulates DNA accessibility and chromatin structure. Since the discovery, the gene-regulatory mechanisms of the BAF complex associated with chromatin remodeling have been intensively studied to investigate its role in diverse disease phenotypes. Thus far, it is evident that (1) the SWI/SNF complex broadly regulates the activity of transcriptional enhancers to control lineage-specific differentiation and (2) mutations in the BAF complex proteins lead to developmental disorders and cancers. It is unclear if blood flow can modulate the activity of SWI/SNF complex to regulate EC differentiation and reprogramming. This review emphasizes the integrative role of SWI/SNF complex from a structural and functional standpoint with a special reference to cardiovascular diseases (CVDs). The review also highlights how regulation of this complex by blood flow can lead to the discovery of new therapeutic interventions for the treatment of endothelial dysfunction in vascular diseases.
Affiliation(s)
- Sandeep Kumar
- Wallace H. Coulter Department of Biomedical Engineering at Emory University and Georgia Institute of Technology, Atlanta, GA, United States.
|
13
|
Lange M, Begolli R, Giakountis A. Non-Coding Variants in Cancer: Mechanistic Insights and Clinical Potential for Personalized Medicine. Noncoding RNA 2021; 7:47. [PMID: 34449663 PMCID: PMC8395730 DOI: 10.3390/ncrna7030047] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/28/2021] [Revised: 07/26/2021] [Accepted: 08/01/2021] [Indexed: 12/11/2022] Open
Abstract
The cancer genome is characterized by extensive variability, in the form of Single Nucleotide Polymorphisms (SNPs) or structural variations such as Copy Number Alterations (CNAs) across wider genomic areas. At the molecular level, most SNPs and/or CNAs reside in non-coding sequences, ultimately affecting the regulation of oncogenes and/or tumor-suppressors in a cancer-specific manner. Notably, inherited non-coding variants can predispose for cancer decades prior to disease onset. Furthermore, accumulation of additional non-coding driver mutations during progression of the disease, gives rise to genomic instability, acting as the driving force of neoplastic development and malignant evolution. Therefore, detection and characterization of such mutations can improve risk assessment for healthy carriers and expand the diagnostic and therapeutic toolbox for the patient. This review focuses on functional variants that reside in transcribed or not transcribed non-coding regions of the cancer genome and presents a collection of appropriate state-of-the-art methodologies to study them.
Affiliation(s)
- Marios Lange
- Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Rodiola Begolli
- Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Antonis Giakountis
- Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, 41500 Larissa, Greece
- Institute for Fundamental Biomedical Research, B.S.R.C “Alexander Fleming”, 34 Fleming Str., 16672 Vari, Greece
|
14
|
Svoboda LK, Neier K, Wang K, Cavalcante RG, Rygiel CA, Tsai Z, Jones TR, Liu S, Goodrich JM, Lalancette C, Colacino JA, Sartor MA, Dolinoy DC. Tissue and sex-specific programming of DNA methylation by perinatal lead exposure: implications for environmental epigenetics studies. Epigenetics 2020; 16:1102-1122. [PMID: 33164632 DOI: 10.1080/15592294.2020.1841872] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 02/07/2023] Open
Abstract
Early developmental environment can influence long-term health through reprogramming of the epigenome. Human environmental epigenetics studies rely on surrogate tissues, such as blood, to assess the effects of environment on disease-relevant but inaccessible target tissues. However, the extent to which environment-induced epigenetic changes are conserved between these tissues is unclear. A better understanding of this conservation is imperative for effective design and interpretation of human environmental epigenetics studies. The Toxicant Exposures and Responses by Genomic and Epigenomic Regulators of Transcription (TaRGET II) consortium was established by the National Institute of Environmental Health Sciences to address the utility of surrogate tissues as proxies for toxicant-induced epigenetic changes in target tissues. We and others have recently reported that perinatal exposure to lead (Pb) is associated with adverse metabolic outcomes. Here, we investigated the sex-specific effects of perinatal exposure to a human environmentally relevant level of Pb on DNA methylation in paired liver and blood samples from adult mice using enhanced reduced-representation bisulphite sequencing. Although Pb exposure ceased at 3 weeks of age, we observed thousands of sex-specific differentially methylated cytosines in the blood and liver of Pb-exposed animals at 5 months of age, including 44 genomically imprinted loci. We observed significant tissue overlap in the genes mapping to differentially methylated cytosines. A small but significant subset of Pb-altered genes exhibit basal sex differences in gene expression in the mouse liver. Collectively, these data identify potential molecular targets for Pb-induced metabolic diseases, and inform the design of more robust human environmental epigenomics studies.
Affiliation(s)
- Laurie K Svoboda
- Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Kari Neier
- Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Kai Wang
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School Palmer Commons, Ann Arbor, MI, USA
- Christine A Rygiel
- Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Zing Tsai
- Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School Palmer Commons, Ann Arbor, MI, USA
- Tamara R Jones
- Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Siyu Liu
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School Palmer Commons, Ann Arbor, MI, USA
- Jaclyn M Goodrich
- Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Claudia Lalancette
- Epigenomics Core, University of Michigan, Medical School, Ann Arbor, MI, USA
- Justin A Colacino
- Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Nutritional Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Maureen A Sartor
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School Palmer Commons, Ann Arbor, MI, USA
- Department of Biostatistics, University of Michigan, School of Public Health, Ann Arbor, MI, USA
- Dana C Dolinoy
- Environmental Health Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Nutritional Sciences, University of Michigan School of Public Health, Ann Arbor, MI, USA
|
15
|
Rochette-Egly C. Retinoic Acid-Regulated Target Genes During Development: Integrative Genomics Analysis. Subcell Biochem 2020; 95:57-85. [PMID: 32297296 DOI: 10.1007/978-3-030-42282-0_3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 06/11/2023]
Abstract
Retinoic acid (RA), a major natural active metabolite of vitamin A (VA), is well known to play critical roles in embryonic development. The effects of RA are mediated by nuclear receptors (RARs), which regulate the expression of gene batteries involved in cell growth and differentiation. Since the early 1990s several laboratories have focused on understanding how RA-regulated genes and RAR binding sites operate by studying the differentiation of embryonal carcinoma cells and embryonic stem cells. The development of hybridization-based microarray technology and high performance software analysis programs has allowed the characterization of thousands of RA-regulated genes. During the last two decades, publication of the genome sequence of various organisms has allowed advances in massive parallel sequencing and bioinformatics analysis of genome-wide data sets. These new generation sequencing (NGS) technologies have revolutionized the field by providing a global integrated picture of RA-regulated gene networks and the regulatory programs involved in cell fate decisions during embryonal carcinoma and embryonic stem cell differentiation. Now the challenge is to reconstruct the RA-regulated gene networks at the single cell level during the development of specialized embryonic tissues.
Affiliation(s)
- Cecile Rochette-Egly
- Université de Strasbourg, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), INSERM, U964, CNRS, UMR7104, 1 rue Laurent Fries, BP 10142, 67404, Illkirch Cedex, France.
|
16
|
Brown K, Takawira LT, O'Neill MM, Mizrachi E, Myburg AA, Hussey SG. Identification and functional evaluation of accessible chromatin associated with wood formation in Eucalyptus grandis. The New Phytologist 2019; 223:1937-1951. [PMID: 31063599 DOI: 10.1111/nph.15897] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/28/2018] [Accepted: 04/29/2019] [Indexed: 05/03/2023]
Abstract
Accessible chromatin changes dynamically during development and harbours functional regulatory regions which are poorly understood in the context of wood development. We explored the importance of accessible chromatin in Eucalyptus grandis in immature xylem generally, and MYB transcription factor-mediated transcriptional programmes specifically. We identified biologically reproducible DNase I Hypersensitive Sites (DHSs) and assessed their functional significance in immature xylem through their associations with gene expression, epigenomic data and DNA sequence conservation. We identified in vitro DNA binding sites for six secondary cell wall-associated Eucalyptus MYB (EgrMYB) transcription factors using DAP-seq, reconstructed protein-DNA networks of predicted targets based on binding sites within or outside DHSs and assessed biological enrichment of these networks with published datasets. 25 319 identified immature xylem DHSs were associated with increased transcription and significantly enriched for various epigenetic signatures (H3K4me3, H3K27me3, RNA pol II), conserved noncoding sequences and depleted single nucleotide variants. Predicted networks built from EgrMYB binding sites located in accessible chromatin were significantly enriched for systems biology datasets relevant to wood formation, whereas those occurring in inaccessible chromatin were not. Our study demonstrates that DHSs in E. grandis immature xylem, most of which are intergenic, are of functional significance to gene regulation in this tissue.
Affiliation(s)
- Katrien Brown
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X28, Pretoria, 0002, South Africa
- Lazarus T Takawira
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X28, Pretoria, 0002, South Africa
- Marja M O'Neill
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X28, Pretoria, 0002, South Africa
- Eshchar Mizrachi
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X28, Pretoria, 0002, South Africa
- Alexander A Myburg
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X28, Pretoria, 0002, South Africa
- Steven G Hussey
- Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute (FABI), Genomics Research Institute (GRI), University of Pretoria, Private Bag X28, Pretoria, 0002, South Africa
|
17
|
Sun Y, Miao N, Sun T. Detect accessible chromatin using ATAC-sequencing, from principle to applications. Hereditas 2019; 156:29. [PMID: 31427911 PMCID: PMC6696680 DOI: 10.1186/s41065-019-0105-9] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 05/31/2019] [Accepted: 08/12/2019] [Indexed: 02/07/2023] Open
Abstract
Background: Chromatin accessibility is crucial for gene expression regulation in specific cells and in multiple biological processes. Assay for Transposase Accessible Chromatin with high-throughput sequencing (ATAC-seq) is an effective way to reveal chromatin accessibility at a genome-wide level. In ATAC-seq, reads produced from a small number of cells reflect accessible regions that correspond to nucleosome positioning and transcription factor binding sites, because a hyperactive Tn5 transposase is used to probe the DNA sequence. Conclusion: In this review, we summarize both the principle and features of ATAC-seq and highlight its applications in basic and clinical research. ATAC-seq has generated comprehensive chromatin accessibility maps, and is becoming a powerful tool to understand dynamic gene expression regulation in stem cells, early embryos and tumors.
Affiliation(s)
- Yuanyuan Sun
- Center for Precision Medicine, School of Medicine and School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen, 361021 Fujian China
- Nan Miao
- Center for Precision Medicine, School of Medicine and School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen, 361021 Fujian China
- Tao Sun
- Center for Precision Medicine, School of Medicine and School of Biomedical Sciences, Huaqiao University, 668 Jimei Road, Xiamen, 361021 Fujian China
|
18
|
Ren J, Finney R, Ni K, Cam M, Muegge K. The chromatin remodeling protein Lsh alters nucleosome occupancy at putative enhancers and modulates binding of lineage specific transcription factors. Epigenetics 2019; 14:277-293. [PMID: 30861354 PMCID: PMC6557562 DOI: 10.1080/15592294.2019.1582275] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/20/2018] [Revised: 01/07/2019] [Accepted: 02/07/2019] [Indexed: 12/12/2022] Open
Abstract
Dynamic regulation of chromatin accessibility is a key feature of cellular differentiation during embryogenesis, but the precise factors that control access to chromatin remain largely unknown. Lsh/HELLS is critical for normal development and mutations of Lsh in human cause the ICF (Immune deficiency, Centromeric instability, Facial anomalies) syndrome, a severe immune disorder with multiple organ deficiencies. We report here that Lsh, previously known to regulate DNA methylation level, has a genome wide chromatin remodeling function. Using micrococcal nuclease (MNase)-seq analysis, we demonstrate that Lsh protects MNase accessibility at transcriptional regulatory regions characterized by DNase I hypersensitivity and certain histone 3 (H3) tail modifications associated with enhancers. Using an auxin-inducible degron system, allowing proteolytical degradation of Lsh, we show that Lsh mediated changes in nucleosome occupancy are independent of DNA methylation level and are characterized by reduced H3 occupancy. While Lsh mediated nucleosome occupancy prevents binding sites for transcription factors in wild type cells, depletion of Lsh leads to an increase in binding of ectopically expressed tissue specific transcription factors to their respective binding sites. Our data suggests that Lsh mediated chromatin remodeling can modulate nucleosome positioning at a subset of putative enhancers contributing to the preservation of cellular identity through regulation of accessibility.
Affiliation(s)
- Jianke Ren
- Mouse Cancer Genetics Program, National Cancer Institute, Frederick, MD, USA
- Richard Finney
- CCR Collaborative Bioinformatics Resource, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Kai Ni
- Mouse Cancer Genetics Program, National Cancer Institute, Frederick, MD, USA
- Maggie Cam
- CCR Collaborative Bioinformatics Resource, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Kathrin Muegge
- Mouse Cancer Genetics Program, National Cancer Institute, Frederick, MD, USA
- Frederick National Laboratory for Cancer Research, Basic Science Program, Leidos Biomedical Research, Inc., Frederick, MD, USA
|