1
Buckley RM, Ostrander EA. Large-scale genomic analysis of the domestic dog informs biological discovery. Genome Res 2024; 34:811-821. [PMID: 38955465 DOI: 10.1101/gr.278569.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2024]
Abstract
Recent advances in genomics, coupled with a unique population structure and remarkable levels of variation, have propelled the domestic dog to new levels as a system for understanding fundamental principles in mammalian biology. Central to this advance are more than 350 recognized breeds, each a closed population that has undergone selection for unique features. Genetic variation in the domestic dog is particularly well characterized compared with other domestic mammals, with almost 3000 high-coverage genomes publicly available. Importantly, as the number of sequenced genomes increases, new avenues for analysis are becoming available. Herein, we discuss recent discoveries in canine genomics regarding behavior, morphology, and disease susceptibility. We explore the limitations of current data sets for variant interpretation, tradeoffs between sequencing strategies, and the burgeoning role of long-read genomes for capturing structural variants. In addition, we consider how large-scale collections of whole-genome sequence data drive rare variant discovery and assess the geographic distribution of canine diversity, which identifies Asia as a major source of missing variation. Finally, we review recent comparative genomic analyses that will facilitate annotation of the noncoding genome in dogs.
Affiliation(s)
- Reuben M Buckley
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Elaine A Ostrander
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
2
Lalanne JB, Regalado SG, Domcke S, Calderon D, Martin BK, Li X, Li T, Suiter CC, Lee C, Trapnell C, Shendure J. Multiplex profiling of developmental cis-regulatory elements with quantitative single-cell expression reporters. Nat Methods 2024; 21:983-993. [PMID: 38724692 PMCID: PMC11166576 DOI: 10.1038/s41592-024-02260-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 03/22/2024] [Indexed: 06/13/2024]
Abstract
The inability to scalably and precisely measure the activity of developmental cis-regulatory elements (CREs) in multicellular systems is a bottleneck in genomics. Here we develop a dual RNA cassette that decouples the detection and quantification tasks inherent to multiplex single-cell reporter assays. The resulting measurement of reporter expression is accurate over multiple orders of magnitude, with a precision approaching the limit set by Poisson counting noise. Together with RNA barcode stabilization via circularization, these scalable single-cell quantitative expression reporters provide high-contrast readouts, analogous to classic in situ assays but entirely from sequencing. Screening >200 regions of accessible chromatin in a multicellular in vitro model of early mammalian development, we identify 13 (8 previously uncharacterized) autonomous and cell-type-specific developmental CREs. We further demonstrate that chimeric CRE pairs generate cognate two-cell-type activity profiles and assess gain- and loss-of-function multicellular expression phenotypes from CRE variants with perturbed transcription factor binding sites. Single-cell quantitative expression reporters can be applied in developmental and multicellular systems to quantitatively characterize native, perturbed and synthetic CREs at scale, with high sensitivity and at single-cell resolution.
Affiliation(s)
- Samuel G Regalado
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Silvia Domcke
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Diego Calderon
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Xiaoyi Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Tony Li
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Chase C Suiter
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Choli Lee
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
3
Li L, Song Q, Zhou J, Ji Q. Controllers of histone methylation-modifying enzymes in gastrointestinal cancers. Biomed Pharmacother 2024; 174:116488. [PMID: 38520871 DOI: 10.1016/j.biopha.2024.116488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 02/26/2024] [Accepted: 03/19/2024] [Indexed: 03/25/2024] Open
Abstract
Gastrointestinal (GI) cancers have been considered primarily genetic malignancies, caused by a series of progressive genetic alterations. Accumulating evidence shows that histone methylation, an epigenetic modification program, plays an essential role in the different pathological stages of GI cancer progression, such as precancerous lesions, tumorigenesis, and tumor metastasis. Histone methylation-modifying enzymes, including histone methyltransferases (HMTs) and demethylases (HDMs), are the main executors of post-translational modification. The abnormal expression of histone methylation-modifying enzymes characterizes GI cancers with complex pathogenesis and progression. Interactions between upstream controllers and histone methylation-modifying enzymes have recently been revealed, and have provided numerous opportunities to elucidate the pathogenesis of GI cancers in depth. Here we focus on the association between histone methylation-modifying enzymes and their controllers, aiming to provide a new perspective on the molecular research and clinical management of GI cancers.
Affiliation(s)
- Ling Li
- Department of Medical Oncology & Cancer Institute of Integrative Medicine, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Qing Song
- Department of Medical Oncology, Suzhou TCM Hospital Affiliated to Nanjing University of Chinese Medicine, Suzhou, Jiangsu 215007, China
- Jing Zhou
- Department of Medical Oncology & Cancer Institute of Integrative Medicine, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China; Liver Disease Department of Integrative Medicine, Ningbo No.2 Hospital, Ningbo, Zhejiang 315000, China.
- Qing Ji
- Department of Medical Oncology & Cancer Institute of Integrative Medicine, Shuguang Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China.
4
Duncan AG, Mitchell JA, Moses AM. Improving the performance of supervised deep learning for regulatory genomics using phylogenetic augmentation. Bioinformatics 2024; 40:btae190. [PMID: 38588559 DOI: 10.1093/bioinformatics/btae190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 01/12/2024] [Accepted: 04/05/2024] [Indexed: 04/10/2024]
Abstract
MOTIVATION: Supervised deep learning is used to model the complex relationship between genomic sequence and regulatory function. Understanding how these models make predictions can provide biological insight into regulatory functions. Given the complexity of the sequence to regulatory function mapping (the cis-regulatory code), it has been suggested that the genome contains insufficient sequence variation to train models with suitable complexity. Data augmentation is a widely used approach to increase the data variation available for model training; however, current data augmentation methods for genomic sequence data are limited. RESULTS: Inspired by the success of comparative genomics, we show that augmenting genomic sequences with evolutionarily related sequences from other species, which we term phylogenetic augmentation, improves the performance of deep learning models trained on regulatory genomic sequences to predict high-throughput functional assay measurements. Additionally, we show that phylogenetic augmentation can rescue model performance when the training set is down-sampled and permits deep learning on a real-world small dataset, demonstrating that this approach improves data efficiency. Overall, this data augmentation method represents a solution for improving model performance that is applicable to many supervised deep-learning problems in genomics. AVAILABILITY AND IMPLEMENTATION: The open-source GitHub repository agduncan94/phylogenetic_augmentation_paper includes the code for rerunning the analyses here and recreating the figures.
Affiliation(s)
- Andrew G Duncan
- Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
- Alan M Moses
- Cell & Systems Biology, University of Toronto, Toronto, ON M5S 3G5, Canada
5
Zhang G, Fu Y, Yang L, Ye F, Zhang P, Zhang S, Ma L, Li J, Wu H, Han X, Wang J, Guo G. Construction of single-cell cross-species chromatin accessibility landscapes with combinatorial-hybridization-based ATAC-seq. Dev Cell 2024; 59:793-811.e8. [PMID: 38330939 DOI: 10.1016/j.devcel.2024.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 11/03/2023] [Accepted: 01/18/2024] [Indexed: 02/10/2024]
Abstract
Despite recent advances in single-cell genomics, the lack of maps for single-cell candidate cis-regulatory elements (cCREs) in non-mammal species has limited our exploration of conserved regulatory programs across vertebrates and invertebrates. Here, we developed a combinatorial-hybridization-based method for single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) named CH-ATAC-seq, enabling the construction of single-cell accessible chromatin landscapes for zebrafish, Drosophila, and earthworms (Eisenia andrei). By integrating scATAC censuses of humans, monkeys, and mice, we systematically identified 152 distinct main cell types and around 0.8 million cell-type-specific cCREs. Our analysis provided insights into the conservation of neural, muscle, and immune lineages across species, while epithelial cells exhibited a higher organ-origin heterogeneity. Additionally, a large-scale gene regulatory network (GRN) was constructed in four vertebrates by integrating scRNA-seq censuses. Overall, our study provides a valuable resource for comparative epigenomics, identifying the evolutionary conservation and divergence of gene regulation across different species.
Affiliation(s)
- Guodong Zhang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China
- Yuting Fu
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Lei Yang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Fang Ye
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China
- Peijing Zhang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Shuang Zhang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Lifeng Ma
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Jiaqi Li
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Hanyu Wu
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Xiaoping Han
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou 310058, China.
- Jingjing Wang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China.
- Guoji Guo
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou 310058, China; Institute of Hematology, Zhejiang University, Hangzhou, China.
6
Taskiran II, Spanier KI, Dickmänken H, Kempynck N, Pančíková A, Ekşi EC, Hulselmans G, Ismail JN, Theunis K, Vandepoel R, Christiaens V, Mauduit D, Aerts S. Cell-type-directed design of synthetic enhancers. Nature 2024; 626:212-220. [PMID: 38086419 PMCID: PMC10830415 DOI: 10.1038/s41586-023-06936-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/06/2022] [Accepted: 12/05/2023] [Indexed: 01/19/2024]
Abstract
Transcriptional enhancers act as docking stations for combinations of transcription factors and thereby regulate spatiotemporal activation of their target genes. It has been a long-standing goal in the field to decode the regulatory logic of an enhancer and to understand the details of how spatiotemporal gene expression is encoded in an enhancer sequence. Here we show that deep learning models can be used to efficiently design synthetic, cell-type-specific enhancers, starting from random sequences, and that this optimization process allows detailed tracing of enhancer features at single-nucleotide resolution. We evaluate the function of fully synthetic enhancers to specifically target Kenyon cells or glial cells in the fruit fly brain using transgenic animals. We further exploit enhancer design to create 'dual-code' enhancers that target two cell types and minimal enhancers smaller than 50 base pairs that are fully functional. By examining the state space searches towards local optima, we characterize enhancer codes through the strength, combination and arrangement of transcription factor activator and transcription factor repressor motifs. Finally, we apply the same strategies to successfully design human enhancers, which adhere to enhancer rules similar to those of Drosophila enhancers. Enhancer design guided by deep learning leads to better understanding of how enhancers work and shows that their code can be exploited to manipulate cell states.
Affiliation(s)
- Ibrahim I Taskiran
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Katina I Spanier
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Hannah Dickmänken
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Niklas Kempynck
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Alexandra Pančíková
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB-KULeuven Center for Cancer Biology, Leuven, Belgium
- Eren Can Ekşi
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Gert Hulselmans
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Joy N Ismail
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- UK Dementia Research Institute at Imperial College London, London, UK
- Koen Theunis
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Roel Vandepoel
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Valerie Christiaens
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- David Mauduit
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Stein Aerts
- Laboratory of Computational Biology, VIB Center for AI & Computational Biology (VIB.AI), Leuven, Belgium.
- VIB-KULeuven Center for Brain & Disease Research, Leuven, Belgium.
- Department of Human Genetics, KU Leuven, Leuven, Belgium.
7
Bravo González-Blas C, Matetovici I, Hillen H, Taskiran II, Vandepoel R, Christiaens V, Sansores-García L, Verboven E, Hulselmans G, Poovathingal S, Demeulemeester J, Psatha N, Mauduit D, Halder G, Aerts S. Single-cell spatial multi-omics and deep learning dissect enhancer-driven gene regulatory networks in liver zonation. Nat Cell Biol 2024; 26:153-167. [PMID: 38182825 PMCID: PMC10791584 DOI: 10.1038/s41556-023-01316-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 11/15/2023] [Indexed: 01/07/2024]
Abstract
In the mammalian liver, hepatocytes exhibit diverse metabolic and functional profiles based on their location within the liver lobule. However, it is unclear whether this spatial variation, called zonation, is governed by a well-defined gene regulatory code. Here, using a combination of single-cell multiomics, spatial omics, massively parallel reporter assays and deep learning, we mapped enhancer-gene regulatory networks across mouse liver cell types. We found that zonation affects gene expression and chromatin accessibility in hepatocytes, among other cell types. These states are driven by the repressors TCF7L1 and TBX3, alongside other core hepatocyte transcription factors, such as HNF4A, CEBPA, FOXA1 and ONECUT1. To examine the architecture of the enhancers driving these cell states, we trained a hierarchical deep learning model called DeepLiver. Our study provides a multimodal understanding of the regulatory code underlying hepatocyte identity and their zonation state that can be used to engineer enhancers with specific activity levels and zonation patterns.
Affiliation(s)
- Carmen Bravo González-Blas
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Irina Matetovici
- VIB Center for Brain & Disease Research, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
- VIB Tech Watch, VIB Headquarters, Ghent, Belgium
- Hanne Hillen
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
- Ibrahim Ihsan Taskiran
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
- Roel Vandepoel
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
- Valerie Christiaens
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
- Leticia Sansores-García
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
- Elisabeth Verboven
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
- Gert Hulselmans
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
- Jonas Demeulemeester
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Nikoleta Psatha
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- David Mauduit
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium
- Georg Halder
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
- Stein Aerts
- VIB Center for Brain & Disease Research, Leuven, Belgium.
- Department of Human Genetics, KU Leuven, Leuven, Belgium.
- VIB Center for AI and Computational Biology (VIB.AI), Leuven, Belgium.
8
Marzi SJ, Schilder BM, Nott A, Frigerio CS, Willaime-Morawek S, Bucholc M, Hanger DP, James C, Lewis PA, Lourida I, Noble W, Rodriguez-Algarra F, Sharif JA, Tsalenchuk M, Winchester LM, Yaman Ü, Yao Z, Ranson JM, Llewellyn DJ. Artificial intelligence for neurodegenerative experimental models. Alzheimers Dement 2023; 19:5970-5987. [PMID: 37768001 DOI: 10.1002/alz.13479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 08/11/2023] [Accepted: 08/14/2023] [Indexed: 09/29/2023]
Abstract
INTRODUCTION: Experimental models are essential tools in neurodegenerative disease research. However, the translation of insights and drugs discovered in model systems has proven immensely challenging, marred by high failure rates in human clinical trials. METHODS: Here we review the application of artificial intelligence (AI) and machine learning (ML) in experimental medicine for dementia research. RESULTS: Considering the specific challenges of reproducibility and translation between other species or model systems and human biology in preclinical dementia research, we highlight best practices and resources that can be leveraged to quantify and evaluate translatability. We then evaluate how AI and ML approaches could be applied to enhance both cross-model reproducibility and translation to human biology, while sustaining biological interpretability. DISCUSSION: AI and ML approaches in experimental medicine remain in their infancy. However, they have great potential to strengthen preclinical research and translation if based upon adequate, robust, and reproducible experimental data. HIGHLIGHTS: There are increasing applications of AI in experimental medicine. We identified issues in reproducibility, cross-species translation, and data curation in the field. Our review highlights data resources and AI approaches as solutions. Multi-omics analysis with AI offers exciting future possibilities in drug discovery.
Affiliation(s)
- Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Alexi Nott
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Magda Bucholc
- School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
- Diane P Hanger
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Patrick A Lewis
- Royal Veterinary College, London, UK
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
- Wendy Noble
- Faculty of Health and Life Sciences, University of Exeter, Exeter, UK
- Jalil-Ahmad Sharif
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Maria Tsalenchuk
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Ümran Yaman
- UK Dementia Research Institute at UCL, London, UK
- David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- Alan Turing Institute, London, UK
9
Klie A, Laub D, Talwar JV, Stites H, Jores T, Solvason JJ, Farley EK, Carter H. Predictive analyses of regulatory sequences with EUGENe. Nat Comput Sci 2023; 3:946-956. [PMID: 38177592 PMCID: PMC10768637 DOI: 10.1038/s43588-023-00544-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 09/27/2023] [Indexed: 01/06/2024]
Abstract
Deep learning has become a popular tool to study cis-regulatory function. Yet efforts to design software for deep-learning analyses in regulatory genomics that are findable, accessible, interoperable and reusable (FAIR) have fallen short of fully meeting these criteria. Here we present elucidating the utility of genomic elements with neural nets (EUGENe), a FAIR toolkit for the analysis of genomic sequences with deep learning. EUGENe consists of a set of modules and subpackages for executing the key functionality of a genomics deep learning workflow: (1) extracting, transforming and loading sequence data from many common file formats; (2) instantiating, initializing and training diverse model architectures; and (3) evaluating and interpreting model behavior. We designed EUGENe as a simple, flexible and extensible interface for streamlining and customizing end-to-end deep-learning sequence analyses, and illustrate these principles through application of the toolkit to three predictive modeling tasks. We hope that EUGENe represents a springboard towards a collaborative ecosystem for deep-learning applications in genomics research.
Affiliation(s)
- Adam Klie
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- David Laub
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- James V Talwar
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Tobias Jores
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Joe J Solvason
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Emma K Farley
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Hannah Carter
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA.
10
Wang Z, Luo M, Liang Q, Zhao K, Hu Y, Wang W, Feng X, Hu B, Teng J, You T, Li R, Bao Z, Pan W, Yang T, Zhang C, Li T, Dong X, Yi X, Liu B, Zhao L, Li M, Chen K, Song W, Yang J, Li MJ. Landscape of enhancer disruption and functional screen in melanoma cells. Genome Biol 2023; 24:248. [PMID: 37904237 PMCID: PMC10614365 DOI: 10.1186/s13059-023-03087-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 10/12/2023] [Indexed: 11/01/2023] Open
Abstract
BACKGROUND: The high mutation rate throughout the entire melanoma genome presents a major challenge in stratifying true driver events from the background mutations. Numerous recurrent non-coding alterations, such as those in enhancers, can shape tumor evolution, thereby emphasizing the importance in systematically deciphering enhancer disruptions in melanoma. RESULTS: Here, we leveraged 297 melanoma whole-genome sequencing samples to prioritize highly recurrent regions. By performing a genome-scale CRISPR interference (CRISPRi) screen on highly recurrent region-associated enhancers in melanoma cells, we identified 66 significant hits which could have tumor-suppressive roles. These functional enhancers show unique mutational patterns independent of classical significantly mutated genes in melanoma. Target gene analysis for the essential enhancers reveal many known and hidden mechanisms underlying melanoma growth. Utilizing extensive functional validation experiments, we demonstrate that a super enhancer element could modulate melanoma cell proliferation by targeting MEF2A, and another distal enhancer is able to sustain PTEN tumor-suppressive potential via long-range interactions. CONCLUSIONS: Our study establishes a catalogue of crucial enhancers and their target genes in melanoma growth and progression, and illuminates the identification of novel mechanisms of dysregulation for melanoma driver genes and new therapeutic targeting strategies.
Affiliation(s)
- Zhao Wang
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, The Second Affiliated Hospital, Wenzhou Medical University, Wenzhou, China.
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.
| | - Menghan Luo
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Qian Liang
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, The Second Affiliated Hospital, Wenzhou Medical University, Wenzhou, China
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
- Scientific Research Center, Wenzhou Medical University, Wenzhou, China
| | - Ke Zhao
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Yuelin Hu
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, The Second Affiliated Hospital, Wenzhou Medical University, Wenzhou, China
| | - Wei Wang
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Xiangling Feng
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Bolang Hu
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, The Second Affiliated Hospital, Wenzhou Medical University, Wenzhou, China
| | - Jianjin Teng
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Tianyi You
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Ran Li
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, The Second Affiliated Hospital, Wenzhou Medical University, Wenzhou, China
| | - Zhengkai Bao
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, The Second Affiliated Hospital, Wenzhou Medical University, Wenzhou, China
| | - Wenhao Pan
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, The Second Affiliated Hospital, Wenzhou Medical University, Wenzhou, China
| | - Tielong Yang
- Department of Bone and Soft Tissue Tumor, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Chao Zhang
- Department of Bone and Soft Tissue Tumor, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Ting Li
- Department of Bone and Soft Tissue Tumor, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Xiaobao Dong
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Xianfu Yi
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Ben Liu
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Li Zhao
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China
| | - Miaoxin Li
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
| | - Kexin Chen
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
| | - Weihong Song
- Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, The Second Affiliated Hospital, Wenzhou Medical University, Wenzhou, China.
| | - Jilong Yang
- Department of Bone and Soft Tissue Tumor, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.
| | - Mulin Jun Li
- Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, The Province and Ministry Co-Sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China.
- Department of Bioinformatics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin, China.
11
Li YE, Preissl S, Miller M, Johnson ND, Wang Z, Jiao H, Zhu C, Wang Z, Xie Y, Poirion O, Kern C, Pinto-Duarte A, Tian W, Siletti K, Emerson N, Osteen J, Lucero J, Lin L, Yang Q, Zhu Q, Zemke N, Espinoza S, Yanny AM, Nyhus J, Dee N, Casper T, Shapovalova N, Hirschstein D, Hodge RD, Linnarsson S, Bakken T, Levi B, Keene CD, Shang J, Lein E, Wang A, Behrens MM, Ecker JR, Ren B. A comparative atlas of single-cell chromatin accessibility in the human brain. Science 2023; 382:eadf7044. [PMID: 37824643 PMCID: PMC10852054 DOI: 10.1126/science.adf7044] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 11/09/2022] [Accepted: 09/14/2023] [Indexed: 10/14/2023]
Abstract
Recent advances in single-cell transcriptomics have illuminated the diverse neuronal and glial cell types within the human brain. However, the regulatory programs governing cell identity and function remain unclear. Using a single-nucleus assay for transposase-accessible chromatin using sequencing (snATAC-seq), we explored open chromatin landscapes across 1.1 million cells in 42 brain regions from three adults. Integrating this data unveiled 107 distinct cell types and their specific utilization of 544,735 candidate cis-regulatory DNA elements (cCREs) in the human genome. Nearly a third of the cCREs demonstrated conservation and chromatin accessibility in the mouse brain cells. We reveal strong links between specific brain cell types and neuropsychiatric disorders including schizophrenia, bipolar disorder, Alzheimer's disease (AD), and major depression, and have developed deep learning models to predict the regulatory roles of noncoding risk variants in these disorders.
Affiliation(s)
- Yang Eric Li
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Sebastian Preissl
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | - Michael Miller
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | | | - Zihan Wang
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Henry Jiao
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | - Chenxu Zhu
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Zhaoning Wang
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Yang Xie
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Olivier Poirion
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | - Colin Kern
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | | | - Wei Tian
- The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Kimberly Siletti
- Division of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institute, 171 77 Stockholm, Sweden
| | - Nora Emerson
- The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Julia Osteen
- The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Jacinta Lucero
- The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Lin Lin
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | - Qian Yang
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | - Quan Zhu
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | - Nathan Zemke
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | - Sarah Espinoza
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | | | - Julie Nyhus
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | - Nick Dee
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | - Tamara Casper
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | | | | | | | - Sten Linnarsson
- Division of Molecular Neurobiology, Department of Medical Biochemistry and Biophysics, Karolinska Institute, 171 77 Stockholm, Sweden
| | - Trygve Bakken
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | - Boaz Levi
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | - C Dirk Keene
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98104, USA
| | - Jingbo Shang
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
| | - Ed Lein
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | - Allen Wang
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
| | | | - Joseph R Ecker
- The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Bing Ren
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Center for Epigenomics, University of California San Diego, School of Medicine, La Jolla, CA 92093, USA
12
Brennan KJ, Weilert M, Krueger S, Pampari A, Liu HY, Yang AWH, Morrison JA, Hughes TR, Rushlow CA, Kundaje A, Zeitlinger J. Chromatin accessibility in the Drosophila embryo is determined by transcription factor pioneering and enhancer activation. Dev Cell 2023; 58:1898-1916.e9. [PMID: 37557175 PMCID: PMC10592203 DOI: 10.1016/j.devcel.2023.07.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/28/2022] [Revised: 05/09/2023] [Accepted: 07/13/2023] [Indexed: 08/11/2023]
Abstract
Chromatin accessibility is integral to the process by which transcription factors (TFs) read out cis-regulatory DNA sequences, but it is difficult to differentiate between TFs that drive accessibility and those that do not. Deep learning models that learn complex sequence rules provide an unprecedented opportunity to dissect this problem. Using zygotic genome activation in Drosophila as a model, we analyzed high-resolution TF binding and chromatin accessibility data with interpretable deep learning and performed genetic validation experiments. We identify a hierarchical relationship between the pioneer TF Zelda and the TFs involved in axis patterning. Zelda consistently pioneers chromatin accessibility proportional to motif affinity, whereas patterning TFs augment chromatin accessibility in sequence contexts where they mediate enhancer activation. We conclude that chromatin accessibility occurs in two tiers: one through pioneering, which makes enhancers accessible but not necessarily active, and the second when the correct combination of TFs leads to enhancer activation.
Affiliation(s)
- Kaelan J Brennan
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Melanie Weilert
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Sabrina Krueger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Anusri Pampari
- Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA
| | - Hsiao-Yun Liu
- Department of Biology, New York University, New York, NY 10003, USA
| | - Ally W H Yang
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Jason A Morrison
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Timothy R Hughes
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | | | - Anshul Kundaje
- Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA; Department of Genetics, Stanford University, Palo Alto, CA 94305, USA
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Department of Pathology & Laboratory Medicine, The University of Kansas Medical Center, Kansas City, KS 66160, USA.
13
Bravo González-Blas C, De Winter S, Hulselmans G, Hecker N, Matetovici I, Christiaens V, Poovathingal S, Wouters J, Aibar S, Aerts S. SCENIC+: single-cell multiomic inference of enhancers and gene regulatory networks. Nat Methods 2023; 20:1355-1367. [PMID: 37443338 PMCID: PMC10482700 DOI: 10.1038/s41592-023-01938-4] [Citation(s) in RCA: 85] [Impact Index Per Article: 85.0] [Received: 08/19/2022] [Accepted: 06/06/2023] [Indexed: 07/15/2023]
Abstract
Joint profiling of chromatin accessibility and gene expression in individual cells provides an opportunity to decipher enhancer-driven gene regulatory networks (GRNs). Here we present a method for the inference of enhancer-driven GRNs, called SCENIC+. SCENIC+ predicts genomic enhancers along with candidate upstream transcription factors (TFs) and links these enhancers to candidate target genes. To improve both recall and precision of TF identification, we curated and clustered a motif collection with more than 30,000 motifs. We benchmarked SCENIC+ on diverse datasets from different species, including human peripheral blood mononuclear cells, ENCODE cell lines, melanoma cell states and Drosophila retinal development. Next, we exploit SCENIC+ predictions to study conserved TFs, enhancers and GRNs between human and mouse cell types in the cerebral cortex. Finally, we use SCENIC+ to study the dynamics of gene regulation along differentiation trajectories and the effect of TF perturbations on cell state. SCENIC+ is available at scenicplus.readthedocs.io.
Affiliation(s)
- Carmen Bravo González-Blas
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Seppe De Winter
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Gert Hulselmans
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Nikolai Hecker
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Irina Matetovici
- VIB Center for Brain & Disease Research, Leuven, Belgium
- VIB Tech Watch, VIB Headquarters, Ghent, Belgium
| | - Valerie Christiaens
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | | | - Jasper Wouters
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Sara Aibar
- VIB Center for Brain & Disease Research, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Stein Aerts
- VIB Center for Brain & Disease Research, Leuven, Belgium.
- Department of Human Genetics, KU Leuven, Leuven, Belgium.
14
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023] Open
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
| | - Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
| | - Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia.
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia.
15
Kaplow IM, Lawler AJ, Schäffer DE, Srinivasan C, Sestili HH, Wirthlin ME, Phan BN, Prasad K, Brown AR, Zhang X, Foley K, Genereux DP, Karlsson EK, Lindblad-Toh K, Meyer WK, Pfenning AR, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science 2023; 380:eabm7993. [PMID: 37104615 DOI: 10.1126/science.abm7993] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 04/29/2023]
Abstract
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to associate motor cortex and parvalbumin-positive interneuron enhancers with neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype in any large group of species with aligned genomes.
Affiliation(s)
- Irene M Kaplow
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Alyssa J Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Daniel E Schäffer
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Chaitanya Srinivasan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Heather H Sestili
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Morgan E Wirthlin
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - BaDoi N Phan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Kavya Prasad
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Ashley R Brown
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xiaomeng Zhang
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Kathleen Foley
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
| | - Diane P Genereux
- Broad Institute, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Elinor K Karlsson
- Broad Institute, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Kerstin Lindblad-Toh
- Broad Institute, Cambridge, MA, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
| | - Andreas R Pfenning
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
16
Zemke NR, Armand EJ, Wang W, Lee S, Zhou J, Li YE, Liu H, Tian W, Nery JR, Castanon RG, Bartlett A, Osteen JK, Li D, Zhuo X, Xu V, Miller M, Krienen FM, Zhang Q, Taskin N, Ting J, Feng G, McCarroll SA, Callaway EM, Wang T, Behrens MM, Lein ES, Ecker JR, Ren B. Comparative single cell epigenomic analysis of gene regulatory programs in the rodent and primate neocortex. bioRxiv 2023:2023.04.08.536119. [PMID: 37066152 PMCID: PMC10104177 DOI: 10.1101/2023.04.08.536119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Sequence divergence of cis-regulatory elements drives species-specific traits, but how this manifests in the evolution of the neocortex at the molecular and cellular level remains to be elucidated. We investigated the gene regulatory programs in the primary motor cortex of human, macaque, marmoset, and mouse with single-cell multiomics assays, generating gene expression, chromatin accessibility, DNA methylome, and chromosomal conformation profiles from a total of over 180,000 cells. For each modality, we determined species-specific, divergent, and conserved gene expression and epigenetic features at multiple levels. We find that cell type-specific gene expression evolves more rapidly than broadly expressed genes and that epigenetic status at distal candidate cis-regulatory elements (cCREs) evolves faster than promoters. Strikingly, transposable elements (TEs) contribute to nearly 80% of the human-specific cCREs in cortical cells. Through machine learning, we develop sequence-based predictors of cCREs in different species and demonstrate that the genomic regulatory syntax is highly preserved from rodents to primates. Lastly, we show that epigenetic conservation combined with sequence similarity helps uncover functional cis-regulatory elements and enhances our ability to interpret genetic variants contributing to neurological disease and traits.
17
Dong C, Shen S, Keleş S. AdaLiftOver: high-resolution identification of orthologous regulatory elements with Adaptive liftOver. Bioinformatics 2023; 39:btad149. [PMID: 37004197 PMCID: PMC10085516 DOI: 10.1093/bioinformatics/btad149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 03/02/2023] [Accepted: 03/20/2023] [Indexed: 04/03/2023] Open
Abstract
MOTIVATION: Elucidating functionally similar orthologous regulatory regions for human and model organism genomes is critical for exploiting model organism research and advancing our understanding of results from genome-wide association studies (GWAS). Sequence conservation is the de facto approach for finding orthologous non-coding regions between human and model organism genomes. However, existing methods for mapping non-coding genomic regions across species are challenged by the multi-mapping, low precision, and low mapping rate issues. RESULTS: We develop Adaptive liftOver (AdaLiftOver), a large-scale computational tool for identifying functionally similar orthologous non-coding regions across species. AdaLiftOver builds on the UCSC liftOver framework to extend the query regions and prioritizes the resulting candidate target regions based on the conservation of the epigenomic and the sequence grammar features. Evaluations of AdaLiftOver with multiple case studies, spanning both genomic intervals from epigenome datasets across a wide range of model organisms and GWAS SNPs, yield AdaLiftOver as a versatile method for deriving hard-to-obtain human epigenome datasets as well as reliably identifying orthologous loci for GWAS SNPs. AVAILABILITY AND IMPLEMENTATION: The R package and the data for AdaLiftOver is available from https://github.com/keleslab/AdaLiftOver.
Affiliation(s)
- Chenyang Dong
- Department of Statistics, University of Wisconsin-Madison, 1300 University Avenue, Madison, WI 53706, USA
| | - Siqi Shen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, WARF Room 201, 610 Walnut Street, Madison, WI 53706, USA
| | - Sündüz Keleş
- Department of Statistics, University of Wisconsin-Madison, 1300 University Avenue, Madison, WI 53706, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, WARF Room 201, 610 Walnut Street, Madison, WI 53706, USA
18
Rusin LY. Evolution of homology: From archetype towards a holistic concept of cell type. J Morphol 2023; 284:e21569. [PMID: 36789784 DOI: 10.1002/jmor.21569] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/17/2022] [Revised: 01/10/2023] [Accepted: 02/13/2023] [Indexed: 02/16/2023]
Abstract
The concept of homology lies at the heart of comparative biological science. The distinction between homology as structure and analogy as function has shaped the evolutionary paradigm for a century and formed the axis of comparative anatomy and embryology, which accept the identity of structure as a ground measure of relatedness. The advent of single-cell genomics overturned the classical view of cell homology by establishing a backbone regulatory identity of cell types, the basic biological units bridging the molecular and phenotypic dimensions. It revealed that the cell is the most flexible unit of living matter and that many approaches of classical biology need to be revised to understand evolution and diversity at the cellular level. The emerging theory of cell types explicitly decouples cell identity from phenotype, essentially allowing for the divergence of evolutionarily related morphotypes beyond recognition. It likewise decouples ontogenetic cell lineage from cell-type phylogeny, making explicit that cell types can share common descent regardless of their structure, function or developmental origin. The article succinctly summarizes current progress and opinion in this field and formulates a more general view of biological cell types as avatars: transient or terminal cell states deployed in a continuum of states by the developmental programme of one and the same omnipotent cell, capable of changing or combining identities with distinct evolutionary histories or inventing ad hoc identities that never existed in evolution or development. It highlights how the new logic grounded in the regulatory nature of cell identity transforms the concepts of cell homology and phenotypic stability, suggesting that cellular evolution is inherently and massively network-like, with one-to-one homologies being rather uncommon and restricted to shallower levels of the animal tree of life.
Affiliation(s)
- Leonid Y Rusin
- Laboratory for Mathematic Methods and Models in Bioinformatics, Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- EvoGenome Analytics LLC, Odintsovo, Moscow Region, Russia
|
19
|
Wang C, Zou Q, Ju Y, Shi H. Enhancer-FRL: Improved and Robust Identification of Enhancers and Their Activities Using Feature Representation Learning. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:967-975. [PMID: 36063523 DOI: 10.1109/tcbb.2022.3204365] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
Enhancers are crucial for precise regulation of gene expression, but enhancer identification and strength prediction are challenging because enhancers are freely distributed and the genome contains a tremendous number of similar fragments. Although several bioinformatics tools have been developed, shortfalls in these models remain, and their performance needs further improvement. In the present study, a two-layer predictor called Enhancer-FRL was proposed for identifying enhancers (enhancers or nonenhancers) and their activities (strong and weak). More specifically, to build an efficient model, the feature representation learning scheme was applied to generate a 50D probabilistic vector based on 10 feature encodings and five machine learning algorithms. Subsequently, the multiview probabilistic features were integrated to construct the final prediction model. Compared with the single feature-based model, Enhancer-FRL showed significant performance improvement and model robustness. Performance assessment on the independent test dataset indicated that the proposed model outperformed state-of-the-art available toolkits. The webserver Enhancer-FRL is freely accessible at http://lab.malab.cn/~wangchao/softwares/Enhancer-FRL/. The code and datasets can be downloaded from the webserver page or from GitHub at https://github.com/wangchao-malab/Enhancer-FRL/.
|
20
|
Kim S, Wysocka J. Deciphering the multi-scale, quantitative cis-regulatory code. Mol Cell 2023; 83:373-392. [PMID: 36693380 PMCID: PMC9898153 DOI: 10.1016/j.molcel.2022.12.032] [Citation(s) in RCA: 57] [Impact Index Per Article: 57.0] [Received: 10/27/2022] [Revised: 12/29/2022] [Accepted: 12/30/2022] [Indexed: 01/24/2023]
Abstract
Uncovering the cis-regulatory code that governs when and how much each gene is transcribed in a given genome and cellular state remains a central goal of biology. Here, we discuss major layers of regulation that influence how transcriptional outputs are encoded by DNA sequence and cellular context. We first discuss how transcription factors bind specific DNA sequences in a dosage-dependent and cooperative manner and then proceed to the cofactors that facilitate transcription factor function and mediate the activity of modular cis-regulatory elements such as enhancers, silencers, and promoters. We then consider the complex and poorly understood interplay of these diverse elements within regulatory landscapes and its relationships with chromatin states and nuclear organization. We propose that a mechanistically informed, quantitative model of transcriptional regulation that integrates these multiple regulatory layers will be the key to ultimately cracking the cis-regulatory code.
Affiliation(s)
- Seungsoo Kim
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Joanna Wysocka
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Chemical and Systems Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Department of Developmental Biology, Stanford University School of Medicine, Stanford, CA 94305, USA; Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
|
21
|
Li Z, Gao E, Zhou J, Han W, Xu X, Gao X. Applications of deep learning in understanding gene regulation. Cell Rep Methods 2023; 3:100384. [PMID: 36814848 PMCID: PMC9939384 DOI: 10.1016/j.crmeth.2022.100384] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/22/2023]
Abstract
Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope that this article will provide a rich and up-to-date resource and shed light on future research directions in this area.
Affiliation(s)
- Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Elva Gao
- The KAUST School, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Xiaopeng Xu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
22
|
Cazares TA, Rizvi FW, Iyer B, Chen X, Kotliar M, Bejjani AT, Wayman JA, Donmez O, Wronowski B, Parameswaran S, Kottyan LC, Barski A, Weirauch MT, Prasath VBS, Miraldi ER. maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks. PLoS Comput Biol 2023; 19:e1010863. [PMID: 36719906 PMCID: PMC9917285 DOI: 10.1371/journal.pcbi.1010863] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/06/2022] [Revised: 02/10/2023] [Accepted: 01/10/2023] [Indexed: 02/01/2023]
Abstract
Transcription factors read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq has surpassed DNase-seq as the most widely-used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning is unfortunately the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built "maxATAC", a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the largest collection of high-performance TFBS prediction models for ATAC-seq. maxATAC performance extends to primary cells and single-cell ATAC-seq, enabling improved TFBS prediction in vivo. We demonstrate maxATAC's capabilities by identifying TFBS associated with allele-dependent chromatin accessibility at atopic dermatitis genetic risk loci.
Affiliation(s)
- Tareian A. Cazares
- Immunology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Faiz W. Rizvi
- Systems Biology and Physiology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Balaji Iyer
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Xiaoting Chen
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Michael Kotliar
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Anthony T. Bejjani
- Molecular and Developmental Biology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Joseph A. Wayman
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Omer Donmez
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Benjamin Wronowski
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Sreeja Parameswaran
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Leah C. Kottyan
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Artem Barski
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Matthew T. Weirauch
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - V. B. Surya Prasath
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Emily R. Miraldi
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
|
23
|
Toneyan S, Tang Z, Koo PK. Evaluating deep learning for predicting epigenomic profiles. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00570-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
24
|
Abstract
Human accelerated regions (HARs) are the fastest-evolving sequences in the human genome. When HARs were discovered in 2006, their function was mysterious due to scant annotation of the noncoding genome. Diverse technologies, from transgenic animals to machine learning, have consistently shown that HARs function as gene regulatory enhancers with significant enrichment in neurodevelopment. It is now possible to quantitatively measure the enhancer activity of thousands of HARs in parallel and model how each nucleotide contributes to gene expression. These strategies have revealed that many human HAR sequences function differently than their chimpanzee orthologs, though individual nucleotide changes in the same HAR may have opposite effects, consistent with compensatory substitutions. To fully evaluate the role of HARs in human evolution, it will be necessary to experimentally and computationally dissect them across more cell types and developmental stages.
Affiliation(s)
- Sean Whalen
- Gladstone Institute of Data Science and Biotechnology, San Francisco, California, USA
| | - Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, California, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
- Chan Zuckerberg Biohub, San Francisco, California, USA
|
25
|
Mazo-Vargas A, Langmüller AM, Wilder A, van der Burg KRL, Lewis JJ, Messer PW, Zhang L, Martin A, Reed RD. Deep cis-regulatory homology of the butterfly wing pattern ground plan. Science 2022; 378:304-308. [PMID: 36264807 DOI: 10.1126/science.abi9407] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 11/02/2022]
Abstract
Butterfly wing patterns derive from a deeply conserved developmental ground plan yet are diverse and evolve rapidly. It is poorly understood how gene regulatory architectures can accommodate both deep homology and adaptive change. To address this, we characterized the cis-regulatory evolution of the color pattern gene WntA in nymphalid butterflies. Comparative assay for transposase-accessible chromatin using sequencing (ATAC-seq) and in vivo deletions spanning 46 cis-regulatory elements across five species revealed deep homology of ground plan-determining sequences, except in monarch butterflies. Furthermore, noncoding deletions displayed both positive and negative regulatory effects that were often broad in nature. Our results provide little support for models predicting rapid enhancer turnover and suggest that deeply ancestral, multifunctional noncoding elements can underlie rapidly evolving trait systems.
Affiliation(s)
- Anyi Mazo-Vargas
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA; Department of Biological Sciences, The George Washington University, Washington, DC, USA
| | - Anna M Langmüller
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
| | - Alexis Wilder
- Department of Biological Sciences, The George Washington University, Washington, DC, USA
| | | | - James J Lewis
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA; Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
| | - Philipp W Messer
- Department of Computational Biology, Cornell University, Ithaca, NY, USA
| | - Linlin Zhang
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA; CAS and Shandong Province Key Laboratory of Experimental Marine Biology, Center for Ocean Mega-Science, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China
| | - Arnaud Martin
- Department of Biological Sciences, The George Washington University, Washington, DC, USA
| | - Robert D Reed
- Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, NY, USA
|
26
|
Cai X, Teng J, Ren D, Zhang H, Li J, Zhang Z. Model Comparison of Heritability Enrichment Analysis in Livestock Population. Genes (Basel) 2022; 13:1644. [PMID: 36140810 PMCID: PMC9498849 DOI: 10.3390/genes13091644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Revised: 09/03/2022] [Accepted: 09/10/2022] [Indexed: 11/16/2022]
Abstract
Heritability enrichment analysis is an important means of exploring the genetic architecture of complex traits in human genetics. Heritability enrichment is typically defined as the proportion of heritability explained by an SNP subset, divided by the proportion of SNPs in that subset. Heritability enrichment enables better study of the factors underlying complex traits, such as functional variant/gene subsets, biological networks and metabolic pathways detected by integrating rapidly growing omics data. This would be beneficial for genomic prediction of disease risk in humans and for estimation of genetic values for economically important traits in livestock and plant species. However, in livestock, factors affecting the heritability enrichment estimation of complex traits have not been examined. Previous studies on humans reported that the frequencies, effect sizes, and levels of linkage disequilibrium (LD) of underlying causal variants (CVs) would affect the heritability enrichment estimation. Therefore, the distribution of heritability across the genome should be fully considered to obtain an unbiased estimate of heritability enrichment. To explore the performance of different heritability enrichment models in livestock populations, we used the VanRaden, GCTA and α models, assuming different α values, and the LDAK model, considering LD weight. We simulated three types of phenotypes, with CVs from various minor allele frequency (MAF) ranges: genome-wide (0.005 ≤ MAF ≤ 0.5), common (0.05 ≤ MAF ≤ 0.5), and uncommon (0.01 ≤ MAF < 0.05). The performances of the models with two different subsets (one containing known CVs and the other consisting of randomly selected markers) were compared to verify the accuracy of heritability enrichment estimation of functional variant sets. Our results showed that models with known CV subsets provided more robust enrichment estimation. Models with different α values tended to provide stable and accurate estimates for common and genome-wide CVs (relative deviation 0.5−2.2%), while tending to underestimate the enrichment of uncommon CVs. As the α value increased, enrichment estimates for uncommon CVs ranged from 15.73% above the true value (i.e., 3.00) to 48.93% below it. In addition, long-range LD windows (e.g., 5000 kb) led to large bias in the enrichment estimates for both common and uncommon CVs. Overall, heritability enrichment estimates were sensitive to the α value assumption and the LD weight consideration of the different models. Accuracy would be greatly improved by using a suitable model. This study would be helpful in understanding the genetic architecture of complex traits and provides a reference for genetic analysis in livestock populations.
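The enrichment definition and the relative deviations quoted in this abstract can be made concrete with a short calculation. The sketch below is only an illustration of the arithmetic: the function names and the example inputs (heritability values and SNP counts) are hypothetical and are not taken from the paper, and the two estimates 3.4719 and 1.5321 are simply the values implied by "15.73% above" and "48.93% below" a true enrichment of 3.00.

```python
def heritability_enrichment(h2_subset, h2_total, n_subset, n_total):
    """Share of heritability explained by a SNP subset, divided by the
    share of SNPs that the subset contains."""
    if h2_total <= 0 or n_subset <= 0 or n_total <= 0:
        raise ValueError("h2_total, n_subset and n_total must be positive")
    return (h2_subset / h2_total) / (n_subset / n_total)


def relative_deviation(estimate, true_value):
    """Signed deviation of an estimate from the true value, as a fraction."""
    return (estimate - true_value) / true_value


# Hypothetical inputs: 1% of the SNPs explain 3% of the heritability.
enrichment = heritability_enrichment(
    h2_subset=0.015, h2_total=0.5, n_subset=1_000, n_total=100_000
)
print(f"enrichment = {enrichment:.2f}")

# Estimates implied by the deviations quoted for uncommon CVs (true value 3.00).
for estimate in (3.4719, 1.5321):
    print(f"{estimate:.4f}: {relative_deviation(estimate, 3.0):+.2%}")
```

An enrichment above 1 means the subset explains more heritability than its size alone would predict; a value of 1 means no enrichment.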
|
27
|
Hammelman J, Patel T, Closser M, Wichterle H, Gifford D. Ranking reprogramming factors for cell differentiation. Nat Methods 2022; 19:812-822. [PMID: 35710610 PMCID: PMC10460539 DOI: 10.1038/s41592-022-01522-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/12/2021] [Accepted: 05/13/2022] [Indexed: 12/16/2022]
Abstract
Transcription factor over-expression is a proven method for reprogramming cells to a desired cell type for regenerative medicine and therapeutic discovery. However, a general method for the identification of reprogramming factors to create an arbitrary cell type is an open problem. Here we examine the success rate of methods and data for differentiation by testing the ability of nine computational methods (CellNet, GarNet, EBseq, AME, DREME, HOMER, KMAC, diffTF and DeepAccess) to discover and rank candidate factors for eight target cell types with known reprogramming solutions. We compare methods that use gene expression, biological networks and chromatin accessibility data, and comprehensively test parameter and preprocessing of input data to optimize performance. We find the best factor identification methods can identify an average of 50-60% of reprogramming factors within the top ten candidates, and methods that use chromatin accessibility perform the best. Among the chromatin accessibility methods, complex methods DeepAccess and diffTF have higher correlation with the ranked significance of transcription factor candidates within reprogramming protocols for differentiation. We provide evidence that AME and diffTF are optimal methods for transcription factor recovery that will allow for systematic prioritization of transcription factor candidates to aid in the design of new reprogramming protocols.
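The headline figure in this abstract (50-60% of reprogramming factors found within the top ten candidates) is a top-k recovery rate. The sketch below shows one plausible way to compute such a rate for a single target cell type; it is an assumption about the metric, not the authors' evaluation code, and the factor names `TF01` to `TF12` are placeholders.

```python
def top_k_recovery(ranked_candidates, known_factors, k=10):
    """Fraction of known reprogramming factors that appear among the
    first k entries of a ranked candidate list."""
    known = set(known_factors)
    if not known:
        raise ValueError("known_factors must not be empty")
    top_k = set(ranked_candidates[:k])
    return len(known & top_k) / len(known)


# Placeholder ranking of 12 candidate factors, best first.
ranked = [f"TF{i:02d}" for i in range(1, 13)]
# Placeholder "known solution": two factors fall in the top ten, one does not.
known = {"TF02", "TF07", "TF11"}

print(f"recovery@10 = {top_k_recovery(ranked, known, k=10):.2f}")  # prints 0.67
```

Averaging this value over several target cell types would give a summary comparable in form to the percentage reported above.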
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology, MIT, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
| | - Tulsi Patel
- Departments of Pathology and Cell Biology, Neuroscience, Rehabilitation and Regenerative Medicine (in Neurology), Columbia University Irving Medical Center, New York, NY, USA
- Center for Motor Neuron Biology and Disease, Columbia University Irving Medical Center, New York, NY, USA
- Columbia Stem Cell Initiative, Columbia University Irving Medical Center, New York, NY, USA
| | - Michael Closser
- Departments of Pathology and Cell Biology, Neuroscience, Rehabilitation and Regenerative Medicine (in Neurology), Columbia University Irving Medical Center, New York, NY, USA
- Center for Motor Neuron Biology and Disease, Columbia University Irving Medical Center, New York, NY, USA
- Columbia Stem Cell Initiative, Columbia University Irving Medical Center, New York, NY, USA
| | - Hynek Wichterle
- Departments of Pathology and Cell Biology, Neuroscience, Rehabilitation and Regenerative Medicine (in Neurology), Columbia University Irving Medical Center, New York, NY, USA
- Center for Motor Neuron Biology and Disease, Columbia University Irving Medical Center, New York, NY, USA
- Columbia Stem Cell Initiative, Columbia University Irving Medical Center, New York, NY, USA
| | - David Gifford
- Computational and Systems Biology, MIT, Cambridge, MA, USA.
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA.
- Department of Biological Engineering, MIT, Cambridge, MA, USA.
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA.
|
28
|
Abstract
The tremendous amount of biological sequence data available, combined with the recent methodological breakthrough in deep learning in domains such as computer vision or natural language processing, is leading today to the transformation of bioinformatics through the emergence of deep genomics, the application of deep learning to genomic sequences. We review here the new applications that the use of deep learning enables in the field, focusing on three aspects: the functional annotation of genomes, the sequence determinants of the genome functions and the possibility to write synthetic genomic sequences.
|
29
|
Lawler AJ, Ramamurthy E, Brown AR, Shin N, Kim Y, Toong N, Kaplow IM, Wirthlin M, Zhang X, Phan BN, Fox GA, Wade K, He J, Ozturk BE, Byrne LC, Stauffer WR, Fish KN, Pfenning AR. Machine learning sequence prioritization for cell type-specific enhancer design. eLife 2022; 11:69571. [PMID: 35576146 PMCID: PMC9110026 DOI: 10.7554/elife.69571] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/20/2021] [Accepted: 04/25/2022] [Indexed: 11/22/2022]
Abstract
Recent discoveries of extreme cellular diversity in the brain warrant rapid development of technologies to access specific cell populations within heterogeneous tissue. Available approaches for engineering-targeted technologies for new neuron subtypes are low yield, involving intensive transgenic strain or virus screening. Here, we present Specific Nuclear-Anchored Independent Labeling (SNAIL), an improved virus-based strategy for cell labeling and nuclear isolation from heterogeneous tissue. SNAIL works by leveraging machine learning and other computational approaches to identify DNA sequence features that confer cell type-specific gene activation and then make a probe that drives an affinity purification-compatible reporter gene. As a proof of concept, we designed and validated two novel SNAIL probes that target parvalbumin-expressing (PV+) neurons. Nuclear isolation using SNAIL in wild-type mice is sufficient to capture characteristic open chromatin features of PV+ neurons in the cortex, striatum, and external globus pallidus. The SNAIL framework also has high utility for multispecies cell probe engineering; expression from a mouse PV+ SNAIL enhancer sequence was enriched in PV+ neurons of the macaque cortex. Expansion of this technology has broad applications in cell type-specific observation, manipulation, and therapeutics across species and disease models.
Affiliation(s)
- Alyssa J Lawler
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Biological Sciences Department, Mellon College of Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Easwaran Ramamurthy
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Ashley R Brown
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Naomi Shin
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Yeonju Kim
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Noelle Toong
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Irene M Kaplow
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Morgan Wirthlin
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Xiaoyu Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - BaDoi N Phan
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States; Medical Scientist Training Program, University of Pittsburgh, Pittsburgh, United States
| | - Grant A Fox
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| | - Kirsten Wade
- Department of Psychiatry, Translational Neuroscience Program, University of Pittsburgh, Pittsburgh, United States
| | - Jing He
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States.,Systems Neuroscience Center, Brain Institute, Center for Neuroscience, Center for the Neural Basis of Cognition, Pittsburgh, United States
| | - Bilge Esin Ozturk
- Department of Ophthalmology, University of Pittsburgh, Pittsburgh, United States
| | - Leah C Byrne
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States.,Department of Ophthalmology, University of Pittsburgh, Pittsburgh, United States.,Division of Experimental Retinal Therapies, Department of Clinical Sciences & Advanced Medicine, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, United States.,Department of Bioengineering, University of Pittsburgh, Pittsburgh, United States
| | - William R Stauffer
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, United States
| | - Kenneth N Fish
- Department of Psychiatry, Translational Neuroscience Program, University of Pittsburgh, Pittsburgh, United States
| | - Andreas R Pfenning
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States.,Neuroscience Institute, Carnegie Mellon University, Pittsburgh, United States
| |
|
30
|
DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 2022; 54:613-624. [PMID: 35551305 DOI: 10.1038/s41588-022-01048-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 09/20/2021] [Accepted: 03/08/2022] [Indexed: 02/06/2023]
Abstract
Enhancer sequences control gene expression and comprise binding sites (motifs) for different transcription factors (TFs). Despite extensive genetic and computational studies, the relationship between DNA sequence and regulatory activity is poorly understood, and de novo enhancer design has been challenging. Here, we built a deep-learning model, DeepSTARR, to quantitatively predict the activities of thousands of developmental and housekeeping enhancers directly from DNA sequence in Drosophila melanogaster S2 cells. The model learned relevant TF motifs and higher-order syntax rules, including functionally nonequivalent instances of the same TF motif that are determined by motif-flanking sequence and intermotif distances. We validated these rules experimentally and demonstrated that they can be generalized to humans by testing more than 40,000 wildtype and mutant Drosophila and human enhancers. Finally, we designed and functionally validated synthetic enhancers with desired activities de novo.
|
31
|
Kaplow IM, Schäffer DE, Wirthlin ME, Lawler AJ, Brown AR, Kleyman M, Pfenning AR. Inferring mammalian tissue-specific regulatory conservation by predicting tissue-specific differences in open chromatin. BMC Genomics 2022; 23:291. [PMID: 35410163 PMCID: PMC8996547 DOI: 10.1186/s12864-022-08450-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 03/07/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Evolutionary conservation is an invaluable tool for inferring functional significance in the genome, including regions that are crucial across many species and those that have undergone convergent evolution. Computational methods to test for sequence conservation are dominated by algorithms that examine the ability of one or more nucleotides to align across large evolutionary distances. While these nucleotide alignment-based approaches have proven powerful for protein-coding genes and some non-coding elements, they fail to capture conservation of many enhancers, distal regulatory elements that control spatial and temporal patterns of gene expression. The function of enhancers is governed by a complex, often tissue- and cell type-specific code that links combinations of transcription factor binding sites and other regulation-related sequence patterns to regulatory activity. Thus, function of orthologous enhancer regions can be conserved across large evolutionary distances, even when nucleotide turnover is high. RESULTS: We present a new machine learning-based approach for evaluating enhancer conservation that leverages the combinatorial sequence code of enhancer activity rather than relying on the alignment of individual nucleotides. We first train a convolutional neural network model that can predict tissue-specific open chromatin, a proxy for enhancer activity, across mammals. Next, we apply that model to distinguish instances where the genome sequence would predict conserved function versus a loss of regulatory activity in that tissue. We present criteria for systematically evaluating model performance for this task and use them to demonstrate that our models accurately predict tissue-specific conservation and divergence in open chromatin between primate and rodent species, vastly out-performing leading nucleotide alignment-based approaches.
We then apply our models to predict open chromatin at orthologs of brain and liver open chromatin regions across hundreds of mammals and find that brain enhancers associated with neuron activity have a stronger tendency than the general population to have predicted lineage-specific open chromatin. CONCLUSION: The framework presented here provides a mechanism to annotate tissue-specific regulatory function across hundreds of genomes and to study enhancer evolution using predicted regulatory differences rather than nucleotide-level conservation measurements.
Affiliation(s)
- Irene M Kaplow
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Daniel E Schäffer
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Morgan E Wirthlin
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Alyssa J Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Ashley R Brown
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Michael Kleyman
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Andreas R Pfenning
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA; Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| |
|
32
|
Castro-Mondragon JA, Riudavets-Puig R, Rauluseviciute I, Berhanu Lemma R, Turchi L, Blanc-Mathieu R, Lucas J, Boddie P, Khan A, Manosalva Pérez N, Fornes O, Leung T, Aguirre A, Hammal F, Schmelter D, Baranasic D, Ballester B, Sandelin A, Lenhard B, Vandepoele K, Wasserman WW, Parcy F, Mathelier A. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2022; 50:D165-D173. [PMID: 34850907 PMCID: PMC8728201 DOI: 10.1093/nar/gkab1113] [Citation(s) in RCA: 819] [Impact Index Per Article: 409.5] [Received: 09/15/2021] [Revised: 10/20/2021] [Accepted: 10/22/2021] [Indexed: 12/18/2022] Open
Abstract
JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.
Affiliation(s)
- Jaime A Castro-Mondragon
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Rafael Riudavets-Puig
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Ieva Rauluseviciute
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Roza Berhanu Lemma
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Laura Turchi
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Romain Blanc-Mathieu
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Jeremy Lucas
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Paul Boddie
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
| | - Aziz Khan
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nicolás Manosalva Pérez
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
- VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
| | - Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Tiffany Y Leung
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - Alejandro Aguirre
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | | | - Daniel Schmelter
- UCSC Genome Browser, University of California Santa Cruz, Santa Cruz, CA 95060, USA
| | - Damir Baranasic
- MRC London Institute of Medical Sciences, Du Cane Road, London, W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
| | | | - Albin Sandelin
- The Bioinformatics Centre, Department of Biology & Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaloes Vej 5, DK2200 Copenhagen N, Denmark
| | - Boris Lenhard
- MRC London Institute of Medical Sciences, Du Cane Road, London, W12 0NN, UK
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
| | - Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
- VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
| | - Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
| | - François Parcy
- Laboratoire Physiologie Cellulaire et Végétale, Univ. Grenoble Alpes, CNRS, CEA, INRAE, IRIG-DBSCI-LPCV, 17 avenue des martyrs, F-38054, Grenoble, France
| | - Anthony Mathelier
- Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
| |
|
33
|
Janssens J, Aibar S, Taskiran II, Ismail JN, Gomez AE, Aughey G, Spanier KI, De Rop FV, González-Blas CB, Dionne M, Grimes K, Quan XJ, Papasokrati D, Hulselmans G, Makhzami S, De Waegeneer M, Christiaens V, Southall T, Aerts S. Decoding gene regulation in the fly brain. Nature 2022; 601:630-636. [PMID: 34987221 DOI: 10.1038/s41586-021-04262-z] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Received: 12/30/2020] [Accepted: 11/17/2021] [Indexed: 12/13/2022]
Abstract
The Drosophila brain is a frequently used model in neuroscience. Single-cell transcriptome analysis [1-6], three-dimensional morphological classification [7] and electron microscopy mapping of the connectome [8,9] have revealed an immense diversity of neuronal and glial cell types that underlie an array of functional and behavioural traits in the fly. The identities of these cell types are controlled by gene regulatory networks (GRNs), involving combinations of transcription factors that bind to genomic enhancers to regulate their target genes. Here, to characterize GRNs at the cell-type level in the fly brain, we profiled the chromatin accessibility of 240,919 single cells spanning 9 developmental timepoints and integrated these data with single-cell transcriptomes. We identify more than 95,000 regulatory regions that are used in different neuronal cell types, of which 70,000 are linked to developmental trajectories involving neurogenesis, reprogramming and maturation. For 40 cell types, uniquely accessible regions were associated with their expressed transcription factors and downstream target genes through a combination of motif discovery, network inference and deep learning, creating enhancer GRNs. The enhancer architectures revealed by DeepFlyBrain lead to a better understanding of neuronal regulatory diversity and can be used to design genetic driver lines for cell types at specific timepoints, facilitating their characterization and manipulation.
Affiliation(s)
- Jasper Janssens
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Sara Aibar
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Ibrahim Ihsan Taskiran
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Joy N Ismail
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | | | - Gabriel Aughey
- Department of Life Sciences, Imperial College London, London, UK
| | - Katina I Spanier
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Florian V De Rop
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Carmen Bravo González-Blas
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Marc Dionne
- Department of Life Sciences, Imperial College London, London, UK
| | - Krista Grimes
- Department of Life Sciences, Imperial College London, London, UK
| | - Xiao Jiang Quan
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Dafni Papasokrati
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Gert Hulselmans
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Samira Makhzami
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Maxime De Waegeneer
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Valerie Christiaens
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| | - Tony Southall
- Department of Life Sciences, Imperial College London, London, UK
| | - Stein Aerts
- VIB Center for Brain & Disease Research, Leuven, Belgium; Department of Human Genetics, KU Leuven, Leuven, Belgium
| |
|
34
|
Mauduit D, Taskiran II, Minnoye L, de Waegeneer M, Christiaens V, Hulselmans G, Demeulemeester J, Wouters J, Aerts S. Analysis of long and short enhancers in melanoma cell states. eLife 2021; 10:e71735. [PMID: 34874265 PMCID: PMC8691835 DOI: 10.7554/elife.71735] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/28/2021] [Accepted: 12/06/2021] [Indexed: 12/14/2022] Open
Abstract
Understanding how enhancers drive cell-type specificity and efficiently identifying them is essential for the development of innovative therapeutic strategies. In melanoma, the melanocytic (MEL) and the mesenchymal-like (MES) states present themselves with different responses to therapy, making the identification of specific enhancers highly relevant. Using massively parallel reporter assays (MPRAs) in a panel of patient-derived melanoma lines (MM lines), we set out to identify and decipher melanoma enhancers by first focusing on regions with state-specific H3K27 acetylation close to differentially expressed genes. An in-depth evaluation of those regions was then pursued by investigating the activity of overlapping ATAC-seq peaks along with a full tiling of the acetylated regions with 190 bp sequences. Activity was observed in more than 60% of the selected regions, and we were able to precisely locate the active enhancers within ATAC-seq peaks. Comparison of sequence content with activity, using the deep learning model DeepMEL2, revealed that AP-1 alone is responsible for the MES enhancer activity. In contrast, SOX10 and MITF both influence MEL enhancer function with SOX10 being required to achieve high levels of activity. Overall, our MPRAs shed light on the relationship between long and short sequences in terms of their sequence content, enhancer activity, and specificity across melanoma cell states.
Affiliation(s)
- David Mauduit
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
| | - Ibrahim Ihsan Taskiran
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
| | - Liesbeth Minnoye
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
| | - Maxime de Waegeneer
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
| | - Valerie Christiaens
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
| | - Gert Hulselmans
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
| | - Jonas Demeulemeester
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
- Cancer Genomics Laboratory, The Francis Crick Institute, London, United Kingdom
| | - Jasper Wouters
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
| | - Stein Aerts
- VIB-KU Leuven Center for Brain & Disease Research, Leuven, Belgium
- KU Leuven, Department of Human Genetics KU Leuven, Leuven, Belgium
| |
|
35
|
Srinivasan C, Phan BN, Lawler AJ, Ramamurthy E, Kleyman M, Brown AR, Kaplow IM, Wirthlin ME, Pfenning AR. Addiction-Associated Genetic Variants Implicate Brain Cell Type- and Region-Specific Cis-Regulatory Elements in Addiction Neurobiology. J Neurosci 2021; 41:9008-9030. [PMID: 34462306 PMCID: PMC8549541 DOI: 10.1523/jneurosci.2534-20.2021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/28/2020] [Revised: 06/18/2021] [Accepted: 07/10/2021] [Indexed: 12/14/2022] Open
Abstract
Recent large genome-wide association studies have identified multiple confident risk loci linked to addiction-associated behavioral traits. Most genetic variants linked to addiction-associated traits lie in noncoding regions of the genome, likely disrupting cis-regulatory element (CRE) function. CREs tend to be highly cell type-specific and may contribute to the functional development of the neural circuits underlying addiction. Yet, a systematic approach for predicting the impact of risk variants on the CREs of specific cell populations is lacking. To dissect the cell types and brain regions underlying addiction-associated traits, we applied stratified linkage disequilibrium score regression to compare genome-wide association studies to genomic regions collected from human and mouse assays for open chromatin, which is associated with CRE activity. We found enrichment of addiction-associated variants in putative CREs marked by open chromatin in neuronal (NeuN+) nuclei collected from multiple prefrontal cortical areas and striatal regions known to play major roles in reward and addiction. To further dissect the cell type-specific basis of addiction-associated traits, we also identified enrichments in human orthologs of open chromatin regions of female and male mouse neuronal subtypes: cortical excitatory, D1, D2, and PV. Last, we developed machine learning models to predict mouse cell type-specific open chromatin, enabling us to further categorize human NeuN+ open chromatin regions into cortical excitatory or striatal D1 and D2 neurons and predict the functional impact of addiction-associated genetic variants. 
Our results suggest that different neuronal subtypes within the reward system play distinct roles in the variety of traits that contribute to addiction. SIGNIFICANCE STATEMENT: We combine statistical genetic and machine learning techniques to find that the predisposition for nicotine, alcohol, and cannabis use behaviors can be partially explained by genetic variants in conserved regulatory elements within specific brain regions and neuronal subtypes of the reward system. Our computational framework can flexibly integrate open chromatin data across species to screen for putative causal variants in a cell type- and tissue-specific manner for numerous complex traits.
Affiliation(s)
- Chaitanya Srinivasan
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
| | - BaDoi N Phan
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Medical Scientist Training Program, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15213
| | - Alyssa J Lawler
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
| | - Easwaran Ramamurthy
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
| | - Michael Kleyman
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
| | - Ashley R Brown
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
| | - Irene M Kaplow
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
| | - Morgan E Wirthlin
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
| | - Andreas R Pfenning
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
| |
|
36
|
Affiliation(s)
- Alicia M McConnell
- Stem Cell Program and Division of Hematology/Oncology, Children's Hospital Boston, Howard Hughes Medical Institute, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Leonard I Zon
- Stem Cell Program and Division of Hematology/Oncology, Children's Hospital Boston, Howard Hughes Medical Institute, Boston, MA, USA.
- Harvard Medical School, Boston, MA, USA.
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA.
| |
|
37
|
Dong C, Simonett SP, Shin S, Stapleton DS, Schueler KL, Churchill GA, Lu L, Liu X, Jin F, Li Y, Attie AD, Keller MP, Keleş S. INFIMA leverages multi-omics model organism data to identify effector genes of human GWAS variants. Genome Biol 2021; 22:241. [PMID: 34425882 PMCID: PMC8381555 DOI: 10.1186/s13059-021-02450-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2020] [Accepted: 08/02/2021] [Indexed: 11/24/2022] Open
Abstract
Genome-wide association studies reveal many non-coding variants associated with complex traits. However, model organism studies largely remain an untapped resource for unveiling the effector genes of non-coding variants. We develop INFIMA, Integrative Fine-Mapping, to pinpoint causal SNPs for diversity outbred (DO) mice eQTL by integrating founder mice multi-omics data including ATAC-seq, RNA-seq, footprinting, and in silico mutation analysis. We demonstrate INFIMA's superior performance compared to alternatives with human and mouse chromatin conformation capture datasets. We apply INFIMA to identify novel effector genes for GWAS variants associated with diabetes. The results of the application are available at http://www.statlab.wisc.edu/shiny/INFIMA/.
Affiliation(s)
- Chenyang Dong
- Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
| | - Shane P. Simonett
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI USA
| | - Sunyoung Shin
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX USA
| | - Donnie S. Stapleton
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI USA
| | - Kathryn L. Schueler
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI USA
| | | | - Leina Lu
- Case Western University, Cleveland, OH USA
| | | | - Fulai Jin
- Case Western University, Cleveland, OH USA
| | - Yan Li
- Case Western University, Cleveland, OH USA
| | - Alan D. Attie
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI USA
| | - Mark P. Keller
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI USA
| | - Sündüz Keleş
- Department of Statistics, University of Wisconsin-Madison, Madison, WI USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI USA
| |
|
38
|
Berico P, Cigrang M, Davidson G, Braun C, Sandoz J, Legras S, Vokshi BH, Slovic N, Peyresaubes F, Gene Robles CM, Egly JM, Compe E, Davidson I, Coin F. CDK7 and MITF repress a transcription program involved in survival and drug tolerance in melanoma. EMBO Rep 2021; 22:e51683. [PMID: 34296805 DOI: 10.15252/embr.202051683] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/07/2020] [Revised: 06/18/2021] [Accepted: 06/25/2021] [Indexed: 11/09/2022] Open
Abstract
Melanoma cell phenotype switching between differentiated melanocytic and undifferentiated mesenchymal-like states drives metastasis and drug resistance. CDK7 is the serine/threonine kinase of the basal transcription factor TFIIH. We show that dedifferentiation of melanocytic-type melanoma cells into mesenchymal-like cells and acquisition of tolerance to targeted therapies is achieved through chronic inhibition of CDK7. In addition to emergence of a mesenchymal-type signature, we identify a GATA6-dependent gene expression program comprising genes such as AMIGO2 or ABCG2 involved in melanoma survival or targeted drug tolerance, respectively. Mechanistically, we show that CDK7 drives expression of the melanocyte lineage transcription factor MITF that in turn binds to an intronic region of GATA6 to repress its expression in melanocytic-type cells. We show that GATA6 expression is activated in MITF-low melanoma cells of patient-derived xenografts. Taken together, our data show how the poorly characterized repressive function of MITF in melanoma participates in a molecular cascade regulating activation of a transcriptional program involved in survival and drug resistance in melanoma.
Affiliation(s)
- Pietro Berico
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Max Cigrang
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Guillaume Davidson
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Cathy Braun
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Jeremy Sandoz
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Stephanie Legras
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Bujamin Hektor Vokshi
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Nevena Slovic
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- François Peyresaubes
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Carlos Mario Gene Robles
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Jean-Marc Egly
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Emmanuel Compe
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Irwin Davidson
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
- Frederic Coin
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, Equipe Labélisée Ligue contre le Cancer, Strasbourg, France; Centre National de la Recherche Scientifique, UMR7104, Illkirch, France; Institut National de la Santé et de la Recherche Médicale, Illkirch, France; Université de Strasbourg, Illkirch, France
39
White JR, Thompson DT, Koch KE, Kiriazov BS, Beck AC, van der Heide DM, Grimm BG, Kulak MV, Weigel RJ. AP-2α-Mediated Activation of E2F and EZH2 Drives Melanoma Metastasis. Cancer Res 2021; 81:4455-4470. [PMID: 34210752 DOI: 10.1158/0008-5472.can-21-0772] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/10/2021] [Revised: 06/09/2021] [Accepted: 06/29/2021] [Indexed: 12/13/2022]
Abstract
In melanoma metastasis, the role of the AP-2α transcription factor, which is encoded by TFAP2A, is controversial as some findings have suggested tumor suppressor activity while other studies have shown high TFAP2A expression in node-positive melanoma associated with poor prognosis. Here we demonstrate that AP-2α facilitates melanoma metastasis through transcriptional activation of genes within the E2F pathway including EZH2. A BioID screen found that AP-2α interacts with members of the nucleosome remodeling and deacetylase (NuRD) complex. Loss of AP-2α removed activating chromatin marks in the promoters of EZH2 and other E2F target genes through activation of the NuRD repression complex. In melanoma cells, treatment with tazemetostat, an FDA-approved and highly specific EZH2 inhibitor, substantially reduced anchorage-independent colony formation and demonstrated heritable antimetastatic effects, which were dependent on AP-2α. Single-cell RNA sequencing analysis of a metastatic melanoma mouse model revealed hyperexpansion of Tfap2a High/E2F-activated cell populations in transformed melanoma relative to progenitor melanocyte stem cells. These findings demonstrate that melanoma metastasis is driven by the AP-2α/EZH2 pathway and suggest that AP-2α expression can be used as a biomarker to predict responsiveness to EZH2 inhibitors for the treatment of advanced melanomas. SIGNIFICANCE: AP-2α drives melanoma metastasis by upregulating E2F pathway genes including EZH2 through inhibition of the NuRD repression complex, serving as a biomarker to predict responsiveness to EZH2 inhibitors.
Affiliation(s)
- Kelsey E Koch
- Department of Surgery, University of Iowa, Iowa City, Iowa
- Anna C Beck
- Department of Surgery, University of Iowa, Iowa City, Iowa
40
Asma H, Halfon MS. Annotating the Insect Regulatory Genome. Insects 2021; 12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods.
Affiliation(s)
- Hasiba Asma
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Marc S. Halfon
- Program in Genetics, Genomics, and Bioinformatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biochemistry, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biomedical Informatics, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- Department of Biological Sciences, University at Buffalo-State University of New York, Buffalo, NY 14203, USA
- NY State Center of Excellence in Bioinformatics & Life Sciences, Buffalo, NY 14203, USA
41
Tanay A, Sebé-Pedrós A. Evolutionary Cell Type Mapping with Single-Cell Genomics. Trends Genet 2021; 37:919-932. [PMID: 34020820 DOI: 10.1016/j.tig.2021.04.008] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/06/2021] [Revised: 04/15/2021] [Accepted: 04/17/2021] [Indexed: 12/14/2022]
Abstract
A fundamental characteristic of animal multicellularity is the spatial coexistence of functionally specialized cell types that are all encoded by a single genome sequence. Cell type transcriptional programs are deployed and maintained by regulatory mechanisms that control the asymmetric, differential access to genomic information in each cell. This genome regulation ultimately results in specific cellular phenotypes. However, the emergence, diversity, and evolutionary dynamics of animal cell types remain almost completely unexplored beyond a few species. Single-cell genomics is emerging as a powerful tool to build comprehensive catalogs of cell types and their associated gene regulatory programs in non-traditional model species. We review the current state of sampling efforts across the animal tree of life and challenges ahead for the comparative study of cell type programs. We also discuss how the phylogenetic integration of cell atlases can lead to the development of models of cell type evolution and a phylogenetic taxonomy of cells.
Affiliation(s)
- Amos Tanay
- Department of Computer Science and Applied Mathematics, and Department of Biological Regulation, Weizmann Institute of Science, 76100 Rehovot, Israel
- Arnau Sebé-Pedrós
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
42
Comparative Transcriptomic Analysis of the Hematopoietic System between Human and Mouse by Single Cell RNA Sequencing. Cells 2021; 10:973. [PMID: 33919312 PMCID: PMC8143332 DOI: 10.3390/cells10050973] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/10/2021] [Revised: 04/16/2021] [Accepted: 04/19/2021] [Indexed: 12/14/2022]
Abstract
(1) Background: mouse models are fundamental to the study of hematopoiesis, but comparisons between mouse and human in single cells have been limited in depth. (2) Methods: we constructed a single-cell resolution transcriptomic atlas of hematopoietic stem and progenitor cells (HSPCs) of human and mouse, from a total of 32,805 single cells. We used Monocle to examine the trajectories of hematopoietic differentiation, and SCENIC to analyze gene networks underlying hematopoiesis. (3) Results: after alignment with Seurat 2, the cells of mouse and human could be separated by the same cell type categories. Cells were grouped into 17 subpopulations; cluster-specific genes were species-conserved and shared functional themes. The clustering dendrogram indicated that cell types were highly conserved between human and mouse. A visualization of the Monocle results provided an intuitive representation of HSPC differentiation to three dominant branches (Erythroid/megakaryocytic, Myeloid, and Lymphoid), derived directly from the hematopoietic stem cell and the long-term hematopoietic stem cells in both human and mouse. Gene regulation was similarly conserved, reflected by comparable transcription factors and regulatory sequence motifs in subpopulations of cells. (4) Conclusions: our analysis has confirmed evolutionary conservation in the hematopoietic systems of mouse and human, extending to cell types, gene expression and regulatory elements.
43
Atak ZK, Taskiran II, Demeulemeester J, Flerin C, Mauduit D, Minnoye L, Hulselmans G, Christiaens V, Ghanem GE, Wouters J, Aerts S. Interpretation of allele-specific chromatin accessibility using cell state-aware deep learning. Genome Res 2021; 31:1082-1096. [PMID: 33832990 PMCID: PMC8168584 DOI: 10.1101/gr.260851.120] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 01/30/2020] [Accepted: 04/05/2021] [Indexed: 12/26/2022]
Abstract
Genomic sequence variation within enhancers and promoters can have a significant impact on the cellular state and phenotype. However, sifting through the millions of candidate variants in a personal genome or a cancer genome, to identify those that impact cis-regulatory function, remains a major challenge. Interpretation of noncoding genome variation benefits from explainable artificial intelligence to predict and interpret the impact of a mutation on gene regulation. Here we generate phased whole genomes with matched chromatin accessibility, histone modifications, and gene expression for 10 melanoma cell lines. We find that training a specialized deep learning model, called DeepMEL2, on melanoma chromatin accessibility data can capture the various regulatory programs of the melanocytic and mesenchymal-like melanoma cell states. This model outperforms motif-based variant scoring, as well as more generic deep learning models. We detect hundreds to thousands of allele-specific chromatin accessibility variants (ASCAVs) in each melanoma genome, of which 15%-20% can be explained by gains or losses of transcription factor binding sites. A considerable fraction of ASCAVs are caused by changes in AP-1 binding, as confirmed by matched ChIP-seq data to identify allele-specific binding of JUN and FOSL1. Finally, by augmenting the DeepMEL2 model with ChIP-seq data for GABPA, the TERT promoter mutation, as well as additional ETS motif gains, can be identified with high confidence. In conclusion, we present a new integrative genomics approach and a deep learning model to identify and interpret functional enhancer mutations with allelic imbalance of chromatin accessibility and gene expression.
Affiliation(s)
- Zeynep Kalender Atak
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
- Ibrahim Ihsan Taskiran
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
- Jonas Demeulemeester
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium; Cancer Genomics Laboratory, The Francis Crick Institute, London NW1 1AT, United Kingdom
- Christopher Flerin
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
- David Mauduit
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
- Liesbeth Minnoye
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
- Gert Hulselmans
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
- Valerie Christiaens
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
- Ghanem-Elias Ghanem
- Institut Jules Bordet, Université Libre de Bruxelles, 1000 Brussels, Belgium
- Jasper Wouters
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
- Stein Aerts
- VIB-KU Leuven Center for Brain and Disease Research, 3000 Leuven, Belgium; KU Leuven, Department of Human Genetics KU Leuven, 3000 Leuven, Belgium
44
Giudicelli F, Roest Crollius H. On the importance of evolutionary constraint for regulatory sequence identification. Brief Funct Genomics 2021:elab015. [PMID: 33754633 DOI: 10.1093/bfgp/elab015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Revised: 01/15/2021] [Accepted: 02/19/2021] [Indexed: 11/13/2022]
Abstract
Regulation of gene expression relies on the activity of specialized genomic elements, enhancers or silencers, sometimes located at large distances from their target gene promoters. A significant part of vertebrate genomes consists of such regulatory elements, but their identification and that of their target genes remains challenging, due to the lack of a clear signature at the nucleotide level. For many years the main hallmark used for identifying functional elements has been their sequence conservation between genomes of distant species, indicative of purifying selection. More recently, genome-wide biochemical assays have opened new avenues for detecting regulatory regions, shifting attention away from evolutionary constraints. Here, we review the respective contributions of comparative genomics and biochemical assays to the definition of regulatory elements and their targets, and advocate that both sequence conservation and preserved synteny, taken as signatures of functional constraint, remain essential tools in this task.
45
Jindal GA, Farley EK. Enhancer grammar in development, evolution, and disease: dependencies and interplay. Dev Cell 2021; 56:575-587. [PMID: 33689769 PMCID: PMC8462829 DOI: 10.1016/j.devcel.2021.02.016] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 10/13/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 12/19/2022]
Abstract
Each language has standard books describing that language's grammatical rules. Biologists have searched for similar, albeit more complex, principles relating enhancer sequence to gene expression. Here, we review the literature on enhancer grammar. We introduce dependency grammar, a model where enhancers encode information based on dependencies between enhancer features shaped by mechanistic, evolutionary, and biological constraints. Classifying enhancers based on the types of dependencies may identify unifying principles relating enhancer sequence to gene expression. Such rules would allow us to read the instructions for development within genomes and pinpoint causal enhancer variants underlying disease and evolutionary changes.
Affiliation(s)
- Granton A Jindal
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
- Emma K Farley
- Division of Cardiology, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Division of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
46
Minnoye L, Marinov GK, Krausgruber T, Pan L, Marand AP, Secchia S, Greenleaf WJ, Furlong EEM, Zhao K, Schmitz RJ, Bock C, Aerts S. Chromatin accessibility profiling methods. Nat Rev Methods Primers 2021; 1:10. [PMID: 38410680 PMCID: PMC10895463 DOI: 10.1038/s43586-020-00008-9] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Accepted: 12/01/2020] [Indexed: 02/06/2023]
Abstract
Chromatin accessibility, or the physical access to chromatinized DNA, is a widely studied characteristic of the eukaryotic genome. As active regulatory DNA elements are generally 'accessible', the genome-wide profiling of chromatin accessibility can be used to identify candidate regulatory genomic regions in a tissue or cell type. Multiple biochemical methods have been developed to profile chromatin accessibility, both in bulk and at the single-cell level. Depending on the method, enzymatic cleavage, transposition or DNA methyltransferases are used, followed by high-throughput sequencing, providing a view of genome-wide chromatin accessibility. In this Primer, we discuss these biochemical methods, as well as bioinformatics tools for analysing and interpreting the generated data, and insights into the key regulators underlying developmental, evolutionary and disease processes. We outline standards for data quality, reproducibility and deposition used by the genomics community. Although chromatin accessibility profiling is invaluable to study gene regulation, alone it provides only a partial view of this complex process. Orthogonal assays facilitate the interpretation of accessible regions with respect to enhancer-promoter proximity, functional transcription factor binding and regulatory function. We envision that technological improvements including single-molecule, multi-omics and spatial methods will bring further insight into the secrets of genome regulation.
Affiliation(s)
- Liesbeth Minnoye
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium
- Thomas Krausgruber
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Lixia Pan
- Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, NIH, Bethesda, MD, USA
- Stefano Secchia
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Keji Zhao
- Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, NIH, Bethesda, MD, USA
- Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Institute of Artificial Intelligence and Decision Support, Center for Medical Statistics, Informatics, and Intelligent Systems, Medical University of Vienna, Vienna, Austria
- Stein Aerts
- Center for Brain & Disease Research, VIB-KU Leuven, Leuven, Belgium
- Department of Human Genetics, KU Leuven, Leuven, Belgium