1
Wei J, Resztak JA, Ranjbaran A, Alazizi A, Mair-Meijers HE, Slatcher RB, Zilioli S, Wen X, Luca F, Pique-Regi R. Functional characterization of eQTLs and asthma risk loci with scATAC-seq across immune cell types and contexts. Am J Hum Genet 2025:S0002-9297(24)00459-2. [PMID: 39814021 DOI: 10.1016/j.ajhg.2024.12.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 12/13/2024] [Accepted: 12/17/2024] [Indexed: 01/18/2025]
Abstract
cis-regulatory elements (CREs) control gene transcription dynamics across cell types and in response to the environment. In asthma, multiple immune cell types play an important role in the inflammatory process. Genetic variants in CREs can also affect gene expression response dynamics and contribute to asthma risk. However, the regulatory mechanisms underlying control of transcriptional dynamics across different environmental contexts and cell types at single-cell resolution remain to be elucidated. To resolve this question, we performed single-cell ATAC-seq (scATAC-seq) in peripheral blood mononuclear cells (PBMCs) from 16 children with asthma. PBMCs were activated with phytohemagglutinin (PHA) or lipopolysaccharide (LPS) and treated with dexamethasone (DEX), an anti-inflammatory glucocorticoid. We analyzed changes in chromatin accessibility, measured transcription factor motif activity, and identified treatment- and cell-type-specific transcription factors that drive changes in both gene expression mean and variability. We observed a strong positive linear dependence between motif response and their target gene expression changes but a negative relationship with changes in target gene expression variability. This result suggests that an increase of transcription factor binding tightens the variability of gene expression around the mean. We then annotated genetic variants in chromatin accessibility peaks and response motifs, followed by computational fine-mapping of expression quantitative trait loci (eQTL) from a pediatric asthma cohort. We found that eQTLs were 5-fold enriched in peaks with response motifs and refined the credible set for 410 asthma risk genes, with 191 having the causal variant in response motifs. In conclusion, scATAC-seq enhances the understanding of molecular mechanisms for asthma risk variants mediated by gene expression.
Affiliation(s)
- Julong Wei
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- Justyna A Resztak
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- Ali Ranjbaran
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- Adnan Alazizi
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA
- Samuele Zilioli
- Department of Psychology, Wayne State University, Detroit, MI, USA; Department of Family Medicine and Public Health Sciences, Wayne State University, Detroit, MI, USA
- Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Francesca Luca
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
- Roger Pique-Regi
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI, USA; Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI 48201, USA.
2
Wanniarachchi DV, Viswakula S, Wickramasuriya AM. The evaluation of transcription factor binding site prediction tools in human and Arabidopsis genomes. BMC Bioinformatics 2024; 25:371. [PMID: 39623329 PMCID: PMC11613939 DOI: 10.1186/s12859-024-05995-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Accepted: 11/21/2024] [Indexed: 12/06/2024]
Abstract
BACKGROUND The precise prediction of transcription factor binding sites (TFBSs) is pivotal for unraveling the gene regulatory networks underlying biological processes. While numerous tools have emerged for in silico TFBS prediction in recent years, the evolving landscape of computational biology necessitates thorough assessments of tool performance to ensure accuracy and reliability. Only a limited number of studies have been conducted to evaluate the performance of TFBS prediction tools comprehensively. Thus, the present study focused on assessing twelve widely used TFBS prediction tools and four de novo motif discovery tools using a benchmark dataset comprising real, generic, Markov, and negative sequences. TFBSs of Arabidopsis thaliana and Homo sapiens genomes downloaded from the JASPAR database were implanted in these sequences and the performance of tools was evaluated using several statistical parameters at different overlap percentages between the lengths of known and predicted binding sites.
RESULTS Overall, the Multiple Cluster Alignment and Search Tool (MCAST) emerged as the best TFBS prediction tool, followed by Find Individual Motif Occurrences (FIMO) and MOtif Occurrence Detection Suite (MOODS). In addition, MotEvo and Dinucleotide Weight Tensor Toolbox (DWT-toolbox) demonstrated the highest sensitivity in identifying TFBSs at 90% and 80% overlap. Further, MCAST and DWT-toolbox demonstrated the highest sensitivity across all three data types: real, generic, and Markov. Among the de novo motif discovery tools, the Multiple Em for Motif Elicitation (MEME) emerged as the best performer. An analysis of the promoter regions of genes involved in the anthocyanin biosynthesis pathway in plants and the pentose phosphate pathway in humans, using the three best-performing tools, revealed considerable variation among the top 20 motifs identified by these tools.
CONCLUSION The findings of this study lay a robust groundwork for selecting optimal TFBS prediction tools for future research. Given the variability observed in tool performance, employing multiple tools for identifying TFBSs in a set of sequences is highly recommended. In addition, further studies are recommended to develop an integrated toolbox that incorporates TFBS prediction or motif discovery tools, aiming to streamline result precision and accuracy.
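The abstract above scores tools by sensitivity at different overlap percentages between known and predicted binding sites. The following is only a minimal sketch of that idea, not the study's evaluation code: the interval representation, the function names `overlap_fraction` and `sensitivity`, and the choice to measure overlap relative to the known site's length are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on one sequence


def overlap_fraction(known: Interval, predicted: Interval) -> float:
    """Fraction of the known site's length that the predicted site covers."""
    start = max(known[0], predicted[0])
    end = min(known[1], predicted[1])
    covered = max(0, end - start)
    return covered / (known[1] - known[0])


def sensitivity(known_sites: List[Interval],
                predicted_sites: List[Interval],
                min_overlap: float) -> float:
    """Share of known sites recovered by at least one prediction at min_overlap."""
    if not known_sites:
        raise ValueError("need at least one known site")
    hits = sum(
        1 for k in known_sites
        if any(overlap_fraction(k, p) >= min_overlap for p in predicted_sites)
    )
    return hits / len(known_sites)


# Toy data (hypothetical): three implanted sites and three predictions.
known = [(100, 110), (200, 210), (300, 310)]
predicted = [(100, 110), (202, 212), (400, 410)]
sens_80 = sensitivity(known, predicted, 0.8)  # second site overlaps by 8 of 10 bases
sens_90 = sensitivity(known, predicted, 0.9)  # second site no longer counts
```

In this toy case, raising the threshold from 80% to 90% drops the partially overlapping prediction, which is why a tool's sensitivity can differ between overlap settings.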
Affiliation(s)
- Dinithi V Wanniarachchi
- Department of Plant Sciences, Faculty of Science, University of Colombo, Colombo 03, Sri Lanka
- Sameera Viswakula
- Department of Statistics, Faculty of Science, University of Colombo, Colombo 03, Sri Lanka
3
Tay T, Bommakanti G, Jaensch E, Gorthi A, Karapa Reddy I, Hu Y, Zhang R, Doshi AS, Tan SL, Brucklacher-Waldert V, Prickett L, Kurasawa J, Overstreet MG, Criscione S, Buenrostro JD, Mele DA. Degradation of IKZF1 prevents epigenetic progression of T cell exhaustion in an antigen-specific assay. Cell Rep Med 2024; 5:101804. [PMID: 39486420 PMCID: PMC11604474 DOI: 10.1016/j.xcrm.2024.101804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 07/30/2024] [Accepted: 10/04/2024] [Indexed: 11/04/2024]
Abstract
In cancer, chronic antigen stimulation drives effector T cells to exhaustion, limiting the efficacy of T cell therapies. Recent studies have demonstrated that epigenetic rewiring governs the transition of T cells from effector to exhausted states and makes a subset of exhausted T cells non-responsive to PD1 checkpoint blockade. Here, we describe an antigen-specific assay for T cell exhaustion that generates T cells phenotypically and transcriptionally similar to those found in human tumors. We perform a screen of human epigenetic regulators, identifying IKZF1 as a driver of T cell exhaustion. We determine that the IKZF1 degrader iberdomide prevents exhaustion by blocking chromatin remodeling at T cell effector enhancers and preserving the binding of AP-1, NF-κB, and NFAT. Thus, our study uncovers a role for IKZF1 as a driver of T cell exhaustion through epigenetic modulation, providing a rationale for the use of iberdomide in solid tumors to prevent T cell exhaustion.
Affiliation(s)
- Tristan Tay
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA; Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Yan Hu
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA; Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Ruochi Zhang
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA; Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Jason Daniel Buenrostro
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA; Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA.
4
Jolma A, Hernandez-Corchado A, Yang AW, Fathi A, Laverty KU, Brechalov A, Razavi R, Albu M, Zheng H, Kulakovskiy IV, Najafabadi HS, Hughes TR. GHT-SELEX demonstrates unexpectedly high intrinsic sequence specificity and complex DNA binding of many human transcription factors. bioRxiv 2024:2024.11.11.618478. [PMID: 39605368 PMCID: PMC11601218 DOI: 10.1101/2024.11.11.618478] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 11/29/2024]
Abstract
A long-standing challenge in human regulatory genomics is that transcription factor (TF) DNA-binding motifs are short and degenerate, while the genome is large. Motif scans therefore produce many false-positive binding site predictions. By surveying 179 TFs across 25 families using >1,500 cyclic in vitro selection experiments with fragmented, naked, and unmodified genomic DNA - a method we term GHT-SELEX (Genomic HT-SELEX) - we find that many human TFs possess much higher sequence specificity than anticipated. Moreover, genomic binding regions from GHT-SELEX are often surprisingly similar to those obtained in vivo (i.e. ChIP-seq peaks). We find that comparable specificity can also be obtained from motif scans, but performance is highly dependent on derivation and use of the motifs, including accounting for multiple local matches in the scans. We also observe alternative engagement of multiple DNA-binding domains within the same protein: long C2H2 zinc finger proteins often utilize modular DNA recognition, engaging different subsets of their DNA binding domain (DBD) arrays to recognize multiple types of distinct target sites, frequently evolving via internal duplication and divergence of one or more DBDs. Thus, contrary to conventional wisdom, it is common for TFs to possess sufficient intrinsic specificity to independently delineate cellular targets.
Affiliation(s)
- Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Aldo Hernandez-Corchado
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Victor P. Dahdaleh Institute of Genomic Medicine, Montréal, QC H3A 0G1, Canada
- Ally W.H. Yang
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Ali Fathi
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Kaitlin U. Laverty
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Rozita Razavi
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Hong Zheng
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Ivan V. Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia and Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Hamed S. Najafabadi
- Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
- Victor P. Dahdaleh Institute of Genomic Medicine, Montréal, QC H3A 0G1, Canada
- Timothy R. Hughes
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
5
Razavi R, Fathi A, Yellan I, Brechalov A, Laverty KU, Jolma A, Hernandez-Corchado A, Zheng H, Yang AW, Albu M, Barazandeh M, Hu C, Vorontsov IE, Patel ZM, Kulakovskiy IV, Bucher P, Morris Q, Najafabadi HS, Hughes TR. Extensive binding of uncharacterized human transcription factors to genomic dark matter. bioRxiv 2024:2024.11.11.622123. [PMID: 39605320 PMCID: PMC11601254 DOI: 10.1101/2024.11.11.622123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024]
Abstract
Most of the human genome is thought to be non-functional, and includes large segments often referred to as "dark matter" DNA. The genome also encodes hundreds of putative and poorly characterized transcription factors (TFs). We determined genomic binding locations of 166 uncharacterized human TFs in living cells. Nearly half of them associated strongly with known regulatory regions such as promoters and enhancers, often at conserved motif matches and co-localizing with each other. Surprisingly, the other half often associated with genomic dark matter, at largely unique sites, via intrinsic sequence recognition. Dozens of these, which we term "Dark TFs", mainly bind within regions of closed chromatin. Dark TF binding sites are enriched for transposable elements, and are rarely under purifying selection. Some Dark TFs are KZNFs, which contain the repressive KRAB domain, but many are not: the Dark TFs also include known or potential pioneer TFs. Compiled literature information supports that the Dark TFs exert diverse functions ranging from early development to tumor suppression. Thus, our results shed light on a large fraction of previously uncharacterized human TFs and their unappreciated activities within the dark matter genome.
Affiliation(s)
- Rozita Razavi
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Ali Fathi
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Isaac Yellan
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Alexander Brechalov
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Kaitlin U. Laverty
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Memorial Sloan Kettering Cancer Center, Rockefeller Research Laboratories, New York, NY 10065, USA
- Arttu Jolma
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Aldo Hernandez-Corchado
- Victor P. Dahdaleh Institute of Genomic Medicine, 740 Dr. Penfield Avenue, Room 7202, Montréal, Québec, H3A 0G1, Canada
- Hong Zheng
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Ally W.H. Yang
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Mihai Albu
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Marjan Barazandeh
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Chun Hu
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Ilya E. Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Zain M. Patel
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
- Ivan V. Kulakovskiy
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
- Philipp Bucher
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Quaid Morris
- Memorial Sloan Kettering Cancer Center, Rockefeller Research Laboratories, New York, NY 10065, USA
- Hamed S. Najafabadi
- Victor P. Dahdaleh Institute of Genomic Medicine, 740 Dr. Penfield Avenue, Room 7202, Montréal, Québec, H3A 0G1, Canada
- Department of Human Genetics, McGill University, Montréal, Québec, H3A 0C7, Canada
- Timothy R. Hughes
- Donnelly Centre and Department of Molecular Genetics, 160 College Street, Toronto, ON M5S 3E1, Canada
6
Jolma A, Laverty KU, Fathi A, Yang AWH, Yellan I, Vorontsov IE, Inukai S, Kribelbauer-Swietek JF, Gralak AJ, Razavi R, Albu M, Brechalov A, Patel ZM, Nozdrin V, Meshcheryakov G, Kozin I, Abramov S, Boytsov A, Fornes O, Makeev VJ, Grau J, Grosse I, Bucher P, Deplancke B, Kulakovskiy IV, Hughes TR. Perspectives on Codebook: sequence specificity of uncharacterized human transcription factors. bioRxiv 2024:2024.11.11.622097. [PMID: 39605729 PMCID: PMC11601247 DOI: 10.1101/2024.11.11.622097] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 11/29/2024]
Abstract
We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vitro and in vivo assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), of which most are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and trans, and identify tens of thousands of conserved, base-level binding sites in the human genome. The use of multiple assays provides an unprecedented opportunity to benchmark and analyze TF sequence specificity, function, and evolution, as further explored in accompanying manuscripts. 1,421 human TFs are now associated with a DNA binding motif. Extrapolation from the Codebook benchmarking, however, suggests that many of the currently known binding motifs for well-studied TFs may inaccurately describe the TF's true sequence preferences.
Affiliation(s)
- Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Kaitlin U Laverty
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Ali Fathi
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Ally W H Yang
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Isaac Yellan
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Sachi Inukai
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Judith F Kribelbauer-Swietek
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Antoni J Gralak
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Rozita Razavi
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Mihai Albu
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Zain M Patel
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Vladimir Nozdrin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991, Moscow, Russia
- Georgy Meshcheryakov
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
- Ivan Kozin
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
- Sergey Abramov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Alexandr Boytsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Oriol Fornes
- Department of Medical Genetics, Centre for Molecular Medicine and Therapeutics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC V5Z 4H4, Canada
- Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Jan Grau
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Ivo Grosse
- Institute of Computer Science, Martin Luther University Halle-Wittenberg, 06099, Halle, Germany
- Philipp Bucher
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, 1015, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, Moscow, Russia
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Russia
- Timothy R Hughes
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
7
Dudek MF, Wenz BM, Brown CD, Voight BF, Almasy L, Grant SF. Characterization of non-coding variants associated with transcription factor binding through ATAC-seq-defined footprint QTLs in liver. bioRxiv 2024:2024.09.24.614730. [PMID: 39386531 PMCID: PMC11463493 DOI: 10.1101/2024.09.24.614730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/12/2024]
Abstract
Non-coding variants discovered by genome-wide association studies (GWAS) are enriched in regulatory elements harboring transcription factor (TF) binding motifs, strongly suggesting a connection between disease association and the disruption of cis-regulatory sequences. Occupancy of a TF inside a region of open chromatin can be detected in ATAC-seq where bound TFs block the transposase Tn5, leaving a pattern of relatively depleted Tn5 insertions known as a "footprint". Here, we sought to identify variants associated with TF-binding, or "footprint quantitative trait loci" (fpQTLs) in ATAC-seq data generated from 170 human liver samples. We used computational tools to scan the ATAC-seq reads to quantify TF binding likelihood as "footprint scores" at variants derived from whole genome sequencing generated in the same samples. We tested for association between genotype and footprint score and observed 693 fpQTLs associated with footprint-inferred TF binding (FDR < 5%). Given that Tn5 insertion sites are measured with base-pair resolution, we show that fpQTLs can aid GWAS and QTL fine-mapping by precisely pinpointing TF activity within broad trait-associated loci where the underlying causal variant is unknown. Liver fpQTLs were strongly enriched across ChIP-seq peaks, liver expression QTLs (eQTLs), and liver-related GWAS loci, and their inferred effect on TF binding was concordant with their effect on underlying sequence motifs in 80% of cases. We conclude that fpQTLs can reveal causal GWAS variants, define the role of TF binding site disruption in disease and provide functional insights into non-coding variants, ultimately informing novel treatments for common diseases.
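The abstract above describes testing for association between genotype and footprint score. As a rough illustration only, the sketch below fits a simple least-squares line of footprint score on allele dosage (0, 1, or 2 copies of the alternate allele). The function name `genotype_association`, the toy data, and the plain linear model without covariates or multiple-testing correction are assumptions; the abstract does not state the statistical model the authors used.

```python
from typing import List, Tuple


def genotype_association(dosages: List[int],
                         scores: List[float]) -> Tuple[float, float]:
    """Return (slope, r_squared) for a least-squares fit of score on allele dosage."""
    n = len(dosages)
    if n != len(scores) or n < 3:
        raise ValueError("need matching inputs with at least three samples")
    mean_x = sum(dosages) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, scores))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or score has no variation")
    slope = sxy / sxx                    # change in footprint score per allele copy
    r_squared = sxy * sxy / (sxx * syy)  # share of score variance explained
    return slope, r_squared


# Hypothetical footprint scores for six samples at one variant.
dosages = [0, 0, 1, 1, 2, 2]
scores = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
slope, r_squared = genotype_association(dosages, scores)
```

A real analysis would additionally need a significance test per variant and control of the false discovery rate across variants, as the FDR threshold quoted in the abstract implies.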
Affiliation(s)
- Max F. Dudek
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Brandon M. Wenz
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Christopher D. Brown
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cell and Molecular Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Benjamin F. Voight
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Perelman School of Medicine, Philadelphia, PA 19104, USA
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Laura Almasy
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lifespan Brain Institute, Children’s Hospital of Philadelphia and Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia
- Struan F.A. Grant
- Center for Spatial and Functional Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Division of Human Genetics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Division of Endocrinology and Diabetes, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
8
Becerra B, Wittibschlager S, Patel ZM, Kutschat AP, Delano J, Che E, Karjalainen A, Wu T, Starrs M, Jankowiak M, Bauer DE, Seruggia D, Pinello L. CRISPR-CLEAR: Nucleotide-Resolution Mapping of Regulatory Elements via Allelic Readout of Tiled Base Editing. bioRxiv 2024:2024.09.09.612085. [PMID: 39314441 PMCID: PMC11419122 DOI: 10.1101/2024.09.09.612085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
CRISPR tiling screens have advanced the identification and characterization of regulatory sequences but are limited by low resolution arising from the indirect readout of editing via guide RNA sequencing. This study introduces CRISPR-CLEAR, an end-to-end experimental assay and computational pipeline, which leverages targeted sequencing of CRISPR-introduced alleles at the endogenous target locus following dense base-editing mutagenesis. This approach enables the dissection of regulatory elements at nucleotide resolution, facilitating a direct assessment of genotype-phenotype effects.
Affiliation(s)
- Basheer Becerra
- Bioinformatics and Integrative Genomics PhD Program, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital Research Institute, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Sandra Wittibschlager
- St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Zain M Patel
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital Research Institute, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Ana P Kutschat
- St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Justin Delano
- Bioinformatics and Integrative Genomics PhD Program, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Eric Che
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital Research Institute, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Anzhelika Karjalainen
- St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Ting Wu
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Marlena Starrs
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Daniel E Bauer
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Stem Cell Institute, Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Davide Seruggia
- St. Anna Children's Cancer Research Institute (CCRI), Vienna, Austria
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Luca Pinello
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Molecular Pathology Unit, Krantz Family Center for Cancer Research, Massachusetts General Hospital Research Institute, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
9
Augustijn HE, Karapliafis D, Joosten KMM, Rigali S, van Wezel GP, Medema MH. LogoMotif: A Comprehensive Database of Transcription Factor Binding Site Profiles in Actinobacteria. J Mol Biol 2024; 436:168558. [PMID: 38580076 DOI: 10.1016/j.jmb.2024.168558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 03/28/2024] [Accepted: 03/30/2024] [Indexed: 04/07/2024]
Abstract
Actinobacteria undergo a complex multicellular life cycle and produce a wide range of specialized metabolites, including the majority of the antibiotics. These biological processes are controlled by intricate regulatory pathways, and to better understand how they are controlled we need to augment our insights into the transcription factor binding sites. Here, we present LogoMotif (https://logomotif.bioinformatics.nl), an open-source database for characterized and predicted transcription factor binding sites in Actinobacteria, along with their cognate position weight matrices and hidden Markov models. Genome-wide predictions of binding site locations in Streptomyces model organisms are supplied and visualized in interactive regulatory networks. In the web interface, users can freely access, download and investigate the underlying data. With this curated collection of actinobacterial regulatory interactions, LogoMotif serves as a basis for binding site predictions, thus providing users with clues on how to elicit the expression of genes of interest and guide genome mining efforts.
Affiliation(s)
- Hannah E Augustijn
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Kristy M M Joosten
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Sébastien Rigali
- InBioS - Center for Protein Engineering, University of Liège, Institut de Chimie, B-4000 Liège, Belgium
- Gilles P van Wezel
- Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands.
- Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands.
10
Read DF, Booth GT, Daza RM, Jackson DL, Gladden RG, Srivatsan SR, Ewing B, Franks JM, Spurrell CH, Gomes AR, O'Day D, Gogate AA, Martin BK, Larson H, Pfleger C, Starita L, Lin Y, Shendure J, Lin S, Trapnell C. Single-cell analysis of chromatin and expression reveals age- and sex-associated alterations in the human heart. Commun Biol 2024; 7:1052. [PMID: 39187646 PMCID: PMC11347658 DOI: 10.1038/s42003-024-06582-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 07/11/2024] [Indexed: 08/28/2024] Open
Abstract
Sex differences and age-related changes in the human heart at the tissue, cell, and molecular level have been well-documented and many may be relevant for cardiovascular disease. However, how molecular programs within individual cell types vary across individuals by age and sex remains poorly characterized. To better understand this variation, we performed single-nucleus combinatorial indexing (sci) ATAC- and RNA-Seq in human heart samples from nine donors. We identify hundreds of differentially expressed genes by age and sex and find epigenetic signatures of variation in ATAC-Seq data in this discovery cohort. We then scale up our single-cell RNA-Seq analysis by combining our data with five recently published single nucleus RNA-Seq datasets of healthy adult hearts. We find variation such as metabolic alterations by sex and immune changes by age in differential expression tests, as well as alterations in abundance of cardiomyocytes by sex and neurons with age. In addition, we compare our adult-derived ATAC-Seq profiles to analogous fetal cell types to identify putative developmental-stage-specific regulatory factors. Finally, we train predictive models of cell-type-specific RNA expression levels utilizing ATAC-Seq profiles to link distal regulatory sequences to promoters, quantifying the predictive value of a simple TF-to-expression regulatory grammar and identifying cell-type-specific TFs. Our analysis represents the largest single-cell analysis of cardiac variation by age and sex to date and provides a resource for further study of healthy cardiac variation and transcriptional regulation at single-cell resolution.
Affiliation(s)
- David F Read
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Gregory T Booth
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Riza M Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Dana L Jackson
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Rula Green Gladden
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Sanjay R Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brent Ewing
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Jennifer M Franks
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Diana O'Day
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Aishwarya A Gogate
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Seattle Children's Research Institute, Seattle, WA, USA
- Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Haleigh Larson
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Christian Pfleger
- University of Washington School of Medicine, Division of Cardiology, Seattle, WA, USA
- Lea Starita
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Yiing Lin
- Department of Surgery, Washington University, St Louis, MO, USA
- Jay Shendure
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Seattle Children's Research Institute, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA.
- Shin Lin
- University of Washington School of Medicine, Division of Cardiology, Seattle, WA, USA.
- Cole Trapnell
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
11
Handler JS, Li Z, Dveirin RK, Fang W, Goodarzi H, Fertig EJ, Kalhor R. Identifying a gene signature of metastatic potential by linking pre-metastatic state to ultimate metastatic fate. bioRxiv 2024:2024.08.14.607813. [PMID: 39185156 PMCID: PMC11343111 DOI: 10.1101/2024.08.14.607813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Identifying the key molecular pathways that enable metastasis by analyzing the eventual metastatic tumor is challenging because the state of the founder subclone likely changes following metastatic colonization. To address this challenge, we labeled primary mouse pancreatic ductal adenocarcinoma (PDAC) subclones with DNA barcodes to characterize their pre-metastatic state using ATAC-seq and RNA-seq and determine their relative in vivo metastatic potential prospectively. We identified a gene signature separating metastasis-high and metastasis-low subclones orthogonal to the normal-to-PDAC and classical-to-basal axes. The metastasis-high subclones feature activation of IL-1 pathway genes and high NF-κB and Zeb/Snail family activity and the metastasis-low subclones feature activation of neuroendocrine, motility, and Wnt pathway genes and high CDX2 and HOXA13 activity. In a functional screen, we validated novel mediators of PDAC metastasis in the IL-1 pathway, including the NF-κB targets Fos and Il23a, and beyond the IL-1 pathway including Myo1b and Tmem40. We scored human PDAC tumors for our signature of metastatic potential from mouse and found that metastases have higher scores than primary tumors. Moreover, primary tumors with higher scores are associated with worse prognosis. We also found that our metastatic potential signature is enriched in other human carcinomas, suggesting that it is conserved across epithelial malignancies. This work establishes a strategy for linking cancer cell state to future behavior, reveals novel functional regulators of PDAC metastasis, and establishes a method for scoring human carcinomas based on metastatic potential.
Affiliation(s)
- Jesse S Handler
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Zijie Li
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Rachel K Dveirin
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Weixiang Fang
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Hani Goodarzi
- Department of Biochemistry & Biophysics, University of California, San Francisco, San Francisco, California, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California, USA
- Arc Institute, Palo Alto 94305, USA
- Elana J Fertig
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Sidney Kimmel Comprehensive Cancer Center, Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Convergence Institute, Johns Hopkins Data Science and AI Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA
- Reza Kalhor
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Department of Molecular Biology and Genetics, Department of Neuroscience, Department of Medicine, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
12
Zhang H, Mulqueen RM, Iannuzo N, Farrera DO, Polverino F, Galligan JJ, Ledford JG, Adey AC, Cusanovich DA. txci-ATAC-seq: a massive-scale single-cell technique to profile chromatin accessibility. Genome Biol 2024; 25:78. [PMID: 38519979 PMCID: PMC10958877 DOI: 10.1186/s13059-023-03150-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 12/20/2023] [Indexed: 03/25/2024] Open
Abstract
We develop a large-scale single-cell ATAC-seq method by combining Tn5-based pre-indexing with 10× Genomics barcoding, enabling the indexing of up to 200,000 nuclei across multiple samples in a single reaction. We profile 449,953 nuclei across diverse tissues, including the human cortex, mouse brain, human lung, mouse lung, mouse liver, and lung tissue from a club cell secretory protein knockout (CC16-/-) model. Our study of CC16-/- nuclei uncovers previously underappreciated technical artifacts derived from remnant 129 mouse strain genetic material, which cause profound cell-type-specific changes in regulatory elements near many genes, thereby confounding the interpretation of this commonly referenced mouse model.
Affiliation(s)
- Hao Zhang
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA
- Asthma & Airway Disease Research Center, University of Arizona, Tucson, AZ, USA
- Ryan M Mulqueen
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Natalie Iannuzo
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA
- Dominique O Farrera
- Department of Pharmacology and Toxicology, University of Arizona, Tucson, AZ, USA
- Francesca Polverino
- Asthma & Airway Disease Research Center, University of Arizona, Tucson, AZ, USA
- Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, University of Arizona, Tucson, AZ, USA
- Banner - University Medicine North, Pulmonary - Clinic F, Tucson, AZ, USA
- James J Galligan
- Department of Pharmacology and Toxicology, University of Arizona, Tucson, AZ, USA
- Julie G Ledford
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA
- Asthma & Airway Disease Research Center, University of Arizona, Tucson, AZ, USA
- Andrew C Adey
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA.
- Cancer Early Detection Advanced Research Center, Oregon Health & Science University, Portland, OR, USA.
- Oregon Health & Science University, Knight Cancer Institute, Portland, OR, USA.
- Oregon Health & Science University, Knight Cardiovascular Institute, Portland, OR, USA.
- Darren A Cusanovich
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA.
- Asthma & Airway Disease Research Center, University of Arizona, Tucson, AZ, USA.
13
DaSilva LF, Senan S, Patel ZM, Janardhan Reddy A, Gabbita S, Nussbaum Z, Valdez Córdova CM, Wenteler A, Weber N, Tunjic TM, Ahmad Khan T, Li Z, Smith C, Bejan M, Karmel Louis L, Cornejo P, Connell W, Wong ES, Meuleman W, Pinello L. DNA-Diffusion: Leveraging Generative Models for Controlling Chromatin Accessibility and Gene Expression via Synthetic Regulatory Elements. bioRxiv 2024:2024.02.01.578352. [PMID: 38352499 PMCID: PMC10862870 DOI: 10.1101/2024.02.01.578352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2024]
Abstract
The challenge of systematically modifying and optimizing regulatory elements for precise gene expression control is central to modern genomics and synthetic biology. Advancements in generative AI have paved the way for designing synthetic sequences with the aim of safely and accurately modulating gene expression. We leverage diffusion models to design context-specific DNA regulatory sequences, which hold significant potential toward enabling novel therapeutic applications requiring precise modulation of gene expression. Our framework uses a cell type-specific diffusion model to generate synthetic 200 bp regulatory elements based on chromatin accessibility across different cell types. We evaluate the generated sequences based on key metrics to ensure they retain properties of endogenous sequences: transcription factor binding site composition, potential for cell type-specific chromatin accessibility, and capacity for sequences generated by DNA diffusion to activate gene expression in different cell contexts using state-of-the-art prediction models. Our results demonstrate the ability to robustly generate DNA sequences with cell type-specific regulatory potential. DNA-Diffusion paves the way for revolutionizing a regulatory modulation approach to mammalian synthetic biology and precision gene therapy.
Affiliation(s)
- Lucas Ferreira DaSilva
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Simon Senan
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Zain Munir Patel
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Aniketh Janardhan Reddy
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Sameer Gabbita
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Johns Hopkins University, Baltimore, MD, USA
- Zelun Li
- Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Cameron Smith
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Lithin Karmel Louis
- Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Paola Cornejo
- Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Emily S. Wong
- Victor Chang Cardiac Institute, Darlinghurst, New South Wales, Australia
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, UNSW Sydney, Sydney, Australia
- Wouter Meuleman
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Luca Pinello
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
14
Manosalva Pérez N, Ferrari C, Engelhorn J, Depuydt T, Nelissen H, Hartwig T, Vandepoele K. MINI-AC: inference of plant gene regulatory networks using bulk or single-cell accessible chromatin profiles. Plant J 2024; 117:280-301. [PMID: 37788349 DOI: 10.1111/tpj.16483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 09/13/2023] [Accepted: 09/16/2023] [Indexed: 10/05/2023]
Abstract
Gene regulatory networks (GRNs) represent the interactions between transcription factors (TF) and their target genes. Plant GRNs control transcriptional programs involved in growth, development, and stress responses, ultimately affecting diverse agricultural traits. While recent developments in accessible chromatin (AC) profiling technologies make it possible to identify context-specific regulatory DNA, learning the underlying GRNs remains a major challenge. We developed MINI-AC (Motif-Informed Network Inference based on Accessible Chromatin), a method that combines AC data from bulk or single-cell experiments with TF binding site (TFBS) information to learn GRNs in plants. We benchmarked MINI-AC using bulk AC datasets from different Arabidopsis thaliana tissues and showed that it outperforms other methods to identify correct TFBS. In maize, a crop with a complex genome and abundant distal AC regions, MINI-AC successfully inferred leaf GRNs with experimentally confirmed, both proximal and distal, TF-target gene interactions. Furthermore, we showed that both AC regions and footprints are valid alternatives to infer AC-based GRNs with MINI-AC. Finally, we combined MINI-AC predictions from bulk and single-cell AC datasets to identify general and cell-type specific maize leaf regulators. Focusing on C4 metabolism, we identified diverse regulatory interactions in specialized cell types for this photosynthetic pathway. MINI-AC represents a powerful tool for inferring accurate AC-derived GRNs in plants and identifying known and novel candidate regulators, improving our understanding of gene regulation in plants.
Affiliation(s)
- Nicolás Manosalva Pérez
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Camilla Ferrari
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Julia Engelhorn
- Molecular Physiology Department, Heinrich-Heine University, 40225, Düsseldorf, Germany
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Thomas Depuydt
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Hilde Nelissen
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Thomas Hartwig
- Molecular Physiology Department, Heinrich-Heine University, 40225, Düsseldorf, Germany
- Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
- Cluster of Excellence on Plant Sciences, Düsseldorf, Germany
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Center for Plant Systems Biology, VIB, 9052, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, 9052, Ghent, Belgium
15
Wen C, Yuan Z, Zhang X, Chen H, Luo L, Li W, Li T, Ma N, Mao F, Lin D, Lin Z, Lin C, Xu T, Lü P, Lin J, Zhu F. Sea-ATI unravels novel vocabularies of plant active cistrome. Nucleic Acids Res 2023; 51:11568-11583. [PMID: 37850650 PMCID: PMC10681729 DOI: 10.1093/nar/gkad853] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Revised: 08/11/2023] [Accepted: 09/25/2023] [Indexed: 10/19/2023] Open
Abstract
The cistrome consists of all cis-acting regulatory elements recognized by transcription factors (TFs). However, only a portion of the cistrome is active for TF binding in a specific tissue. Resolving the active cistrome in plants remains challenging. In this study, we report the assay sequential extraction assisted-active TF identification (sea-ATI), a low-input method that profiles the DNA sequences recognized by TFs in a target tissue. We applied sea-ATI to seven plant tissues to survey their active cistrome and generated 41 motif models, including 15 new models that represent previously unidentified cis-regulatory vocabularies. ATAC-seq and RNA-seq analyses confirmed the functionality of the cis-elements from the new models, in that they are actively bound in vivo, located near the transcription start site, and influence chromatin accessibility and transcription. Furthermore, comparing dimeric WRKY CREs between sea-ATI and DAP-seq libraries revealed that thermodynamics and genetic drifts cooperatively shaped their evolution. Notably, sea-ATI can identify not only positive but also negative regulatory cis-elements, thereby providing unique insights into the functional non-coding genome of plants.
Affiliation(s)
- Chenjin Wen
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Zhen Yuan
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Xiaotian Zhang
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Hao Chen
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Lin Luo
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Wanying Li
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Tian Li
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Nana Ma
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Fei Mao
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Dongmei Lin
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Zhanxi Lin
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Chentao Lin
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Tongda Xu
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Peitao Lü
- College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Juncheng Lin
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Fangjie Zhu
- College of Life Science, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
16
Burdziak C, Zhao CJ, Haviv D, Alonso-Curbelo D, Lowe SW, Pe’er D. scKINETICS: inference of regulatory velocity with single-cell transcriptomics data. Bioinformatics 2023; 39:i394-i403. [PMID: 37387147 PMCID: PMC10311321 DOI: 10.1093/bioinformatics/btad267] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 07/01/2023] Open
Abstract
Motivation: Transcriptional dynamics are governed by the action of regulatory proteins and are fundamental to systems ranging from normal development to disease. RNA velocity methods for tracking phenotypic dynamics ignore information on the regulatory drivers of gene expression variability through time.
Results: We introduce scKINETICS (Key regulatory Interaction NETwork for Inferring Cell Speed), a dynamical model of gene expression change which is fit with the simultaneous learning of per-cell transcriptional velocities and a governing gene regulatory network. Fitting is accomplished through an expectation-maximization approach designed to learn the impact of each regulator on its target genes, leveraging biologically motivated priors from epigenetic data, gene-gene coexpression, and constraints on cells' future states imposed by the phenotypic manifold. Applying this approach to an acute pancreatitis dataset recapitulates a well-studied axis of acinar-to-ductal transdifferentiation whilst proposing novel regulators of this process, including factors with previously appreciated roles in driving pancreatic tumorigenesis. In benchmarking experiments, we show that scKINETICS successfully extends and improves existing velocity approaches to generate interpretable, mechanistic models of gene regulatory dynamics.
Availability and implementation: All python code and an accompanying Jupyter notebook with demonstrations are available at http://github.com/dpeerlab/scKINETICS.
Affiliation(s)
- Cassandra Burdziak
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Chujun Julia Zhao
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Department of Biomedical Engineering, Columbia University, 1210 Amsterdam Ave, New York, NY 10027, United States
- Doron Haviv
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Direna Alonso-Curbelo
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Carrer de Baldiri Reixac, 10, Barcelona 08028, Spain
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Scott W Lowe
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, Maryland 20815, United States
- Dana Pe’er
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, Sloan Kettering Institute, 408 E 69th Street, New York, NY 10021, United States
- Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, Maryland 20815, United States
17
Chen Z, Javed N, Moore M, Wu J, Sun G, Vinyard M, Collins A, Pinello L, Najm FJ, Bernstein BE. Integrative dissection of gene regulatory elements at base resolution. Cell Genom 2023; 3:100318. [PMID: 37388913 PMCID: PMC10300548 DOI: 10.1016/j.xgen.2023.100318] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/25/2022] [Revised: 02/21/2023] [Accepted: 03/31/2023] [Indexed: 07/01/2023]
Abstract
Although vast numbers of putative gene regulatory elements have been cataloged, the sequence motifs and individual bases that underlie their functions remain largely unknown. Here, we combine epigenetic perturbations, base editing, and deep learning to dissect regulatory sequences within the exemplar immune locus encoding CD69. We converge on a ∼170 base interval within a differentially accessible and acetylated enhancer critical for CD69 induction in stimulated Jurkat T cells. Individual C-to-T base edits within the interval markedly reduce element accessibility and acetylation, with corresponding reduction of CD69 expression. The most potent base edits may be explained by their effect on regulatory interactions between the transcriptional activators GATA3 and TAL1 and the repressor BHLHE40. Systematic analysis suggests that the interplay between GATA3 and BHLHE40 plays a general role in rapid T cell transcriptional responses. Our study provides a framework for parsing regulatory elements in their endogenous chromatin contexts and identifying operative artificial variants.
Affiliation(s)
- Zeyu Chen
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| | - Nauman Javed
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| | - Molly Moore
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
| | - Jingyi Wu
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| | - Gary Sun
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
| | - Michael Vinyard
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | | | - Luca Pinello
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Fadi J. Najm
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
| | - Bradley E. Bernstein
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Gene Regulation Observatory, Broad Institute, Cambridge, MA, USA
- Department of Cell Biology and Pathology, Harvard Medical School, Boston, MA, USA
|
18
|
Tognon M, Giugno R, Pinello L. A survey on algorithms to characterize transcription factor binding sites. Brief Bioinform 2023; 24:bbad156. [PMID: 37099664 PMCID: PMC10422928 DOI: 10.1093/bib/bbad156] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/13/2023] [Revised: 03/27/2023] [Accepted: 04/01/2023] [Indexed: 04/28/2023] Open
Abstract
Transcription factors (TFs) are key regulatory proteins that control the transcriptional rate of cells by binding short DNA sequences called transcription factor binding sites (TFBS) or motifs. Identifying and characterizing TFBS is fundamental to understanding the regulatory mechanisms governing the transcriptional state of cells. During the last decades, several experimental methods have been developed to recover DNA sequences containing TFBS. In parallel, computational methods have been proposed to discover and identify TFBS motifs based on these DNA sequences. This is one of the most widely investigated problems in bioinformatics and is referred to as the motif discovery problem. In this manuscript, we review classical and novel experimental and computational methods developed to discover and characterize TFBS motifs in DNA sequences, highlighting their advantages and drawbacks. We also discuss open challenges and future perspectives that could fill the remaining gaps in the field.
Affiliation(s)
- Manuel Tognon
- Computer Science Department, University of Verona, Verona, Italy
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Rosalba Giugno
- Computer Science Department, University of Verona, Verona, Italy
| | - Luca Pinello
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America
|
19
|
Li Z, Kuo CC, Ticconi F, Shaigan M, Gehrmann J, Gusmao EG, Allhoff M, Manolov M, Zenke M, Costa IG. RGT: a toolbox for the integrative analysis of high throughput regulatory genomics data. BMC Bioinformatics 2023; 24:79. [PMID: 36879236 PMCID: PMC9990262 DOI: 10.1186/s12859-023-05184-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 02/13/2023] [Indexed: 03/08/2023] Open
Abstract
BACKGROUND: Massive amounts of data are produced by combining next-generation sequencing with complex biochemistry techniques to characterize regulatory genomics profiles, such as protein-DNA interaction and chromatin accessibility. Interpretation of such high-throughput data typically requires different computation methods. However, existing tools are usually developed for a specific task, which makes it challenging to analyze the data in an integrative manner. RESULTS: We here describe the Regulatory Genomics Toolbox (RGT), a computational library for the integrative analysis of regulatory genomics data. RGT provides different functionalities to handle genomic signals and regions. Based on that, we developed several tools to perform distinct downstream analyses, including the prediction of transcription factor binding sites using ATAC-seq data, identification of differential peaks from ChIP-seq data, and detection of triple helix mediated RNA and DNA interactions, visualization, and finding an association between distinct regulatory factors. CONCLUSION: We present here RGT, a framework to facilitate the customization of computational methods to analyze genomic data for specific regulatory genomics problems. RGT is a comprehensive and flexible Python package for analyzing high throughput regulatory genomics data and is available at: https://github.com/CostaLab/reg-gen. The documentation is available at: https://reg-gen.readthedocs.io.
Affiliation(s)
- Zhijian Li
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Chao-Chung Kuo
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Fabio Ticconi
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Mina Shaigan
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Julia Gehrmann
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Eduardo Gade Gusmao
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Manuel Allhoff
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Martin Manolov
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
| | - Martin Zenke
- Department of Cell Biology, Institute of Biomedical Engineering, RWTH Aachen University Medical School, 52074, Aachen, Germany
- Helmholtz Institute for Biomedical Engineering, RWTH Aachen University, 52074, Aachen, Germany
- Department of Hematology, Oncology, Hemostaseology, and Stem Cell Transplantation, Faculty of Medicine, RWTH Aachen University, 52074, Aachen, Germany
| | - Ivan G Costa
- Institute for Computational Genomics, Medical Faculty, RWTH Aachen University, 52074, Aachen, Germany
- Joint Research Center for Computational Biomedicine, RWTH Aachen University Hospital, 52074, Aachen, Germany
|
20
|
Ma X, Fan L, Zhang Z, Yang X, Liu Y, Ma Y, Pan Y, Zhou G, Zhang M, Ning H, Kong F, Ma J, Liu S, Tian Z. Global dissection of the recombination landscape in soybean using a high-density 600K SoySNP array. Plant Biotechnology Journal 2023; 21:606-620. [PMID: 36458856 PMCID: PMC9946146 DOI: 10.1111/pbi.13975] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/09/2022] [Revised: 12/09/2022] [Accepted: 11/27/2022] [Indexed: 05/17/2023]
Abstract
Recombination is crucial for crop breeding because it can break linkage drag and generate novel allele combinations. However, the high-resolution recombination landscape and its driving forces in soybean are largely unknown. Here, we constructed eight recombinant inbred line (RIL) populations and genotyped individual lines using the high-density 600K SoySNP array, which yielded a high-resolution recombination map with 5636 recombination sites at a resolution of 1.37 kb. The recombination rate was negatively correlated with transposable element density and GC content but positively correlated with gene density. Interestingly, we found that meiotic recombination was enriched at the promoters of active genes. Further investigations revealed that chromatin accessibility and active epigenetic modifications promoted recombination. Our findings provide important insights into the control of homologous recombination and thus will increase our ability to accelerate soybean breeding by manipulating meiotic recombination rate.
Affiliation(s)
- Xin Ma
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Lei Fan
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Zhifang Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Xia Yang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Yucheng Liu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Yanming Ma
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Yi Pan
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Guoan Zhou
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Min Zhang
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Hailong Ning
- Key Laboratory of Soybean Biology, Chinese Ministry of Education, Northeast Agricultural University, Harbin, China
| | - Fanjiang Kong
- Innovative Center of Molecular Genetics and Evolution, School of Life Sciences, Guangzhou University, Guangzhou, China
| | - Junkui Ma
- The Industrial Crop Institute, Shanxi Agricultural University, Taiyuan, China
| | - Shulin Liu
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
| | - Zhixi Tian
- State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Innovative Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
|
21
|
Cazares TA, Rizvi FW, Iyer B, Chen X, Kotliar M, Bejjani AT, Wayman JA, Donmez O, Wronowski B, Parameswaran S, Kottyan LC, Barski A, Weirauch MT, Prasath VBS, Miraldi ER. maxATAC: Genome-scale transcription-factor binding prediction from ATAC-seq with deep neural networks. PLoS Comput Biol 2023; 19:e1010863. [PMID: 36719906 PMCID: PMC9917285 DOI: 10.1371/journal.pcbi.1010863] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/06/2022] [Revised: 02/10/2023] [Accepted: 01/10/2023] [Indexed: 02/01/2023] Open
Abstract
Transcription factors read the genome, fundamentally connecting DNA sequence to gene expression across diverse cell types. Determining how, where, and when TFs bind chromatin will advance our understanding of gene regulatory networks and cellular behavior. The 2017 ENCODE-DREAM in vivo Transcription-Factor Binding Site (TFBS) Prediction Challenge highlighted the value of chromatin accessibility data to TFBS prediction, establishing state-of-the-art methods for TFBS prediction from DNase-seq. However, the more recent Assay-for-Transposase-Accessible-Chromatin (ATAC)-seq has surpassed DNase-seq as the most widely-used chromatin accessibility profiling method. Furthermore, ATAC-seq is the only such technique available at single-cell resolution from standard commercial platforms. While ATAC-seq datasets grow exponentially, suboptimal motif scanning is unfortunately the most common method for TFBS prediction from ATAC-seq. To enable community access to state-of-the-art TFBS prediction from ATAC-seq, we (1) curated an extensive benchmark dataset (127 TFs) for ATAC-seq model training and (2) built "maxATAC", a suite of user-friendly, deep neural network models for genome-wide TFBS prediction from ATAC-seq in any cell type. With models available for 127 human TFs, maxATAC is the largest collection of high-performance TFBS prediction models for ATAC-seq. maxATAC performance extends to primary cells and single-cell ATAC-seq, enabling improved TFBS prediction in vivo. We demonstrate maxATAC's capabilities by identifying TFBS associated with allele-dependent chromatin accessibility at atopic dermatitis genetic risk loci.
Affiliation(s)
- Tareian A. Cazares
- Immunology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Faiz W. Rizvi
- Systems Biology and Physiology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Balaji Iyer
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Xiaoting Chen
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Michael Kotliar
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Anthony T. Bejjani
- Molecular and Developmental Biology Graduate Program, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Joseph A. Wayman
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Omer Donmez
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Benjamin Wronowski
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Sreeja Parameswaran
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Leah C. Kottyan
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Artem Barski
- Division of Allergy and Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - Matthew T. Weirauch
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- The Center for Autoimmune Genetics and Etiology (CAGE), Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
- Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
| | - V. B. Surya Prasath
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
| | - Emily R. Miraldi
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, Ohio, United States of America
- Division of Immunobiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, United States of America
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America
|
22
|
Ameen M, Sundaram L, Shen M, Banerjee A, Kundu S, Nair S, Shcherbina A, Gu M, Wilson KD, Varadarajan A, Vadgama N, Balsubramani A, Wu JC, Engreitz JM, Farh K, Karakikes I, Wang KC, Quertermous T, Greenleaf WJ, Kundaje A. Integrative single-cell analysis of cardiogenesis identifies developmental trajectories and non-coding mutations in congenital heart disease. Cell 2022; 185:4937-4953.e23. [PMID: 36563664 PMCID: PMC10122433 DOI: 10.1016/j.cell.2022.11.028] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 01/15/2022] [Revised: 09/13/2022] [Accepted: 11/23/2022] [Indexed: 12/24/2022]
Abstract
To define the multi-cellular epigenomic and transcriptional landscape of cardiac cellular development, we generated single-cell chromatin accessibility maps of human fetal heart tissues. We identified eight major differentiation trajectories involving primary cardiac cell types, each associated with dynamic transcription factor (TF) activity signatures. We contrasted regulatory landscapes of iPSC-derived cardiac cell types and their in vivo counterparts, which enabled optimization of in vitro differentiation of epicardial cells. Further, we interpreted sequence based deep learning models of cell-type-resolved chromatin accessibility profiles to decipher underlying TF motif lexicons. De novo mutations predicted to affect chromatin accessibility in arterial endothelium were enriched in congenital heart disease (CHD) cases vs. controls. In vitro studies in iPSCs validated the functional impact of identified variation on the predicted developmental cell types. This work thus defines the cell-type-resolved cis-regulatory sequence determinants of heart development and identifies disruption of cell type-specific regulatory elements in CHD.
Affiliation(s)
- Mohamed Ameen
- Department of Cancer Biology, Stanford University, Stanford, CA, USA; Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, CA, USA
| | - Laksshman Sundaram
- Department of Computer Science, Stanford University, Stanford, CA, USA; Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, CA, USA
| | - Mengcheng Shen
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
| | - Abhimanyu Banerjee
- Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, CA, USA; Department of Physics, Stanford University, Stanford, CA, USA
| | - Soumya Kundu
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Surag Nair
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Anna Shcherbina
- Department of Biomedical Informatics, Stanford University, Stanford, CA, USA
| | - Mingxia Gu
- Center for Stem Cell and Organoid Medicine, CuSTOM, Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | | | - Avyay Varadarajan
- Department of Computer Science, California Institute of Technology, Pasadena, CA, USA
| | - Nirmal Vadgama
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, USA
| | | | - Joseph C Wu
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
| | | | - Kyle Farh
- Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, CA, USA
| | - Ioannis Karakikes
- Cardiovascular Institute, Stanford University, Stanford, CA, USA; Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, USA
| | - Kevin C Wang
- Department of Cancer Biology, Stanford University, Stanford, CA, USA; Department of Dermatology, Stanford University School of Medicine, Stanford, CA, USA; Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| | - Thomas Quertermous
- Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - William J Greenleaf
- Department of Genetics, Stanford University, Stanford, CA, USA; Department of Applied Physics, Stanford University, Stanford, CA, USA
| | - Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA; Department of Genetics, Stanford University, Stanford, CA, USA
|
23
|
Delos Santos NP, Duttke S, Heinz S, Benner C. MEPP: more transparent motif enrichment by profiling positional correlations. NAR Genom Bioinform 2022; 4:lqac075. [PMID: 36267125 PMCID: PMC9575187 DOI: 10.1093/nargab/lqac075] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/06/2022] [Revised: 08/18/2022] [Accepted: 09/23/2022] [Indexed: 11/11/2022] Open
Abstract
Score-based motif enrichment analysis (MEA) is typically applied to regulatory DNA to infer transcription factors (TFs) that may modulate transcription and chromatin state in different conditions. Most MEA methods determine motif enrichment independent of motif position within a sequence, even when those sequences harbor anchor points that motifs and their bound TFs may functionally interact with in a distance-dependent fashion, such as other TF binding motifs, transcription start sites (TSS), sequencing assay cleavage sites, or other biologically meaningful features. We developed motif enrichment positional profiling (MEPP), a novel MEA method that outputs a positional enrichment profile of a given TF's binding motif relative to key anchor points (e.g. transcription start sites, or other motifs) within the analyzed sequences while accounting for lower-order nucleotide bias. Using transcription initiation and TF binding as test cases, we demonstrate MEPP's utility in determining the sequence positions where motif presence correlates with measures of biological activity, inferring positional dependencies of binding site function. We demonstrate how MEPP can be applied to interpretation and hypothesis generation from experiments that quantify transcription initiation, chromatin structure, or TF binding measurements. MEPP is available for download from https://github.com/npdeloss/mepp.
Affiliation(s)
- Nathaniel P Delos Santos
- Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0634, USA
| | - Sascha Duttke
- School of Molecular Biosciences, College of Veterinary Medicine, Washington State University, Pullman, WA, USA
| | - Sven Heinz
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0634, USA
| | - Christopher Benner
- Department of Medicine, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0634, USA
|
24
|
Donohue LK, Guo MG, Zhao Y, Jung N, Bussat RT, Kim DS, Neela PH, Kellman LN, Garcia OS, Meyers RM, Altman RB, Khavari PA. A cis-regulatory lexicon of DNA motif combinations mediating cell-type-specific gene regulation. Cell Genomics 2022; 2:100191. [PMID: 36742369 PMCID: PMC9894309 DOI: 10.1016/j.xgen.2022.100191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Gene expression is controlled by transcription factors (TFs) that bind cognate DNA motif sequences in cis-regulatory elements (CREs). The combinations of DNA motifs acting within homeostasis and disease, however, are unclear. Gene expression, chromatin accessibility, TF footprinting, and H3K27ac-dependent DNA looping data were generated and a random-forest-based model was applied to identify 7,531 cell-type-specific cis-regulatory modules (CRMs) across 15 diploid human cell types. A co-enrichment framework within CRMs nominated 838 cell-type-specific, recurrent heterotypic DNA motif combinations (DMCs), which were functionally validated using massively parallel reporter assays. Cancer cells engaged DMCs linked to neoplasia-enabling processes operative in normal cells while also activating new DMCs only seen in the neoplastic state. This integrative approach identifies cell-type-specific cis-regulatory combinatorial DNA motifs in diverse normal and diseased human cells and represents a general framework for deciphering cis-regulatory sequence logic in gene regulation.
Affiliation(s)
- Laura K.H. Donohue
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Synthego, Redwood City, CA, USA (these authors contributed equally)
| | - Margaret G. Guo
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Stanford Program in Biomedical Informatics, Stanford University, Stanford, CA, USA (these authors contributed equally)
| | - Yang Zhao
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Synthego, Redwood City, CA, USA
| | - Namyoung Jung
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Department of Life Science, Pohang University of Science and Technology, Pohang, Korea
| | - Rose T. Bussat
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; 23andMe, Inc., Sunnyvale, CA, USA
| | - Daniel S. Kim
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Stanford Program in Biomedical Informatics, Stanford University, Stanford, CA, USA
| | - Poornima H. Neela
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Fauna Bio, Emeryville, CA, USA
| | - Laura N. Kellman
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Stanford Program in Cancer Biology, Stanford University, Stanford, CA, USA
| | - Omar S. Garcia
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA
| | - Robin M. Meyers
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
| | - Russ B. Altman
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Stanford Program in Biomedical Informatics, Stanford University, Stanford, CA, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Paul A. Khavari
- Program in Epithelial Biology, Stanford University School of Medicine, Stanford, CA, USA; Stanford Program in Cancer Biology, Stanford University, Stanford, CA, USA; Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA (lead contact)
|
25
|
Heide T, Househam J, Cresswell GD, Spiteri I, Lynn C, Mossner M, Kimberley C, Fernandez-Mateos J, Chen B, Zapata L, James C, Barozzi I, Chkhaidze K, Nichol D, Gunasri V, Berner A, Schmidt M, Lakatos E, Baker AM, Costa H, Mitchinson M, Piazza R, Jansen M, Caravagna G, Ramazzotti D, Shibata D, Bridgewater J, Rodriguez-Justo M, Magnani L, Graham TA, Sottoriva A. The co-evolution of the genome and epigenome in colorectal cancer. Nature 2022; 611:733-743. [PMID: 36289335 PMCID: PMC9684080 DOI: 10.1038/s41586-022-05202-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 08/09/2021] [Accepted: 08/05/2022] [Indexed: 12/13/2022]
Abstract
Colorectal malignancies are a leading cause of cancer-related death [1] and have undergone extensive genomic study [2,3]. However, DNA mutations alone do not fully explain malignant transformation [4-7]. Here we investigate the co-evolution of the genome and epigenome of colorectal tumours at single-clone resolution using spatial multi-omic profiling of individual glands. We collected 1,370 samples from 30 primary cancers and 8 concomitant adenomas and generated 1,207 chromatin accessibility profiles, 527 whole genomes and 297 whole transcriptomes. We found positive selection for DNA mutations in chromatin modifier genes and recurrent somatic chromatin accessibility alterations, including in regulatory regions of cancer driver genes that were otherwise devoid of genetic mutations. Genome-wide alterations in accessibility for transcription factor binding involved CTCF, downregulation of interferon and increased accessibility for SOX and HOX transcription factor families, suggesting the involvement of developmental genes during tumourigenesis. Somatic chromatin accessibility alterations were heritable and distinguished adenomas from cancers. Mutational signature analysis showed that the epigenome in turn influences the accumulation of DNA mutations. This study provides a map of genetic and epigenetic tumour heterogeneity, with fundamental implications for understanding colorectal cancer biology.
Affiliation(s)
- Timon Heide
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Computational Biology Research Centre, Human Technopole, Milan, Italy
| | - Jacob Househam
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - George D Cresswell
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Inmaculada Spiteri
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Claire Lynn
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Maximilian Mossner
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Chris Kimberley
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | | | - Bingjie Chen
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Luis Zapata
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Chela James
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Iros Barozzi
- Department of Surgery and Cancer, Imperial College London, London, UK
- Centre for Cancer Research, Medical University of Vienna, Vienna, Austria
| | - Ketevan Chkhaidze
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Daniel Nichol
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
| | - Vinaya Gunasri
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Alison Berner
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Melissa Schmidt
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Eszter Lakatos
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Ann-Marie Baker
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Helena Costa
- Department of Pathology, UCL Cancer Institute, University College London, London, UK
| | - Miriam Mitchinson
- Department of Pathology, UCL Cancer Institute, University College London, London, UK
| | - Rocco Piazza
- Department of Medicine and Surgery, University of Milano-Bicocca, Milan, Italy
| | - Marnix Jansen
- Department of Pathology, UCL Cancer Institute, University College London, London, UK
| | - Giulio Caravagna
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Department of Mathematics and Geosciences, University of Trieste, Trieste, Italy
| | - Daniele Ramazzotti
- Department of Medicine and Surgery, University of Milano-Bicocca, Milan, Italy
| | - Darryl Shibata
- Department of Pathology, University of Southern California Keck School of Medicine, Los Angeles, CA, USA
| | | | | | - Luca Magnani
- Department of Surgery and Cancer, Imperial College London, London, UK
| | - Trevor A Graham
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Evolution and Cancer Lab, Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, UK
| | - Andrea Sottoriva
- Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK
- Computational Biology Research Centre, Human Technopole, Milan, Italy
|
26
|
Yang T, Henao R. TAMC: A deep-learning approach to predict motif-centric transcriptional factor binding activity based on ATAC-seq profile. PLoS Comput Biol 2022; 18:e1009921. [PMID: 36094959 PMCID: PMC9499209 DOI: 10.1371/journal.pcbi.1009921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 09/22/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022] Open
Abstract
Determining transcriptional factor binding sites (TFBSs) is critical for understanding the molecular mechanisms regulating gene expression in different biological conditions. Biological assays designed to directly map TFBSs require large sample sizes and intensive resources. As an alternative, the ATAC-seq assay is simple to conduct and provides genomic cleavage profiles that contain rich information for imputing TFBSs indirectly. Previous footprint-based tools are inherently limited by the accuracy of their bias correction algorithms and the efficiency of their feature extraction models. Here we introduce TAMC (Transcriptional factor binding prediction from ATAC-seq profile at Motif-predicted binding sites using Convolutional neural networks), a deep-learning approach for predicting motif-centric TF binding activity from paired-end ATAC-seq data. TAMC does not require bias correction during signal processing. By leveraging a one-dimensional convolutional neural network (1D-CNN) model, TAMC makes predictions based on both footprint and non-footprint features at binding sites for each TF and outperforms existing footprinting tools in TFBS prediction, particularly for ATAC-seq data with limited sequencing depth.
Affiliation(s)
- Tianqi Yang
- Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, North Carolina, United States of America
- Department of Cell Biology, Duke University School of Medicine, Durham, North Carolina, United States of America
- * E-mail: (TY); (RH)
| | - Ricardo Henao
- Center for Applied Genomics and Precision Medicine, Duke University School of Medicine, Durham, North Carolina, United States of America
- Department of Biostatistics and Informatics, Duke University, Durham, North Carolina, United States of America
- * E-mail: (TY); (RH)
|
27
|
Lal A, Galvao Ferrarini M, Gruber AJ. Investigating the Human Host-ssRNA Virus Interaction Landscape Using the SMEAGOL Toolbox. Viruses 2022; 14:1436. [PMID: 35891416 PMCID: PMC9317827 DOI: 10.3390/v14071436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 06/19/2022] [Accepted: 06/24/2022] [Indexed: 12/04/2022] Open
Abstract
Viruses have evolved numerous mechanisms to exploit the molecular machinery of their host cells, including the broad spectrum of host RNA-binding proteins (RBPs). However, the RBP interactomes of most viruses are largely unknown. To shed light on the interaction landscape of RNA viruses with human host cell RBPs, we have analysed 197 single-stranded RNA (ssRNA) viral genome sequences and found that the majority of ssRNA virus genomes are significantly enriched or depleted in motifs for specific human RBPs, suggesting selection pressure on these interactions. To facilitate tailored investigations and the analysis of genomes sequenced in future, we have released our methodology as a fast and user-friendly computational toolbox named SMEAGOL. Our resources will contribute to future studies of specific ssRNA virus-host cell interactions and support the identification of antiviral drug targets.
Affiliation(s)
| | - Mariana Galvao Ferrarini
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Laboratoire de Biométrie et Biologie Évolutive, UMR 5558, CNRS, Université de Lyon, Université Lyon 1, 69622 Villeurbanne, France
| | - Andreas J. Gruber
- Department of Biology, University of Konstanz, Universitaetsstrasse 10, D-78464 Konstanz, Germany
|
28
|
Structural and genome-wide analyses suggest that transposon-derived protein SETMAR alters transcription and splicing. J Biol Chem 2022; 298:101894. [PMID: 35378129 PMCID: PMC9062482 DOI: 10.1016/j.jbc.2022.101894] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Revised: 03/25/2022] [Accepted: 03/26/2022] [Indexed: 11/22/2022] Open
Abstract
Extensive portions of the human genome have unknown function, including those derived from transposable elements. One such element, the DNA transposon Hsmar1, entered the primate lineage approximately 50 million years ago leaving behind terminal inverted repeat (TIR) sequences and a single intact copy of the Hsmar1 transposase, which retains its ancestral TIR-DNA-binding activity, and is fused with a lysine methyltransferase SET domain to constitute the chimeric SETMAR gene. Here, we provide a structural basis for recognition of TIRs by SETMAR and investigate the function of SETMAR through genome-wide approaches. As elucidated in our 2.37 Å crystal structure, SETMAR forms a dimeric complex with each DNA-binding domain bound specifically to TIR-DNA through the formation of 32 hydrogen bonds. We found that SETMAR recognizes primarily TIR sequences (∼5000 sites) within the human genome as assessed by chromatin immunoprecipitation sequencing analysis. In two SETMAR KO cell lines, we identified 163 shared differentially expressed genes and 233 shared alternative splicing events. Among these genes are several pre–mRNA-splicing factors, transcription factors, and genes associated with neuronal function, and one alternatively spliced primate-specific gene, TMEM14B, which has been identified as a marker for neocortex expansion associated with brain evolution. Taken together, our results suggest a model in which SETMAR impacts differential expression and alternative splicing of genes associated with transcription and neuronal function, potentially through both its TIR-specific DNA-binding and lysine methyltransferase activities, consistent with a role for SETMAR in simian primate development.
|
29
|
Zhang J, Zhang Y, You Q, Huang C, Zhang T, Wang M, Zhang T, Yang X, Xiong J, Li Y, Liu CP, Zhang Z, Xu RM, Zhu B. Highly enriched BEND3 prevents the premature activation of bivalent genes during differentiation. Science 2022; 375:1053-1058. [PMID: 35143257 DOI: 10.1126/science.abm0730] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Indexed: 12/13/2022]
Abstract
Bivalent genes are ready for activation upon the arrival of developmental cues. Here, we report that BEND3 is a CpG island (CGI)-binding protein that is enriched at regulatory elements. The cocrystal structure of BEND3 in complex with its target DNA reveals the structural basis for its DNA methylation-sensitive binding property. Mouse embryos ablated of Bend3 died at the pregastrulation stage. Bend3 null embryonic stem cells (ESCs) exhibited severe defects in differentiation, during which hundreds of CGI-containing bivalent genes were prematurely activated. BEND3 is required for the stable association of polycomb repressive complex 2 (PRC2) at bivalent genes that are highly occupied by BEND3, which suggests a reining function of BEND3 in maintaining high levels of H3K27me3 at these bivalent genes in ESCs to prevent their premature activation in the forthcoming developmental stage.
Affiliation(s)
- Jing Zhang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Yan Zhang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Qinglong You
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Chang Huang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Tiantian Zhang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Mingzhu Wang
- Institutes of Physical Science and Information Technology, Anhui University, Hefei 230601, Anhui, China
| | - Tianwei Zhang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiaocheng Yang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Jun Xiong
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Yingfeng Li
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Chao-Pei Liu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Zhuqiang Zhang
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Rui-Ming Xu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Bing Zhu
- National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
|
30
|
Abstract
DNA can determine where and when genes are expressed, but the full set of sequence determinants that control gene expression is unknown. Here, we measured the transcriptional activity of DNA sequences that represent an ~100 times larger sequence space than the human genome using massively parallel reporter assays (MPRAs). Machine learning models revealed that transcription factors (TFs) generally act in an additive manner with weak grammar and that most enhancers increase expression from a promoter by a mechanism that does not appear to involve specific TF–TF interactions. The enhancers themselves can be classified into three types: classical, closed chromatin and chromatin dependent. We also show that few TFs are strongly active in a cell, with most activities being similar between cell types. Individual TFs can have multiple gene regulatory activities, including chromatin opening and enhancing, promoting and determining transcription start site (TSS) activity, consistent with the view that the TF binding motif is the key atomic unit of gene expression. Analysis of massively parallel reporter assays measuring the transcriptional activity of DNA sequences indicates that most transcription factor (TF) activity is additive and does not rely on specific TF–TF interactions. Individual TFs can have different gene regulatory activities.
|
31
|
Hammelman J, Krismer K, Gifford DK. spatzie: an R package for identifying significant transcription factor motif co-enrichment from enhancer–promoter interactions. Nucleic Acids Res 2022; 50:e52. [PMID: 35100401 PMCID: PMC9122533 DOI: 10.1093/nar/gkac036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/03/2021] [Revised: 01/07/2022] [Accepted: 01/29/2022] [Indexed: 01/30/2023] Open
Abstract
Genomic interactions provide important context to our understanding of the state of the genome. One question is whether specific transcription factor interactions give rise to genome organization. We introduce spatzie, an R package and a website that implements statistical tests for significant transcription factor motif cooperativity between enhancer–promoter interactions. We conducted controlled experiments on realistic simulated data from ChIP-seq to confirm that spatzie is capable of discovering co-enriched motif interactions even in noisy conditions. We then use spatzie to investigate cell-type-specific transcription factor cooperativity within recent human ChIA-PET enhancer–promoter interaction data. The method is available online at https://spatzie.mit.edu.
Affiliation(s)
- Jennifer Hammelman
- Computational and Systems Biology Program, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA
| | - Konstantin Krismer
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
| | - David K Gifford
- Computational and Systems Biology Program, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, MA 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
|
32
|
El Ghamrasni S, Quevedo R, Hawley J, Mazrooei P, Hanna Y, Cirlan I, Zhu H, Bruce JP, Oldfield LE, Yang SYC, Guilhamon P, Reimand J, Cescon DW, Done SJ, Lupien M, Pugh TJ. Mutations in Noncoding Cis-Regulatory Elements Reveal Cancer Driver Cistromes in Luminal Breast Cancer. Mol Cancer Res 2022; 20:102-113. [PMID: 34556523 PMCID: PMC9398156 DOI: 10.1158/1541-7786.mcr-21-0471] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/17/2021] [Revised: 07/31/2021] [Accepted: 09/17/2021] [Indexed: 01/07/2023]
Abstract
Whole-genome sequencing of primary breast tumors enabled the identification of cancer driver genes and noncoding cancer driver plexuses from somatic mutations. However, differentiating driver from passenger events among noncoding genetic variants remains a challenge. Herein, we reveal cancer-driver cis-regulatory elements linked to transcription factors previously shown to be involved in development of luminal breast cancers by defining a tumor-enriched catalogue of approximately 100,000 unique cis-regulatory elements from 26 primary luminal estrogen receptor (ER)+ progesterone receptor (PR)+ breast tumors. Integrating this catalog with somatic mutations from 350 publicly available breast tumor whole genomes, we uncovered cancer driver cistromes, defined as the sum of binding sites for a transcription factor, for ten transcription factors in luminal breast cancer such as FOXA1 and ER, nine of which are essential for growth in breast cancer with four exclusive to the luminal subtype. Collectively, we present a strategy to find cancer driver cistromes relying on quantifying the enrichment of noncoding mutations over cis-regulatory elements concatenated into a functional unit. IMPLICATIONS: Mapping the accessible chromatin of luminal breast cancer led to discovery of an accumulation of mutations within cistromes of transcription factors essential to luminal breast cancer. This demonstrates coopting of regulatory networks to drive cancer and provides a framework to derive insight into the noncoding space of cancer.
Affiliation(s)
- Samah El Ghamrasni
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - Rene Quevedo
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | - James Hawley
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | - Parisa Mazrooei
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Genentech, South San Francisco, California
| | - Youstina Hanna
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - Iulia Cirlan
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - Helen Zhu
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Jeff P Bruce
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - Leslie E Oldfield
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - S Y Cindy Yang
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
| | - Paul Guilhamon
- Developmental and Stem Cell Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Arthur and Sonia Labatt Brain Tumor Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Jüri Reimand
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Dave W Cescon
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
| | - Susan J Done
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Department of Laboratory Medicine & Pathobiology, University of Toronto, Toronto, Ontario, Canada
| | - Mathieu Lupien
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Trevor J Pugh
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
|
33
|
Li D, Xu J, Yang MQ. Gene Regulation Analysis Reveals Perturbations of Autism Spectrum Disorder during Neural System Development. Genes (Basel) 2021; 12:1901. [PMID: 34946850 PMCID: PMC8700980 DOI: 10.3390/genes12121901] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/29/2021] [Revised: 11/24/2021] [Accepted: 11/24/2021] [Indexed: 01/21/2023] Open
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder that impedes patients' cognitive, social, speech and communication skills. ASD is highly heterogeneous with a variety of etiologies and clinical manifestations. The prevalence rate of ASD has increased steadily in recent years. Presently, molecular mechanisms underlying ASD occurrence and development remain to be elucidated. Here, we integrated multi-layer genomics data to investigate the transcriptome and pathway dysregulations in ASD development. The RNA sequencing (RNA-seq) expression profiles of induced pluripotent stem cells (iPSCs), neural progenitor cells (NPCs) and neuron cells from ASD and normal samples were compared in our study. We found that substantially more genes were differentially expressed in the NPCs than in the iPSCs. Consistently, gene set variation analysis revealed that the activity of the known ASD pathways in NPCs and neural cells was significantly different from that in the iPSCs, suggesting that ASD occurred at the early stage of neural system development. We further constructed comprehensive brain- and neural-specific regulatory networks by incorporating transcription factor (TF) and gene interactions with long non-coding RNA (lncRNA) and protein interactions. We then overlaid the transcriptomes of different cell types on the regulatory networks to infer the regulatory cascades. The variations of the regulatory cascades between ASD and normal samples uncovered a set of novel disease-associated genes and gene interactions, particularly highlighting the functional roles of ELF3 and the interaction between STAT1 and lncRNA ELF3-AS1 in the disease development. These new findings extend our understanding of ASD and offer putative new therapeutic targets for further studies.
Affiliation(s)
- Dan Li
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- Correspondence: (J.X.); (M.Q.Y.)
| | - Mary Qu Yang
- MidSouth Bioinformatics Center, Joint Bioinformatics Graduate Program of University of Arkansas at Little Rock, University of Arkansas for Medical Sciences, Little Rock, AR 72204, USA
- Correspondence: (J.X.); (M.Q.Y.)
|
34
|
Patel ZM, Hughes TR. Global properties of regulatory sequences are predicted by transcription factor recognition mechanisms. Genome Biol 2021; 22:285. [PMID: 34620190 PMCID: PMC8496038 DOI: 10.1186/s13059-021-02503-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/31/2020] [Accepted: 09/16/2021] [Indexed: 01/07/2023] Open
Abstract
Background: Mammalian genomes contain millions of putative regulatory sequences, which are delineated by binding of multiple transcription factors. The degree to which spacing and orientation constraints among transcription factor binding sites contribute to the recognition and identity of regulatory sequence is an unresolved but important question that impacts our understanding of genome function and evolution. Global mechanisms that underlie phenomena including the size of regulatory sequences, their uniqueness, and their evolutionary turnover remain poorly described. Results: Here, we ask whether models incorporating different degrees of spacing and orientation constraints among transcription factor binding sites are broadly consistent with several global properties of regulatory sequence. These properties include length, sequence diversity, turnover rate, and dominance of specific TFs in regulatory site identity and cell type specification. Models with and without spacing and orientation constraints are generally consistent with all observed properties of regulatory sequence, and with regulatory sequences being fundamentally small (~1 nucleosome). Uniqueness of regulatory regions and their rapid evolutionary turnover are expected under all models examined. An intriguing issue we identify is that the complexity of eukaryotic regulatory sites must scale with the number of active transcription factors, in order to accomplish observed specificity. Conclusions: Models of transcription factor binding with or without spacing and orientation constraints predict that regulatory sequences should be fundamentally short, unique, and turn over rapidly. We posit that the existence of master regulators may be, in part, a consequence of evolutionary pressure to limit the complexity and increase evolvability of regulatory sites. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-021-02503-y.
Affiliation(s)
- Zain M Patel
- Donnelly Centre for Cellular and Biomolecular Research and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada
| | - Timothy R Hughes
- Donnelly Centre for Cellular and Biomolecular Research and Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 3E1, Canada.
|
35
|
Tognon M, Bonnici V, Garrison E, Giugno R, Pinello L. GRAFIMO: Variant and haplotype aware motif scanning on pangenome graphs. PLoS Comput Biol 2021; 17:e1009444. [PMID: 34570769 PMCID: PMC8519448 DOI: 10.1371/journal.pcbi.1009444] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/11/2021] [Revised: 10/15/2021] [Accepted: 09/10/2021] [Indexed: 11/18/2022] Open
Abstract
Transcription factors (TFs) are proteins that promote or reduce the expression of genes by binding short genomic DNA sequences known as transcription factor binding sites (TFBS). While several tools have been developed to scan for potential occurrences of TFBS in linear DNA sequences or reference genomes, no tool exists to find them in pangenome variation graphs (VGs). VGs are sequence-labelled graphs that can efficiently encode collections of genomes and their variants in a single, compact data structure. Because VGs can losslessly compress large pangenomes, TFBS scanning in VGs can efficiently capture how genomic variation affects the potential binding landscape of TFs in a population of individuals. Here we present GRAFIMO (GRAph-based Finding of Individual Motif Occurrences), a command-line tool for the scanning of known TF DNA motifs represented as Position Weight Matrices (PWMs) in VGs. GRAFIMO extends the standard PWM scanning procedure by considering variations and alternative haplotypes encoded in a VG. Using GRAFIMO on a VG based on individuals from the 1000 Genomes project we recover several potential binding sites that are enhanced, weakened or missed when scanning only the reference genome, and which could constitute individual-specific binding events. GRAFIMO is available as an open-source tool, under the MIT license, at https://github.com/pinellolab/GRAFIMO and https://github.com/InfOmics/GRAFIMO. Transcription factors (TFs) are key regulatory proteins and mutations occurring in their binding sites can alter the normal transcriptional landscape of a cell and lead to disease states. Pangenome variation graphs (VGs) efficiently encode genomes from a population of individuals and their genetic variations. GRAFIMO is an open-source tool that extends the traditional PWM scanning procedure to VGs. By scanning for potential TFBS in VGs, GRAFIMO can simultaneously search thousands of genomes while accounting for SNPs, indels, and structural variants.
GRAFIMO reports motif occurrences, their statistical significance, frequency, and location within the reference or alternative haplotypes in a given VG. GRAFIMO makes it possible to study how genetic variation affects the binding landscape of known TFs within a population of individuals.
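As a rough illustration of the "standard PWM scanning procedure" that the abstract says GRAFIMO extends, the sketch below scores every window of a linear sequence against a log-odds matrix and compares a reference haplotype with an alternative one. It is not GRAFIMO's implementation and does not operate on a variation graph; the count matrix, the pseudocount, the uniform background, and the function names `build_pwm`, `scan`, and `best_hit` are all assumptions made for the example.

```python
import math

BASES = "ACGT"

# Hypothetical count matrix for a 4-bp motif with consensus TGAC
# (one dict per motif position; the values are invented for illustration).
COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every full-width window; return (start, window, score) tuples."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, window, score))
    return hits


def best_hit(sequence, pwm):
    """Return the highest-scoring window of the sequence."""
    return max(scan(sequence, pwm), key=lambda hit: hit[2])


if __name__ == "__main__":
    pwm = build_pwm(COUNTS)
    reference = "AATGACAA"    # carries the consensus site
    alternative = "AATGTCAA"  # a single-base change inside the site
    print("reference  :", best_hit(reference, pwm))
    print("alternative:", best_hit(alternative, pwm))
```

Scanning each haplotype string separately, as done here, scales poorly with the number of genomes; the abstract's point is that a variation graph lets shared sequence be scanned once while still covering alternative haplotypes.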
Affiliation(s)
- Manuel Tognon
- Computer Science Department, University of Verona, Verona, Italy
| | - Vincenzo Bonnici
- Computer Science Department, University of Verona, Verona, Italy
| | - Erik Garrison
- University of Tennessee Health Science Center, Memphis, Tennessee, United States of America
| | - Rosalba Giugno
- Computer Science Department, University of Verona, Verona, Italy
- * E-mail: (RG); (LP)
| | - Luca Pinello
- Molecular Pathology Unit, Center for Computational and Integrative Biology and Center for Cancer Research, Massachusetts General Hospital Charlestown, Massachusetts, United States of America
- Department of Pathology, Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- * E-mail: (RG); (LP)
|
36
|
Yao Q, Ferragina P, Reshef Y, Lettre G, Bauer DE, Pinello L. Motif-Raptor: a cell type-specific and transcription factor centric approach for post-GWAS prioritization of causal regulators. Bioinformatics 2021; 37:2103-2111. [PMID: 33532840 PMCID: PMC11025460 DOI: 10.1093/bioinformatics/btab072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2020] [Revised: 11/30/2020] [Accepted: 01/28/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Genome-wide association studies (GWASs) have identified thousands of common trait-associated genetic variants, but interpretation of their function remains challenging. These genetic variants can overlap the binding sites of transcription factors (TFs) and therefore could alter gene expression. However, we currently lack a systematic understanding of how this mechanism contributes to phenotype. RESULTS: We present Motif-Raptor, a TF-centric computational tool that integrates sequence-based predictive models, chromatin accessibility, gene expression datasets and GWAS summary statistics to systematically investigate how TF function is affected by genetic variants. Given trait-associated non-coding variants, Motif-Raptor can recover relevant cell types and critical TFs to drive hypotheses regarding their mechanism of action. We tested Motif-Raptor on complex traits such as rheumatoid arthritis and red blood cell count and demonstrated its ability to prioritize relevant cell types, potential regulatory TFs and non-coding SNPs which have been previously characterized and validated. AVAILABILITY AND IMPLEMENTATION: Motif-Raptor is freely available as a Python package at: https://github.com/pinellolab/MotifRaptor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiuming Yao
- Department of Pathology, Massachusetts General Hospital, Charlestown, MA 02129, USA
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
| | - Paolo Ferragina
- Department of Computer Science, University of Pisa, Pisa 56128, Italy
| | - Yakir Reshef
- Department of Computer Science, Harvard University, Cambridge, MA 02138, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Guillaume Lettre
- Faculty of Medicine, Université de Montréal, Montreal, Quebec H3C3J7, Canada
- Montreal Heart Institute, Montreal, Quebec H1T1C8, Canada
| | - Daniel E Bauer
- Division of Hematology/Oncology, Boston Children’s Hospital, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
| | - Luca Pinello
- Department of Pathology, Massachusetts General Hospital, Charlestown, MA 02129, USA
- Harvard Medical School, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
|
37
|
Weidemüller P, Kholmatov M, Petsalaki E, Zaugg JB. Transcription factors: Bridge between cell signaling and gene regulation. Proteomics 2021; 21:e2000034. [PMID: 34314098 DOI: 10.1002/pmic.202000034] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Received: 03/19/2021] [Revised: 07/05/2021] [Accepted: 07/16/2021] [Indexed: 01/17/2023]
Abstract
Transcription factors (TFs) are key regulators of intrinsic cellular processes, such as differentiation and development, and of the cellular response to external perturbation through signaling pathways. In this review we focus on the role of TFs as a link between signaling pathways and gene regulation. Cell signaling tends to result in the modulation of a set of TFs that then lead to changes in the cell's transcriptional program. We highlight the molecular layers at which TF activity can be measured and the associated technical and conceptual challenges. These layers include post-translational modifications (PTMs) of the TF, regulation of TF binding to DNA through chromatin accessibility and epigenetics, and expression of target genes. We highlight that a large number of TFs are understudied in both signaling and gene regulation studies, and that our knowledge about known TF targets has a strong literature bias. We argue that TFs serve as a perfect bridge between the fields of gene regulation and signaling, and that separating these fields hinders our understanding of cell functions. Multi-omics approaches that measure multiple dimensions of TF activity are ideally suited to study the interplay of cell signaling and gene regulation using TFs as the anchor to link the two fields.
Affiliation(s)
- Paula Weidemüller
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Maksim Kholmatov
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
| | - Evangelia Petsalaki
- European Bioinformatics Institute, European Molecular Biology Laboratory, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Judith B Zaugg
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstraße 1, Heidelberg, 69117, Germany
|
38
|
Sobreira DR, Joslin AC, Zhang Q, Williamson I, Hansen GT, Farris KM, Sakabe NJ, Sinnott-Armstrong N, Bozek G, Jensen-Cody SO, Flippo KH, Ober C, Bickmore WA, Potthoff M, Chen M, Claussnitzer M, Aneas I, Nóbrega MA. Extensive pleiotropism and allelic heterogeneity mediate metabolic effects of IRX3 and IRX5. Science 2021; 372:1085-1091. [PMID: 34083488 PMCID: PMC8386003 DOI: 10.1126/science.abf1008] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 10/23/2020] [Accepted: 04/28/2021] [Indexed: 12/11/2022]
Abstract
Whereas coding variants often have pleiotropic effects across multiple tissues, noncoding variants are thought to mediate their phenotypic effects by specific tissue and temporal regulation of gene expression. Here, we investigated the genetic and functional architecture of a genomic region within the FTO gene that is strongly associated with obesity risk. We show that multiple variants on a common haplotype modify the regulatory properties of several enhancers targeting IRX3 and IRX5 from megabase distances. We demonstrate that these enhancers affect gene expression in multiple tissues, including adipose and brain, and impart regulatory effects during a restricted temporal window. Our data indicate that the genetic architecture of disease-associated loci may involve extensive pleiotropy, allelic heterogeneity, shared allelic effects across tissues, and temporally restricted effects.
Affiliation(s)
- Débora R Sobreira
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
| | - Amelia C Joslin
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Qi Zhang
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Iain Williamson
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
| | - Grace T Hansen
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Kathryn M Farris
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Noboru J Sakabe
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Nasa Sinnott-Armstrong
- Department of Genetics, Stanford University, Stanford 94305 CA, USA
- Metabolism Program and Cardiovascular Disease Initiative, Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
| | - Grazyna Bozek
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Sharon O Jensen-Cody
- Department of Pharmacology, University of Iowa Carver College of Medicine, Iowa City, IA 52242, USA
| | - Kyle H Flippo
- Department of Pharmacology, University of Iowa Carver College of Medicine, Iowa City, IA 52242, USA
| | - Carole Ober
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Wendy A Bickmore
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
| | - Matthew Potthoff
- Department of Pharmacology, University of Iowa Carver College of Medicine, Iowa City, IA 52242, USA
| | - Mengjie Chen
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Melina Claussnitzer
- Metabolism Program and Cardiovascular Disease Initiative, Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02142, USA
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA 02131, USA
| | - Ivy Aneas
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
| | - Marcelo A Nóbrega
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
|
39
|
Gajos M, Jasnovidova O, van Bömmel A, Freier S, Vingron M, Mayer A. Conserved DNA sequence features underlie pervasive RNA polymerase pausing. Nucleic Acids Res 2021; 49:4402-4420. [PMID: 33788942 PMCID: PMC8096220 DOI: 10.1093/nar/gkab208] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 12/11/2020] [Revised: 03/05/2021] [Accepted: 03/15/2021] [Indexed: 12/17/2022] Open
Abstract
Pausing of transcribing RNA polymerase is regulated and creates opportunities to control gene expression. Research in metazoans has so far mainly focused on RNA polymerase II (Pol II) promoter-proximal pausing, leaving the pervasive nature of pausing and its regulatory potential in mammalian cells unclear. Here, we developed a pause detecting algorithm (PDA) for nucleotide-resolution occupancy data and a new native elongating transcript sequencing approach, termed nested NET-seq, that strongly reduces artifactual peaks commonly misinterpreted as pausing sites. Leveraging PDA and nested NET-seq reveals widespread genome-wide Pol II pausing at single-nucleotide resolution in human cells. Notably, the majority of Pol II pauses occur outside of promoter-proximal gene regions, primarily along the gene body of transcribed genes. Sequence analysis combined with machine learning modeling reveals DNA sequence properties underlying widespread transcriptional pausing, including a new pause motif. Interestingly, key sequence determinants of RNA polymerase pausing are conserved between human cells and bacteria. These studies indicate pervasive sequence-induced transcriptional pausing in human cells, and the knowledge of exact pause locations implies potential functional roles in gene expression.
Affiliation(s)
- Martyna Gajos
- Otto-Warburg-Laboratory, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany; Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
| | - Olga Jasnovidova
- Otto-Warburg-Laboratory, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
| | - Alena van Bömmel
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany; Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
| | - Susanne Freier
- Otto-Warburg-Laboratory, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
| | - Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
| | - Andreas Mayer
- Otto-Warburg-Laboratory, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
|
40
|
Wang HLV, Forestier S, Corces VG. Exposure to sevoflurane results in changes of transcription factor occupancy in sperm and inheritance of autism. Biol Reprod 2021; 105:705-719. [PMID: 33982067 DOI: 10.1093/biolre/ioab097] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/08/2021] [Revised: 05/03/2021] [Accepted: 05/07/2021] [Indexed: 11/13/2022] Open
Abstract
One in 54 children in the U.S. is diagnosed with Autism Spectrum Disorder (ASD). De novo germline and somatic mutations cannot account for all cases of ASD, suggesting that epigenetic alterations triggered by environmental exposures may be responsible for a subset of ASD cases. Human and animal studies have shown that exposure of the developing brain to general anesthetic (GA) agents can trigger neurodegeneration and neurobehavioral abnormalities, but the effects of general anesthetics on the germ line have not been explored in detail. We exposed pregnant mice to sevoflurane during the time of embryonic development when the germ cells undergo epigenetic reprogramming and found that more than 38% of the directly exposed F1 animals exhibit impairments in anxiety and social interactions. Strikingly, 44-47% of the F2 and F3 animals, which were not directly exposed to sevoflurane, show the same behavioral problems. We performed ATAC-seq and identified more than 1200 differentially accessible sites in the sperm of F1 animals, 69 of which are also present in the sperm of F2 animals. These sites are located in regulatory regions of genes strongly associated with ASD, including Arid1b, Ntrk2, and Stmn2. These findings suggest that epimutations caused by exposing germ cells to sevoflurane can lead to ASD in the offspring, and this effect can be transmitted through the male germline inter- and trans-generationally.
Affiliation(s)
- Hsiao-Lin V Wang
- Department of Human Genetics, Emory University School of Medicine, 615 Michael St, Atlanta, GA 30322, USA
| | - Samantha Forestier
- Department of Human Genetics, Emory University School of Medicine, 615 Michael St, Atlanta, GA 30322, USA
| | - Victor G Corces
- Department of Human Genetics, Emory University School of Medicine, 615 Michael St, Atlanta, GA 30322, USA
|
41
|
Rodrigues DC, Mufteev M, Weatheritt RJ, Djuric U, Ha KCH, Ross PJ, Wei W, Piekna A, Sartori MA, Byres L, Mok RSF, Zaslavsky K, Pasceri P, Diamandis P, Morris Q, Blencowe BJ, Ellis J. Shifts in Ribosome Engagement Impact Key Gene Sets in Neurodevelopment and Ubiquitination in Rett Syndrome. Cell Rep 2021; 30:4179-4196.e11. [PMID: 32209477 DOI: 10.1016/j.celrep.2020.02.107] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 03/26/2019] [Revised: 12/30/2019] [Accepted: 02/27/2020] [Indexed: 12/21/2022] Open
Abstract
Regulation of translation during human development is poorly understood, and its dysregulation is associated with Rett syndrome (RTT). To discover shifts in mRNA ribosomal engagement (RE) during human neurodevelopment, we use parallel translating ribosome affinity purification sequencing (TRAP-seq) and RNA sequencing (RNA-seq) on control and RTT human induced pluripotent stem cells, neural progenitor cells, and cortical neurons. We find that 30% of transcribed genes are translationally regulated, including key gene sets (neurodevelopment, transcription and translation factors, and glycolysis). Approximately 35% of abundant intergenic long noncoding RNAs (lncRNAs) are ribosome engaged. Neurons translate mRNAs more efficiently and have longer 3' UTRs, and RE correlates with elements for RNA-binding proteins. RTT neurons have reduced global translation and compromised mTOR signaling, and >2,100 genes are translationally dysregulated. NEDD4L E3-ubiquitin ligase is translationally impaired, ubiquitinated protein levels are reduced, and protein targets accumulate in RTT neurons. Overall, the dynamic translatome in neurodevelopment is disturbed in RTT and provides insight into altered ubiquitination that may have therapeutic implications.
Affiliation(s)
- Deivid C Rodrigues
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Marat Mufteev
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
| | - Robert J Weatheritt
- Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Ugljesa Djuric
- Laboratory Medicine and Pathology Program, University Health Network, Toronto, ON M5G 2C4, Canada
| | - Kevin C H Ha
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada; Vector Institute, 661 University Avenue, Toronto, ON M5G 1M1, Canada
| | - P Joel Ross
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Wei Wei
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Alina Piekna
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Maria A Sartori
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Loryn Byres
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
| | - Rebecca S F Mok
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
| | - Kirill Zaslavsky
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
| | - Peter Pasceri
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Phedias Diamandis
- Laboratory Medicine and Pathology Program, University Health Network, Toronto, ON M5G 2C4, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON M5S 1A1, Canada; Department of Pathology, University Health Network, Toronto, ON M5G 2C4, Canada
| | - Quaid Morris
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada; Vector Institute, 661 University Avenue, Toronto, ON M5G 1M1, Canada
| | - Benjamin J Blencowe
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada; Donnelly Center for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - James Ellis
- Program in Developmental & Stem Cell Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada.
|
42
|
Pinaud S, Tetreau G, Poteaux P, Galinier R, Chaparro C, Lassalle D, Portet A, Simphor E, Gourbal B, Duval D. New Insights Into Biomphalysin Gene Family Diversification in the Vector Snail Biomphalaria glabrata. Front Immunol 2021; 12:635131. [PMID: 33868258 PMCID: PMC8047071 DOI: 10.3389/fimmu.2021.635131] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/29/2020] [Accepted: 03/08/2021] [Indexed: 11/30/2022] Open
Abstract
Aerolysins, initially characterized as virulence factors in bacteria, are increasingly found in massive genome and transcriptome sequencing data from metazoans. Horizontal gene transfer has been demonstrated as the main route by which metazoans acquire aerolysin-related toxins. However, only a few studies have focused on their potential biological functions in such organisms. Herein, we present an extensive characterization of a multigene family encoding aerolysins, named biomphalysins, in the snail Biomphalaria glabrata, the intermediate host of the trematode Schistosoma mansoni. Our results highlight that duplication and domestication of an acquired bacterial toxin gene in the snail genome resulted in the acquisition of a novel and diversified toxin family. Twenty-three biomphalysin genes were identified. All are expressed and exhibit tissue-specific expression patterns. An in silico structural analysis was performed to highlight the central role played by two distinct domains: (i) a large lobe involved in the lytic function of these snail toxins, which constrained their evolution, and (ii) a small lobe that is structurally variable between biomphalysin toxins and that matches various functional domains involved in moiety recognition of target cells. A functional approach suggests that the repertoire of biomphalysins that bind to pathogens depends on the type of pathogen encountered. These results underline a neo- and sub-functionalization of the biomphalysin toxins, which have the potential to increase the range of effectors in the snail’s immune arsenal.
Affiliation(s)
- Silvain Pinaud
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - Guillaume Tetreau
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - Pierre Poteaux
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - Richard Galinier
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - Cristian Chaparro
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - Damien Lassalle
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - Anaïs Portet
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - Elodie Simphor
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - Benjamin Gourbal
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
| | - David Duval
- IHPE, Univ Montpellier, CNRS, IFREMER, Univ Perpignan Via Domitia, Perpignan, France; CNRS, IFREMER, University of Montpellier, Perpignan, France
|
43
|
Sinnott-Armstrong N, Sousa IS, Laber S, Rendina-Ruedy E, Nitter Dankel SE, Ferreira T, Mellgren G, Karasik D, Rivas M, Pritchard J, Guntur AR, Cox RD, Lindgren CM, Hauner H, Sallari R, Rosen CJ, Hsu YH, Lander ES, Kiel DP, Claussnitzer M. A regulatory variant at 3q21.1 confers an increased pleiotropic risk for hyperglycemia and altered bone mineral density. Cell Metab 2021; 33:615-628.e13. [PMID: 33513366 PMCID: PMC7928941 DOI: 10.1016/j.cmet.2021.01.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 05/02/2019] [Revised: 11/14/2019] [Accepted: 12/31/2020] [Indexed: 02/07/2023]
Abstract
Skeletal and glycemic traits have shared etiology, but the underlying genetic factors remain largely unknown. To identify genetic loci that may have pleiotropic effects, we studied genome-wide association studies (GWASs) for bone mineral density and glycemic traits and identified a bivariate risk locus at 3q21. Using sequence and epigenetic modeling, we prioritized an adenylate cyclase 5 (ADCY5) intronic causal variant, rs56371916. This SNP changes the binding affinity of SREBP1 and leads to differential ADCY5 gene expression, altering the chromatin landscape from poised to repressed. These alterations result in bone- and type 2 diabetes-relevant cell-autonomous changes in lipid metabolism in osteoblasts and adipocytes. We validated our findings by directly manipulating the regulator SREBP1, the target gene ADCY5, and the variant rs56371916, which together imply a novel link between fatty acid oxidation and osteoblast differentiation. Our work, by systematic functional dissection of pleiotropic GWAS loci, represents a framework to uncover biological mechanisms affecting pleiotropic traits.
Affiliation(s)
- Nasa Sinnott-Armstrong
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Cell Circuits and Epigenomics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Genetics, Stanford University, Stanford 94305 CA, USA
| | - Isabel S Sousa
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Else Kröner-Fresenius-Center for Nutritional Medicine, School of Life Sciences, Technical University of Munich, Freising 85354, Germany
| | - Samantha Laber
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Cell Circuits and Epigenomics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Big Data Institute, University of Oxford, Oxford, UK
| | - Elizabeth Rendina-Ruedy
- Center for Molecular Medicine, Maine Medical Center Research Institute, Scarborough, ME 04074, USA
| | - Simon E Nitter Dankel
- University of Bergen, Bergen 5020, Norway; Mohn Nutrition Research Laboratory, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Hormone Laboratory, Department of Medical Biochemistry and Pharmacology, Haukeland University Hospital, 5021 Bergen, Norway
| | | | - Gunnar Mellgren
- University of Bergen, Bergen 5020, Norway; Mohn Nutrition Research Laboratory, Department of Clinical Science, University of Bergen, 5020 Bergen, Norway; Hormone Laboratory, Department of Medical Biochemistry and Pharmacology, Haukeland University Hospital, 5021 Bergen, Norway
| | - David Karasik
- Institute for Aging Research, Hebrew SeniorLife and Harvard Medical School, Boston, MA 02131, USA; Faculty of Medicine of the Galilee, Bar-Ilan University, Safed, Israel
| | - Manuel Rivas
- Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
| | - Jonathan Pritchard
- Department of Genetics, Stanford University, Stanford 94305 CA, USA; Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Anyonya R Guntur
- Center for Molecular Medicine, Maine Medical Center Research Institute, Scarborough, ME 04074, USA
| | - Roger D Cox
- Medical Research Council Harwell, Oxfordshire, UK
| | - Cecilia M Lindgren
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Big Data Institute, University of Oxford, Oxford, UK
| | - Hans Hauner
- Else Kröner-Fresenius-Center for Nutritional Medicine, School of Life Sciences, Technical University of Munich, Freising 85354, Germany; Institute of Nutritional Medicine, School of Medicine, Technical University of Munich, Freising 85354, Germany; Clinical Cooperation Group "Nutrigenomics and Type 2 Diabetes" of the German Center of Diabetes Research, Helmholtz Center Munich, Munich 85764, Germany
| | - Richard Sallari
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Clifford J Rosen
- Center for Molecular Medicine, Maine Medical Center Research Institute, Scarborough, ME 04074, USA
| | - Yi-Hsiang Hsu
- Institute for Aging Research, Hebrew SeniorLife and Harvard Medical School, Boston, MA 02131, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02131, USA
| | - Eric S Lander
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Cell Circuits and Epigenomics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Biology, MIT, Cambridge, MA 02142, USA; Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
| | - Douglas P Kiel
- Institute for Aging Research, Hebrew SeniorLife and Harvard Medical School, Boston, MA 02131, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02131, USA
| | - Melina Claussnitzer
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Cell Circuits and Epigenomics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02131, USA; University of Hohenheim, Institute of Nutritional Science, Stuttgart 70599, Germany.
|
44
|
Frigola J, Sabarinathan R, Gonzalez-Perez A, Lopez-Bigas N. Variable interplay of UV-induced DNA damage and repair at transcription factor binding sites. Nucleic Acids Res 2021; 49:891-901. [PMID: 33347579 PMCID: PMC7826277 DOI: 10.1093/nar/gkaa1219] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/30/2020] [Revised: 11/12/2020] [Accepted: 12/03/2020] [Indexed: 12/13/2022]
Abstract
An abnormally high rate of UV-light related mutations appears at transcription factor binding sites (TFBS) across melanomas. The binding of transcription factors (TFs) to the DNA impairs the repair of UV-induced lesions and certain TFs have been shown to increase the rate of generation of these lesions at their binding sites. However, the precise contribution of these two elements to the increase in mutation rate at TFBS in these malignant cells is not understood. Here, exploiting nucleotide-resolution data, we computed the rate of formation and repair of UV-lesions within the binding sites of TFs of different families. We observed, at certain dipyrimidine positions within the binding site of TFs in the Tryptophan Cluster family, an increased rate of formation of UV-induced lesions, corroborating previous studies. Nevertheless, across most families of TFs, the observed increased mutation rate within the entire DNA region covered by the protein results from the decreased repair efficiency. While the rate of mutations across all TFBS does not agree with the amount of UV-induced lesions observed immediately after UV exposure, it strongly agrees with that observed after 48 h. This corroborates the determinant role of the impaired repair in the observed increase of mutation rate.
Affiliation(s)
- Joan Frigola
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Thoracic tumors and head and neck cancer group, Vall d'Hebron Institute of Oncology, Natzaret, 115-117, 08035 Barcelona, Spain
| | - Radhakrishnan Sabarinathan
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore 560065, India
| | - Abel Gonzalez-Perez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
| | - Nuria Lopez-Bigas
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain; Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
45
|
Convex hulls in hamming space enable efficient search for similarity and clustering of genomic sequences. BMC Bioinformatics 2020; 21:482. [PMID: 33375937 PMCID: PMC7772912 DOI: 10.1186/s12859-020-03811-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2020] [Accepted: 10/13/2020] [Indexed: 12/09/2022]
Abstract
Background: In molecular epidemiology, comparison of intra-host viral variants among infected persons is frequently used for tracing transmissions in human populations and detecting viral infection outbreaks. Application of Ultra-Deep Sequencing (UDS) immensely increases the sensitivity of transmission detection but brings considerable computational challenges when comparing all pairs of sequences. We developed a new population comparison method based on convex hulls in hamming space. We applied this method to a large set of UDS samples obtained from unrelated cases infected with hepatitis C virus (HCV) and compared its performance with three previously published methods.
Results: The convex hull in hamming space is a data structure that provides information on: (1) average hamming distance within the set; (2) average hamming distance between two sets; (3) closeness centrality of each sequence; and (4) lower and upper bounds of all the pairwise distances among the members of two sets. This filtering strategy rapidly and correctly removes 96.2% of all pairwise HCV sample comparisons, outperforming all previous methods. The convex hull distance (CHD) algorithm showed variable performance depending on sequence heterogeneity of the studied populations in real and simulated datasets, suggesting the possibility of using clustering methods to improve the performance. To address this issue, we developed a new clustering algorithm, k-hulls, that reduces heterogeneity of the convex hull. This efficient algorithm is an extension of the k-means algorithm and can be used with any type of categorical data. It is 6.8-times more accurate than k-mode, a previously developed clustering algorithm for categorical data.
Conclusions: CHD is a fast and efficient filtering strategy for massively reducing the computational burden of pairwise comparison among large samples of sequences, and thus, aiding the calculation of transmission links among infected individuals using threshold-based methods. In addition, the convex hull efficiently obtains important summary metrics for intra-host viral populations.
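The summary metrics listed in the abstract can be illustrated with a small sketch. It is not the published CHD algorithm: it assumes, for illustration only, that each sample is summarized by per-position symbol counts, and the helper names (profile, mean_between_distance, distance_bounds) are hypothetical. From such a summary, the average Hamming distance between two sets and simple lower and upper bounds on every pairwise distance follow without enumerating pairs.

```python
from collections import Counter
from itertools import product

def profile(seqs):
    """Per-position symbol counts for a set of equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [Counter(column) for column in zip(*seqs)]

def mean_between_distance(prof_a, n_a, prof_b, n_b):
    """Average Hamming distance over all (a, b) pairs, computed from counts only."""
    total = 0.0
    for col_a, col_b in zip(prof_a, prof_b):
        # Number of pairs that carry the same symbol at this position.
        matches = sum(count * col_b[sym] for sym, count in col_a.items())
        total += 1.0 - matches / (n_a * n_b)
    return total

def distance_bounds(prof_a, prof_b):
    """Lower and upper bounds valid for every pairwise distance between the sets."""
    lower = upper = 0
    for col_a, col_b in zip(prof_a, prof_b):
        syms_a, syms_b = set(col_a), set(col_b)
        if syms_a.isdisjoint(syms_b):
            lower += 1  # no shared symbol: every pair differs here
        if not (len(syms_a) == 1 and syms_a == syms_b):
            upper += 1  # not fixed to one common symbol: a pair may differ here
    return lower, upper

def hamming(a, b):
    """Plain Hamming distance, used only to check the summaries."""
    return sum(x != y for x, y in zip(a, b))

# Toy samples (hypothetical data).
set_a = ["ACGT", "ACGA", "ACGT"]
set_b = ["TCGT", "TCGA"]
prof_a, prof_b = profile(set_a), profile(set_b)

mean_dist = mean_between_distance(prof_a, len(set_a), prof_b, len(set_b))
bounds = distance_bounds(prof_a, prof_b)
pairwise = [hamming(a, b) for a, b in product(set_a, set_b)]

print(mean_dist)  # 1.5
print(bounds)     # (1, 2)
```

With a threshold-based transmission test, a pair of samples whose lower bound already exceeds the threshold can be discarded without computing any sequence-to-sequence distance, which is the kind of filtering the abstract describes.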
|
46
|
Yang H, Luan Y, Liu T, Lee HJ, Fang L, Wang Y, Wang X, Zhang B, Jin Q, Ang KC, Xing X, Wang J, Xu J, Song F, Sriranga I, Khunsriraksakul C, Salameh T, Li D, Choudhary MNK, Topczewski J, Wang K, Gerhard GS, Hardison RC, Wang T, Cheng KC, Yue F. A map of cis-regulatory elements and 3D genome structures in zebrafish. Nature 2020; 588:337-343. [PMID: 33239788 DOI: 10.1038/s41586-020-2962-9] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 07/20/2019] [Accepted: 09/17/2020] [Indexed: 01/08/2023]
Abstract
The zebrafish (Danio rerio) has been widely used in the study of human disease and development, and about 70% of the protein-coding genes are conserved between the two species. However, studies in zebrafish remain constrained by the sparse annotation of functional control elements in the zebrafish genome. Here we performed RNA sequencing, assay for transposase-accessible chromatin using sequencing (ATAC-seq), chromatin immunoprecipitation with sequencing, whole-genome bisulfite sequencing, and chromosome conformation capture (Hi-C) experiments in up to eleven adult and two embryonic tissues to generate a comprehensive map of transcriptomes, cis-regulatory elements, heterochromatin, methylomes and 3D genome organization in the zebrafish Tübingen reference strain. A comparison of zebrafish, human and mouse regulatory elements enabled the identification of both evolutionarily conserved and species-specific regulatory sequences and networks. We observed enrichment of evolutionary breakpoints at topologically associating domain boundaries, which were correlated with strong histone H3 lysine 4 trimethylation (H3K4me3) and CCCTC-binding factor (CTCF) signals. We performed single-cell ATAC-seq in zebrafish brain, which delineated 25 different clusters of cell types. By combining long-read DNA sequencing and Hi-C, we assembled the sex-determining chromosome 4 de novo. Overall, our work provides an additional epigenomic anchor for the functional annotation of vertebrate genomes and the study of evolutionarily conserved elements of 3D genome organization.
Affiliation(s)
- Hongbo Yang
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA
| | - Yu Luan
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA
| | - Tingting Liu
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA
| | - Hyung Joo Lee
- Department of Genetics, The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St Louis, MO, USA
| | - Li Fang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Yanli Wang
- Bioinformatics and Genomics Program, The Pennsylvania State University, State College, PA, USA
| | - Xiaotao Wang
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA
| | - Bo Zhang
- Bioinformatics and Genomics Program, The Pennsylvania State University, State College, PA, USA
| | - Qiushi Jin
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA
| | - Khai Chung Ang
- Department of Pathology and Penn State Zebrafish Functional Genomics Core, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
| | - Xiaoyun Xing
- Department of Genetics, The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St Louis, MO, USA
| | - Juan Wang
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA
| | - Jie Xu
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA
| | - Fan Song
- Bioinformatics and Genomics Program, The Pennsylvania State University, State College, PA, USA
| | - Iyyanki Sriranga
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA
| | | | - Tarik Salameh
- Bioinformatics and Genomics Program, The Pennsylvania State University, State College, PA, USA
| | - Daofeng Li
- Department of Genetics, The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St Louis, MO, USA
| | - Mayank N K Choudhary
- Department of Genetics, The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St Louis, MO, USA
| | - Jacek Topczewski
- Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL, USA; Stanley Manne Children's Research Institute, Ann and Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Glenn S Gerhard
- Department of Medical Genetics and Molecular Biochemistry, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, USA
| | - Ross C Hardison
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
| | - Ting Wang
- Department of Genetics, The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St Louis, MO, USA
| | - Keith C Cheng
- Department of Pathology and Penn State Zebrafish Functional Genomics Core, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
| | - Feng Yue
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA; Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL, USA
|
47
|
Domcke S, Hill AJ, Daza RM, Cao J, O'Day DR, Pliner HA, Aldinger KA, Pokholok D, Zhang F, Milbank JH, Zager MA, Glass IA, Steemers FJ, Doherty D, Trapnell C, Cusanovich DA, Shendure J. A human cell atlas of fetal chromatin accessibility. Science 2020; 370:eaba7612. [PMID: 33184180 PMCID: PMC7785298 DOI: 10.1126/science.aba7612] [Citation(s) in RCA: 227] [Impact Index Per Article: 45.4] [Received: 01/03/2020] [Accepted: 09/10/2020] [Indexed: 12/12/2022]
Abstract
The chromatin landscape underlying the specification of human cell types is of fundamental interest. We generated human cell atlases of chromatin accessibility and gene expression in fetal tissues. For chromatin accessibility, we devised a three-level combinatorial indexing assay and applied it to 53 samples representing 15 organs, profiling ~800,000 single cells. We leveraged cell types defined by gene expression to annotate these data and cataloged hundreds of thousands of candidate regulatory elements that exhibit cell type-specific chromatin accessibility. We investigated the properties of lineage-specific transcription factors (such as POU2F1 in neurons), organ-specific specializations of broadly distributed cell types (such as blood and endothelial), and cell type-specific enrichments of complex trait heritability. These data represent a rich resource for the exploration of in vivo human gene regulation in diverse tissues and cell types.
Affiliation(s)
- Silvia Domcke
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Andrew J Hill
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Riza M Daza
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Junyue Cao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Diana R O'Day
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
| | - Hannah A Pliner
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Kimberly A Aldinger
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA, USA
| | | | | | - Jennifer H Milbank
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Michael A Zager
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Center for Data Visualization, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Ian A Glass
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA, USA
| | | | - Dan Doherty
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA, USA
| | - Cole Trapnell
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
| | - Darren A Cusanovich
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA
- Asthma and Airway Disease Research Center, University of Arizona, Tucson, AZ, USA
| | - Jay Shendure
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
|
48
|
Delos Santos NP, Texari L, Benner C. MEIRLOP: improving score-based motif enrichment by incorporating sequence bias covariates. BMC Bioinformatics 2020; 21:410. [PMID: 32938397 PMCID: PMC7493370 DOI: 10.1186/s12859-020-03739-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/25/2020] [Accepted: 09/04/2020] [Indexed: 12/23/2022]
Abstract
BACKGROUND: Motif enrichment analysis (MEA) identifies over-represented transcription factor (TF) binding motifs in the DNA sequence of regulatory regions, enabling researchers to infer which transcription factors can regulate transcriptional response to a stimulus, or identify sequence features found near a target protein in a ChIP-seq experiment. Score-based MEA determines motifs enriched in regions exhibiting extreme differences in regulatory activity, but existing methods do not control for biases in GC content or dinucleotide composition. This lack of control for sequence biases, such as those often found in CpG islands, can obscure the enrichment of biologically relevant motifs. RESULTS: We developed Motif Enrichment In Ranked Lists of Peaks (MEIRLOP), a novel MEA method that determines enrichment of TF binding motifs in a list of scored regulatory regions, while controlling for sequence bias. In this study, we compare MEIRLOP against other MEA methods in identifying binding motifs found enriched in differentially active regulatory regions after interferon-beta stimulus, finding that using logistic regression and covariates improves the ability to call enrichment of ISGF3 binding motifs from differential acetylation ChIP-seq data compared to other methods. Our method achieves similar or better performance compared to other methods when quantifying the enrichment of TF binding motifs from ENCODE TF ChIP-seq datasets. We also demonstrate how MEIRLOP is broadly applicable to the analysis of numerous types of NGS assays and experimental designs. CONCLUSIONS: Our results demonstrate the importance of controlling for sequence bias when accurately identifying enriched DNA sequence motifs using score-based MEA. MEIRLOP is available for download from https://github.com/npdeloss/meirlop under the MIT license.
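Why a sequence-bias covariate matters can be seen in a toy logistic regression. The abstract does not spell out the model, so the setup below is an assumption: motif presence is the outcome, and the region score and GC content are the predictors. The data are synthetic, the fitting routine is a plain gradient-ascent sketch rather than MEIRLOP's implementation, and all names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def standardize(values):
    """Rescale to zero mean and unit variance so coefficients are comparable."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

def fit_logistic(columns, labels, steps=300, rate=0.5):
    """Gradient-ascent logistic regression; returns [intercept, coef_1, ...]."""
    n = len(labels)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for row, y in zip(rows, labels):
            err = y - sigmoid(sum(w * x for w, x in zip(weights, row)))
            for j, x in enumerate(row):
                grad[j] += err * x
        weights = [w + rate * g / n for w, g in zip(weights, grad)]
    return weights

# Synthetic regions (hypothetical data): motif presence depends only on GC
# content, while the activity score is merely correlated with GC content.
rng = random.Random(0)
gc = [rng.uniform(0.3, 0.7) for _ in range(500)]
score = [5.0 * (g - 0.5) + rng.gauss(0.0, 1.0) for g in gc]
has_motif = [1 if rng.random() < sigmoid(10.0 * (g - 0.5)) else 0 for g in gc]

z_score, z_gc = standardize(score), standardize(gc)
unadjusted = fit_logistic([z_score], has_motif)[1]
adjusted = fit_logistic([z_score, z_gc], has_motif)[1]

print(f"score coefficient without GC covariate: {unadjusted:.3f}")
print(f"score coefficient with GC covariate:    {adjusted:.3f}")
```

In this construction the motif has no real relationship with the score, so a positive coefficient in the unadjusted fit reflects GC content alone. Adding GC content as a covariate should pull the score coefficient toward zero, which is the kind of spurious enrichment the abstract says sequence-bias control is meant to prevent.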
Affiliation(s)
- Nathaniel P Delos Santos
- Department of Biomedical Informatics, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA
| | - Lorane Texari
- Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA
| | - Christopher Benner
- Department of Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0640, USA.
|
49
|
Beisaw A, Kuenne C, Guenther S, Dallmann J, Wu CC, Bentsen M, Looso M, Stainier DYR. AP-1 Contributes to Chromatin Accessibility to Promote Sarcomere Disassembly and Cardiomyocyte Protrusion During Zebrafish Heart Regeneration. Circ Res 2020; 126:1760-1778. [PMID: 32312172 DOI: 10.1161/circresaha.119.316167] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Indexed: 01/16/2023]
Abstract
RATIONALE: The adult human heart is an organ with low regenerative potential. Heart failure following acute myocardial infarction is a leading cause of death due to the inability of cardiomyocytes to proliferate and replenish lost cardiac muscle. While the zebrafish has emerged as a powerful model to study endogenous cardiac regeneration, the molecular mechanisms by which cardiomyocytes respond to damage by disassembling sarcomeres, proliferating, and repopulating the injured area remain unclear. Furthermore, we are far from understanding the regulation of the chromatin landscape and epigenetic barriers that must be overcome for cardiac regeneration to occur. OBJECTIVE: To identify transcription factor regulators of the chromatin landscape, which promote cardiomyocyte regeneration in zebrafish, and investigate their function. METHODS AND RESULTS: Using the Assay for Transposase-Accessible Chromatin coupled to high-throughput sequencing (ATAC-Seq), we first find that the regenerating cardiomyocyte chromatin accessibility landscape undergoes extensive changes following cryoinjury, and that activator protein-1 (AP-1) binding sites are the most highly enriched motifs in regions that gain accessibility during cardiac regeneration. Furthermore, using bioinformatic and gene expression analyses, we find that the AP-1 response in regenerating adult zebrafish cardiomyocytes is largely different from the response in adult mammalian cardiomyocytes. Using a cardiomyocyte-specific dominant negative approach, we show that blocking AP-1 function leads to defects in cardiomyocyte proliferation as well as decreased chromatin accessibility at the fbxl22 and ilk loci, which regulate sarcomere disassembly and cardiomyocyte protrusion into the injured area, respectively. We further show that overexpression of the AP-1 family members Junb and Fosl1 can promote changes in mammalian cardiomyocyte behavior in vitro.
CONCLUSIONS: AP-1 transcription factors play an essential role in the cardiomyocyte response to injury by regulating chromatin accessibility changes, thereby allowing the activation of gene expression programs that promote cardiomyocyte dedifferentiation, proliferation, and protrusion into the injured area.
Affiliation(s)
- Arica Beisaw
- Department of Developmental Genetics (A.B., J.D., C.-C.W., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; German Centre for Cardiovascular Research (DZHK) Partner Site Rhine-Main (A.B., S.G., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Carsten Kuenne
- ECCPS Bioinformatics and Deep Sequencing Platform (C.K., S.G., M.B., M.L.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Stefan Guenther
- ECCPS Bioinformatics and Deep Sequencing Platform (C.K., S.G., M.B., M.L.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; German Centre for Cardiovascular Research (DZHK) Partner Site Rhine-Main (A.B., S.G., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Julia Dallmann
- Department of Developmental Genetics (A.B., J.D., C.-C.W., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Chi-Chung Wu
- Department of Developmental Genetics (A.B., J.D., C.-C.W., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Mette Bentsen
- ECCPS Bioinformatics and Deep Sequencing Platform (C.K., S.G., M.B., M.L.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Mario Looso
- ECCPS Bioinformatics and Deep Sequencing Platform (C.K., S.G., M.B., M.L.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
| | - Didier Y R Stainier
- Department of Developmental Genetics (A.B., J.D., C.-C.W., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany; German Centre for Cardiovascular Research (DZHK) Partner Site Rhine-Main (A.B., S.G., D.Y.R.S.), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
|
50
|
Fostier J. BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs. BMC Bioinformatics 2020; 21:81. [PMID: 32164557 PMCID: PMC7068855 DOI: 10.1186/s12859-020-3348-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
BACKGROUND: The identification of all matches of a large set of position weight matrices (PWMs) in long DNA sequences requires significant computational resources for which a number of efficient yet complex algorithms have been proposed. RESULTS: We propose BLAMM, a simple and efficient tool inspired by high performance computing techniques. The workload is expressed in terms of matrix-matrix products that are evaluated with high efficiency using optimized BLAS library implementations. The algorithm is easy to parallelize and implement on CPUs and GPUs and has a runtime that is independent of the selected p-value. In terms of single-core performance, it is competitive with state-of-the-art software for PWM matching while being much more efficient when using multithreading. Additionally, BLAMM requires negligible memory. For example, both strands of the entire human genome can be scanned for 1404 PWMs in the JASPAR database in 13 min with a p-value of 10⁻⁴ using a 36-core machine. On a dual GPU system, the same task can be performed in under 5 min. CONCLUSIONS: BLAMM is an efficient tool for identifying PWM matches in large DNA sequences. Its C++ source code is available under the GNU General Public License Version 3 at https://github.com/biointec/blamm.
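How PWM scanning can be written as a matrix-matrix product is sketched below in pure Python. This only illustrates the general idea and makes no claim about BLAMM's actual data layout or its BLAS calls: each sequence window becomes a flattened one-hot row, each PWM becomes a flattened column, and one product yields every window-by-PWM score. The helper names and the toy consensus PWMs are hypothetical, and all PWMs are assumed to share one width.

```python
BASES = "ACGT"

def one_hot_windows(seq, width):
    """One row per window: a flattened (width x 4) one-hot encoding."""
    rows = []
    for start in range(len(seq) - width + 1):
        row = [0.0] * (4 * width)
        for offset, base in enumerate(seq[start:start + width]):
            row[4 * offset + BASES.index(base)] = 1.0
        rows.append(row)
    return rows

def flatten_pwm(pwm):
    """Flatten a PWM (list of per-position dicts, base -> score) in the same order."""
    return [position[base] for position in pwm for base in BASES]

def matmul(rows, columns):
    """Dense product; a BLAS routine would replace this loop in practice."""
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in rows]

def consensus_pwm(word, match=2.0, mismatch=-1.0):
    """Toy PWM (hypothetical scores) that rewards a single consensus word."""
    return [{b: (match if b == ch else mismatch) for b in BASES} for ch in word]

seq = "ACGTACG"
pwms = {"ACG": consensus_pwm("ACG"), "GTA": consensus_pwm("GTA")}

windows = one_hot_windows(seq, 3)               # 5 windows x 12 features
columns = [flatten_pwm(p) for p in pwms.values()]
scores = matmul(windows, columns)               # 5 windows x 2 PWMs

threshold = 6.0  # illustrative cut-off; a real tool derives it from the p-value
hits = {
    name: [i for i, row in enumerate(scores) if row[j] >= threshold]
    for j, name in enumerate(pwms)
}
print(hits)  # {'ACG': [0, 4], 'GTA': [2]}
```

Because the product scores every window against every PWM regardless of the cut-off, the work done does not change with the threshold, in line with the abstract's remark that runtime is independent of the selected p-value.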
Affiliation(s)
- Jan Fostier
- Department of Information Technology - IDLab, Ghent University - imec, Technologiepark 126, Ghent (Zwijnaarde), B-9052, Belgium.
|