1
Kindel F, Triesch S, Schlüter U, Randarevitch LA, Reichel-Deland V, Weber APM, Denton AK. Predmoter-cross-species prediction of plant promoter and enhancer regions. Bioinformatics Advances 2024; 4:vbae074. [PMID: 38841126 PMCID: PMC11150885 DOI: 10.1093/bioadv/vbae074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 04/10/2024] [Accepted: 05/22/2024] [Indexed: 06/07/2024]
Abstract
Motivation: Identifying cis-regulatory elements (CREs) is crucial for analyzing gene regulatory networks. Next generation sequencing methods were developed to identify CREs but represent a considerable expenditure for targeted analysis of few genomic loci. Thus, predicting the outputs of these methods would significantly cut costs and time investment. Results: We present Predmoter, a deep neural network that predicts base-wise Assay for Transposase Accessible Chromatin using sequencing (ATAC-seq) and histone Chromatin immunoprecipitation DNA-sequencing (ChIP-seq) read coverage for plant genomes. Predmoter uses only the DNA sequence as input. We trained our final model on 21 species, for 13 of which ATAC-seq data and for 17 of which ChIP-seq data were publicly available. We evaluated our models on Arabidopsis thaliana and Oryza sativa. Our best models showed accurate predictions in peak position and pattern for ATAC- and histone ChIP-seq. Annotating putatively accessible chromatin regions provides valuable input for the identification of CREs. In conjunction with other in silico data, this can significantly reduce the search space for experimentally verifiable DNA-protein interaction pairs. Availability and implementation: The source code for Predmoter is available at: https://github.com/weberlab-hhu/Predmoter. Predmoter takes a fasta file as input and outputs h5, and optionally bigWig and bedGraph files.
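To make "only the DNA sequence as input" concrete, the following is a minimal sketch of how FASTA text could be turned into a numeric matrix for a neural network. It assumes a one-hot encoding over A, C, G, T with unknown bases mapped to all zeros; the abstract does not state the encoding Predmoter uses, and `read_fasta` and `one_hot` are hypothetical helper names, not part of the Predmoter code base.

```python
# Illustrative only: hypothetical helpers, not Predmoter's actual preprocessing.
BASES = "ACGT"


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record id is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def one_hot(seq):
    """Return one 4-element row per base; bases outside ACGT (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    genome = read_fasta(">chr1 toy example\nACG\nTN\n")
    for record_id, sequence in genome.items():
        print(record_id, one_hot(sequence))
```

Each row of the resulting matrix corresponds to one base, which matches the base-wise resolution of the predicted coverage described in the abstract.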
Affiliation(s)
- Felicitas Kindel
  - Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Sebastian Triesch
  - Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
  - Cluster of Excellence on Plant Sciences (CEPLAS), Germany
- Urte Schlüter
  - Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Laura Alexandra Randarevitch
  - Cluster of Excellence on Plant Sciences (CEPLAS), Germany
  - Institute of Population Genetics, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Vanessa Reichel-Deland
  - Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Andreas P M Weber
  - Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
  - Cluster of Excellence on Plant Sciences (CEPLAS), Germany
- Alisandra K Denton
  - Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
  - Cluster of Excellence on Plant Sciences (CEPLAS), Germany
  - Valence Labs, Montréal, Québec H2S 3H1, Canada
2
Franco-Enzástiga Ú, Inturi NN, Natarajan K, Mwirigi JM, Mazhar K, Schlachetzki JC, Schumacher M, Price TJ. Epigenomic landscape of the human dorsal root ganglion: sex differences and transcriptional regulation of nociceptive genes. bioRxiv 2024:2024.03.27.587047. [PMID: 38586055 PMCID: PMC10996669 DOI: 10.1101/2024.03.27.587047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Gene expression is influenced by chromatin architecture via controlled access of regulatory factors to DNA. To better understand regulation of gene expression in the human dorsal root ganglion (hDRG) we used bulk and spatial transposase-accessible chromatin technology followed by sequencing (ATAC-seq). We detected a total of 3005 differentially accessible chromatin regions (DARs) between sexes using bulk ATAC-seq. DARs in female hDRG mapped mainly to the X chromosome. In males, DARs were found in autosomal genes. We also found differential transcription factor binding motifs within DARs. EGR1/3 and SP1/4 were abundant in females, and JUN, FOS and other AP-1 family members in males. With the aim of dissecting the open chromatin profile in hDRG neurons, we used spatial ATAC-seq. Consistent with our bulk ATAC-seq data, most of the DARs in female hDRG were located in X chromosome genes. Neuron cluster showed higher chromatin accessibility in GABAergic, glutamatergic, and interferon-related genes in females, and in Ca2+-signaling-related genes in males. Sex differences in open chromatin transcription factor binding sites in neuron-proximal barcodes were consistent with the bulk data, having EGR1 transcription factor activity in females and AP-1 family members in males. Accordingly, we showed higher expression of EGR1 in female hDRG compared to male with in-situ hybridization. Our findings point to epigenomic sex differences in the hDRG that likely underlie divergent transcriptional responses that determine mechanistic sex differences in pain.
Affiliation(s)
- Úrzula Franco-Enzástiga
  - Center for Advanced Pain Studies, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, Texas 75080
- Nikhil N. Inturi
  - Center for Advanced Pain Studies, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, Texas 75080
- Keerthana Natarajan
  - Center for Advanced Pain Studies, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, Texas 75080
- Juliet M. Mwirigi
  - Center for Advanced Pain Studies, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, Texas 75080
- Khadja Mazhar
  - Center for Advanced Pain Studies, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, Texas 75080
- Johannes C.M. Schlachetzki
  - Department of Cellular and Molecular Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0651, USA
- Mark Schumacher
  - Department of Anesthesia and Perioperative Care and the UCSF Pain and Addiction Research Center, University of California, San Francisco, California, 94143 USA
- Theodore J. Price
  - Center for Advanced Pain Studies, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, Texas 75080
3
Stefan K, Barski A. Cis-regulatory atlas of primary human CD4+ T cells. BMC Genomics 2023; 24:253. [PMID: 37170195 PMCID: PMC10173520 DOI: 10.1186/s12864-023-09288-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 03/31/2023] [Indexed: 05/13/2023]
Abstract
Cis-regulatory elements (CRE) are critical for coordinating gene expression programs that dictate cell-specific differentiation and homeostasis. Recently developed self-transcribing active regulatory region sequencing (STARR-Seq) has allowed for genome-wide annotation of functional CREs. Despite this, STARR-Seq assays are only employed in cell lines, in part, due to difficulties in delivering reporter constructs. Herein, we implemented and validated a STARR-Seq-based screen in human CD4+ T cells using a non-integrating lentiviral transduction system. Lenti-STARR-Seq is the first example of a genome-wide assay of CRE function in human primary cells, identifying thousands of functional enhancers and negative regulatory elements (NREs) in human CD4+ T cells. We find an unexpected difference in nucleosome organization between enhancers and NRE: enhancers are located between nucleosomes, whereas NRE are occupied by nucleosomes in their endogenous locations. We also describe chromatin modification, eRNA production, and transcription factor binding at both enhancers and NREs. Our findings support the idea of silencer repurposing as enhancers in alternate cell types. Collectively, these data suggest that Lenti-STARR-Seq is a successful approach for CRE screening in primary human cell types, and provides an atlas of functional CREs in human CD4+ T cells.
Affiliation(s)
- Kurtis Stefan
  - Division of Allergy & Immunology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7028, Cincinnati, OH, 45229-3026, USA
  - Medical Scientist Training Program (MSTP), University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
- Artem Barski
  - Division of Allergy & Immunology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7028, Cincinnati, OH, 45229-3026, USA
  - Division of Human Genetics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, 45229-3026, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, 45267, USA
4
Huang D, Ovcharenko I. Enhancer-silencer transitions in the human genome. Genome Res 2022; 32:437-448. [PMID: 35105669 PMCID: PMC8896465 DOI: 10.1101/gr.275992.121] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/12/2021] [Accepted: 01/27/2022] [Indexed: 11/24/2022]
Abstract
Dual-function regulatory elements (REs), acting as enhancers in some cellular contexts and as silencers in others, have been reported to facilitate the precise gene regulatory response to developmental signals in Drosophila melanogaster. However, with few isolated examples detected, dual-function REs in mammals have yet to be systematically studied. We herein investigated this class of REs in the human genome and profiled their activity across multiple cell types. Focusing on enhancer–silencer transitions specific to the development of T cells, we built an accurate deep learning classifier of REs and identified about 12,000 silencers active in primary peripheral blood T cells that act as enhancers in embryonic stem cells. Compared with regular silencers, these dual-function REs are evolving under stronger purifying selection and are enriched for mutations associated with disease phenotypes and altered gene expression. In addition, they are enriched in the loci of transcriptional regulators, such as transcription factors (TFs) and chromatin remodeling genes. Dual-function REs consist of two intertwined but largely distinct sets of binding sites bound by either activating or repressing TFs, depending on the type of RE function in a given cell line. This indicates the recruitment of different TFs for different regulatory modes and a complex DNA sequence composition of these REs with dual activating and repressive encoding. With an estimated >6% of cell type–specific human silencers acting as dual-function REs, this overlooked class of REs requires a specific investigation on how their inherent functional plasticity might be a contributing factor to human diseases.
5
Thibodeau A, Khetan S, Eroglu A, Tewhey R, Stitzel ML, Ucar D. CoRE-ATAC: A deep learning model for the functional classification of regulatory elements from single cell and bulk ATAC-seq data. PLoS Comput Biol 2021; 17:e1009670. [PMID: 34898596 PMCID: PMC8699717 DOI: 10.1371/journal.pcbi.1009670] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/10/2020] [Revised: 12/23/2021] [Accepted: 11/19/2021] [Indexed: 02/06/2023]
Abstract
Cis-Regulatory elements (cis-REs) include promoters, enhancers, and insulators that regulate gene expression programs via binding of transcription factors. ATAC-seq technology effectively identifies active cis-REs in a given cell type (including from single cells) by mapping accessible chromatin at base-pair resolution. However, these maps are not immediately useful for inferring specific functions of cis-REs. For this purpose, we developed a deep learning framework (CoRE-ATAC) with novel data encoders that integrate DNA sequence (reference or personal genotypes) with ATAC-seq cut sites and read pileups. CoRE-ATAC was trained on 4 cell types (n = 6 samples/replicates) and accurately predicted known cis-RE functions from 7 cell types (n = 40 samples) that were not used in model training (mean average precision = 0.80, mean F1 score = 0.70). CoRE-ATAC enhancer predictions from 19 human islet samples coincided with genetically modulated gain/loss of enhancer activity, which was confirmed by massively parallel reporter assays (MPRAs). Finally, CoRE-ATAC effectively inferred cis-RE function from aggregate single nucleus ATAC-seq (snATAC) data from human blood-derived immune cells that overlapped with known functional annotations in sorted immune cells, which established the efficacy of these models to study cis-RE functions of rare cells without the need for cell sorting. ATAC-seq maps from primary human cells reveal individual- and cell-specific variation in cis-RE activity. CoRE-ATAC increases the functional resolution of these maps, a critical step for studying regulatory disruptions behind diseases.
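The F1 score quoted above is the harmonic mean of precision and recall. As a minimal sketch of that calculation for a single class, assuming binary labels (1 = element belongs to the class, 0 = it does not): the function name and the toy labels are illustrative and are not taken from CoRE-ATAC or its evaluation data.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy labels: 3 true positives, 1 false positive, 1 false negative.
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 0, 1, 1, 0, 1]
    print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```

A "mean F1" such as the one reported would then be an average of per-class or per-sample scores; how the authors averaged is not stated in the abstract.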
Affiliation(s)
- Asa Thibodeau
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- Shubham Khetan
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- Alper Eroglu
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
- Ryan Tewhey
  - The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Michael L. Stitzel
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
  - Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut, United States of America
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut, United States of America
- Duygu Ucar
  - The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America
  - Institute for Systems Genomics, University of Connecticut Health Center, Farmington, Connecticut, United States of America
  - Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, Connecticut, United States of America
6
Li R, Li L, Xu Y, Yang J. Machine learning meets omics: applications and perspectives. Brief Bioinform 2021; 23:6425809. [PMID: 34791021 DOI: 10.1093/bib/bbab460] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/23/2021] [Revised: 09/29/2021] [Accepted: 10/07/2021] [Indexed: 02/07/2023]
Abstract
The innovation of biotechnologies has allowed the accumulation of omics data at an alarming rate, thus introducing the era of 'big data'. Extracting inherent valuable knowledge from various omics data remains a daunting problem in bioinformatics. Better solutions often need some kind of more innovative methods for efficient handlings and effective results. Recent advancements in integrated analysis and computational modeling of multi-omics data helped address such needs in an increasingly harmonious manner. The development and application of machine learning have largely advanced our insights into biology and biomedicine and greatly promoted the development of therapeutic strategies, especially for precision medicine. Here, we propose a comprehensive survey and discussion on what happened, is happening and will happen when machine learning meets omics. Specifically, we describe how artificial intelligence can be applied to omics studies and review recent advancements at the interface between machine learning and the ever-widest range of omics including genomics, transcriptomics, proteomics, metabolomics, radiomics, as well as those at the single-cell resolution. We also discuss and provide a synthesis of ideas, new insights, current challenges and perspectives of machine learning in omics.
Affiliation(s)
- Rufeng Li
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Lixin Li
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Yungang Xu
  - School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710129, China
- Juan Yang
  - Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
  - Key Laboratory of Environment and Genes Related to Diseases (Xi'an Jiaotong University), Ministry of Education of China, Xi'an 710061, P. R. China
7
Almeida N, Chung MWH, Drudi EM, Engquist EN, Hamrud E, Isaacson A, Tsang VSK, Watt FM, Spagnoli FM. Employing core regulatory circuits to define cell identity. EMBO J 2021; 40:e106785. [PMID: 33934382 PMCID: PMC8126924 DOI: 10.15252/embj.2020106785] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/14/2020] [Revised: 02/03/2021] [Accepted: 02/04/2021] [Indexed: 12/12/2022]
Abstract
The interplay between extrinsic signaling and downstream gene networks controls the establishment of cell identity during development and its maintenance in adult life. Advances in next-generation sequencing and single-cell technologies have revealed additional layers of complexity in cell identity. Here, we review our current understanding of transcription factor (TF) networks as key determinants of cell identity. We discuss the concept of the core regulatory circuit as a set of TFs and interacting factors that together define the gene expression profile of the cell. We propose the core regulatory circuit as a comprehensive conceptual framework for defining cellular identity and discuss its connections to cell function in different contexts.
Affiliation(s)
- Nathalia Almeida
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
- Matthew W H Chung
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
- Elena M Drudi
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
- Elise N Engquist
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
- Eva Hamrud
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
- Abigail Isaacson
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
- Victoria S K Tsang
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
- Fiona M Watt
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
- Francesca M Spagnoli
  - Centre for Stem Cells and Regenerative Medicine, Guy’s Hospital, King’s College London, London, UK
8
Della Chiara G, Gervasoni F, Fakiola M, Godano C, D'Oria C, Azzolin L, Bonnal RJP, Moreni G, Drufuca L, Rossetti G, Ranzani V, Bason R, De Simone M, Panariello F, Ferrari I, Fabbris T, Zanconato F, Forcato M, Romano O, Caroli J, Gruarin P, Sarnicola ML, Cordenonsi M, Bardelli A, Zucchini N, Ceretti AP, Mariani NM, Cassingena A, Sartore-Bianchi A, Testa G, Gianotti L, Opocher E, Pisati F, Tripodo C, Macino G, Siena S, Bicciato S, Piccolo S, Pagani M. Epigenomic landscape of human colorectal cancer unveils an aberrant core of pan-cancer enhancers orchestrated by YAP/TAZ. Nat Commun 2021; 12:2340. [PMID: 33879786 PMCID: PMC8058065 DOI: 10.1038/s41467-021-22544-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 07/21/2020] [Accepted: 03/18/2021] [Indexed: 02/07/2023]
Abstract
Cancer is characterized by pervasive epigenetic alterations with enhancer dysfunction orchestrating the aberrant cancer transcriptional programs and transcriptional dependencies. Here, we epigenetically characterize human colorectal cancer (CRC) using de novo chromatin state discovery on a library of different patient-derived organoids. By exploring this resource, we unveil a tumor-specific deregulated enhancerome that is cancer cell-intrinsic and independent of interpatient heterogeneity. We show that the transcriptional coactivators YAP/TAZ act as key regulators of the conserved CRC gained enhancers. The same YAP/TAZ-bound enhancers display active chromatin profiles across diverse human tumors, highlighting a pan-cancer epigenetic rewiring which at single-cell level distinguishes malignant from normal cell populations. YAP/TAZ inhibition in established tumor organoids causes extensive cell death unveiling their essential role in tumor maintenance. This work indicates a common layer of YAP/TAZ-fueled enhancer reprogramming that is key for the cancer cell state and can be exploited for the development of improved therapeutic avenues. The role of epigenetic deregulation in colorectal cancer (CRC) is not fully understood yet. Here the authors use patient-derived organoids, epigenomics and single-cell RNA-seq to reveal that YAP/TAZ are key regulators that bind to active enhancers in CRC and promote tumour survival.
Affiliation(s)
- Giulia Della Chiara
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
  - Human Organoid Models Integrative Center HOMIC, University of Milan, Milan, Italy
  - Department of Medical Biotechnology and Translational Medicine, Università degli Studi di Milano, Milan, Italy
- Federica Gervasoni
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
  - Human Organoid Models Integrative Center HOMIC, University of Milan, Milan, Italy
  - Department of Medical Biotechnology and Translational Medicine, Università degli Studi di Milano, Milan, Italy
- Michaela Fakiola
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
- Chiara Godano
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
  - Human Organoid Models Integrative Center HOMIC, University of Milan, Milan, Italy
  - Department of Clinical Sciences and Community Health, Università degli Studi di Milano, Milan, Italy
- Claudia D'Oria
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
  - Human Organoid Models Integrative Center HOMIC, University of Milan, Milan, Italy
  - Department of Medical Biotechnology and Translational Medicine, Università degli Studi di Milano, Milan, Italy
- Luca Azzolin
  - Department of Molecular Medicine, University of Padua, Padua, Italy
- Raoul Jean Pierre Bonnal
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
- Giulia Moreni
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
  - Department of Medical Microbiology, Laboratory of Clinical Virology, Amsterdam University Medical Center, University of Amsterdam, AZ, Amsterdam, the Netherlands
- Lorenzo Drufuca
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
- Grazisa Rossetti
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
- Valeria Ranzani
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
- Ramona Bason
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
  - Department of Medical Biotechnology and Translational Medicine, Università degli Studi di Milano, Milan, Italy
- Marco De Simone
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
  - Technology Center for Genomics and Bioinformatics, Department of Pathology and Laboratory Medicine, University of California, Los Angeles, CA, USA
- Francesco Panariello
  - Department of Medical Biotechnology and Translational Medicine, Università degli Studi di Milano, Milan, Italy
  - Telethon Institute of Genetics and Medicine TIGEM, Pozzuoli, Italy
- Ivan Ferrari
  - Department of Medical Biotechnology and Translational Medicine, Università degli Studi di Milano, Milan, Italy
- Tanya Fabbris
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
- Mattia Forcato
  - Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
- Oriana Romano
  - Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
- Jimmy Caroli
  - Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
- Paola Gruarin
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
- Maria Lucia Sarnicola
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
- Alberto Bardelli
  - Candiolo Cancer Institute, FPO - IRCCS, Candiolo (TO), Italy
  - Department of Oncology, University of Torino, Candiolo (TO), Italy
- Andrea Cassingena
  - Niguarda Cancer Center, Grande Ospedale Metropolitano Niguarda, Milan, Italy
- Andrea Sartore-Bianchi
  - Niguarda Cancer Center, Grande Ospedale Metropolitano Niguarda, Milan, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Giuseppe Testa
  - Human Organoid Models Integrative Center HOMIC, University of Milan, Milan, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
  - Department of Experimental Oncology, European Institute of Oncology, IRCCS, Milan, Italy
- Luca Gianotti
  - School of Medicine and Surgery, Milano-Bicocca University, and Department of Surgery, San Gerardo Hospital, Monza, Italy
- Enrico Opocher
  - UO Chirurgia Epatobiliopancreatica e Digestiva Ospedale San Paolo, Milan, Italy
  - Department of Health Sciences, Università degli Studi di Milano, Milan, Italy
- Claudio Tripodo
  - Tumor Immunology Unit, University of Palermo, Palermo, Italy
  - Tumor and Microenvironment Histopathology Unit, IFOM, FIRC Institute of Molecular Oncology, Milan, Italy
- Giuseppe Macino
  - Department of Cellular Biotechnologies and Hematology, La Sapienza University of Rome, Rome, Italy
- Salvatore Siena
  - Niguarda Cancer Center, Grande Ospedale Metropolitano Niguarda, Milan, Italy
  - Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Silvio Bicciato
  - Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
- Stefano Piccolo
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Department of Molecular Medicine, University of Padua, Padua, Italy
- Massimiliano Pagani
  - IFOM, the FIRC Institute of Molecular Oncology, Milan, Italy
  - Istituto Nazionale Genetica Molecolare INGM 'Romeo ed Enrica Invernizzi', Milan, Italy
  - Human Organoid Models Integrative Center HOMIC, University of Milan, Milan, Italy
  - Department of Medical Biotechnology and Translational Medicine, Università degli Studi di Milano, Milan, Italy
9
Parisi C, Vashisht S, Winata CL. Fish-Ing for Enhancers in the Heart. Int J Mol Sci 2021; 22:3914. [PMID: 33920121 PMCID: PMC8069060 DOI: 10.3390/ijms22083914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/15/2021] [Revised: 04/07/2021] [Accepted: 04/08/2021] [Indexed: 12/19/2022]
Abstract
Precise control of gene expression is crucial to ensure proper development and biological functioning of an organism. Enhancers are non-coding DNA elements which play an essential role in regulating gene expression. They contain specific sequence motifs serving as binding sites for transcription factors which interact with the basal transcription machinery at their target genes. Heart development is regulated by intricate gene regulatory network ensuring precise spatiotemporal gene expression program. Mutations affecting enhancers have been shown to result in devastating forms of congenital heart defect. Therefore, identifying enhancers implicated in heart biology and understanding their mechanism is key to improve diagnosis and therapeutic options. Despite their crucial role, enhancers are poorly studied, mainly due to a lack of reliable way to identify them and determine their function. Nevertheless, recent technological advances have allowed rapid progress in enhancer discovery. Model organisms such as the zebrafish have contributed significant insights into the genetics of heart development through enabling functional analyses of genes and their regulatory elements in vivo. Here, we summarize the current state of knowledge on heart enhancers gained through studies in model organisms, discuss various approaches to discover and study their function, and finally suggest methods that could further advance research in this field.
Affiliation(s)
- Costantino Parisi
  - International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- Shikha Vashisht
  - International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- Cecilia Lanny Winata
  - International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
  - Max Planck Institute for Heart and Lung Research, 61231 Bad Nauheim, Germany
10
Eicher T, Chan J, Luu H, Machiraju R, Mathé EA. Self-organizing maps with variable neighborhoods facilitate learning of chromatin accessibility signal shapes associated with regulatory elements. BMC Bioinformatics 2021; 22:35. [PMID: 33516170 PMCID: PMC7847148 DOI: 10.1186/s12859-021-03976-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2020] [Accepted: 01/21/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Assigning chromatin states genome-wide (e.g. promoters, enhancers, etc.) is commonly performed to improve functional interpretation of these states. However, computational methods to assign chromatin state suffer from the following drawbacks: they typically require data from multiple assays, which may not be practically feasible to obtain, and they depend on peak calling algorithms, which require careful parameterization and often exclude the majority of the genome. To address these drawbacks, we propose a novel learning technique built upon the Self-Organizing Map (SOM), Self-Organizing Map with Variable Neighborhoods (SOM-VN), to learn a set of representative shapes from a single, genome-wide, chromatin accessibility dataset to associate with a chromatin state assignment in which a particular RE is prevalent. These shapes can then be used to assign chromatin state using our workflow. RESULTS: We validate the performance of the SOM-VN workflow on 14 different samples of varying quality, namely one assay each of A549 and GM12878 cell lines and two each of H1 and HeLa cell lines, primary B-cells, and brain, heart, and stomach tissue. We show that SOM-VN learns shapes that are (1) non-random, (2) associated with known chromatin states, (3) generalizable across sets of chromosomes, and (4) associated with magnitude and multimodality. We compare the accuracy of SOM-VN chromatin states against the Clustering Aggregation Tool (CAGT), an unsupervised method that learns chromatin accessibility signal shapes but does not associate these shapes with REs, and we show that overall precision and recall is increased when learning shapes using SOM-VN as compared to CAGT. We further compare enhancer state assignments from SOM-VN in signals above a set threshold to enhancer state assignments from Predicting Enhancers from ATAC-seq Data (PEAS), a deep learning method that assigns enhancer chromatin states to peaks. We show that the precision-recall area under the curve for the assignment of enhancer states is comparable to PEAS. CONCLUSIONS: Our work shows that the SOM-VN workflow can learn relationships between REs and chromatin accessibility signal shape, which is an important step toward the goal of assigning and comparing enhancer state across multiple experiments and phenotypic states.
Affiliation(s)
- Tara Eicher
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, 370 W. 9th Avenue, Columbus, OH, 43210, USA
  - Department of Computer Science and Engineering, The Ohio State University College of Engineering, 2015 Neil Avenue, Columbus, OH, 43210, USA
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, 9800 Medical Center Dr., Rockville, MD, 20892, USA
- Jany Chan
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, 370 W. 9th Avenue, Columbus, OH, 43210, USA
- Han Luu
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, 370 W. 9th Avenue, Columbus, OH, 43210, USA
- Raghu Machiraju
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, 370 W. 9th Avenue, Columbus, OH, 43210, USA
  - Department of Computer Science and Engineering, The Ohio State University College of Engineering, 2015 Neil Avenue, Columbus, OH, 43210, USA
  - Department of Pathology, The Ohio State University College of Medicine, 1645 Neil Ave, Columbus, OH, 43210, USA
  - Translational Data Analytics Institute, The Ohio State University, 1760 Neil Ave., Columbus, OH, 43210, USA
- Ewy A Mathé
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, 370 W. 9th Avenue, Columbus, OH, 43210, USA
  - Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health, 9800 Medical Center Dr., Rockville, MD, 20892, USA
11
Schreiber J, Singh R, Bilmes J, Noble WS. A pitfall for machine learning methods aiming to predict across cell types. Genome Biol 2020; 21:282. [PMID: 33213499 PMCID: PMC7678316 DOI: 10.1186/s13059-020-02177-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 04/18/2020] [Accepted: 10/07/2020] [Indexed: 01/19/2023]
Abstract
Machine learning models that predict genomic activity are most useful when they make accurate predictions across cell types. Here, we show that when the training and test sets contain the same genomic loci, the resulting model may falsely appear to perform well by effectively memorizing the average activity associated with each locus across the training cell types. We demonstrate this phenomenon in the context of predicting gene expression and chromatin domain boundaries, and we suggest methods to diagnose and avoid the pitfall. We anticipate that, as more data becomes available, future projects will increasingly risk suffering from this issue.
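The pitfall described in this abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes toy activity values and hypothetical cell type names, and shows how a baseline that only averages each locus over the training cell types can already score well on a held-out cell type measured at the same loci.

```python
from statistics import mean


def average_activity_baseline(train):
    """Predict each locus as its mean activity across the training cell types.

    `train` maps a cell type name to a list of activity values, one per locus,
    with loci in the same order for every cell type.
    """
    return [mean(values) for values in zip(*train.values())]


def mean_squared_error(pred, truth):
    """Mean squared error between two equally long lists of values."""
    return mean((p - t) ** 2 for p, t in zip(pred, truth))


# Hypothetical toy data: activity at four loci in three training cell types.
train = {
    "cell_A": [0.9, 0.1, 0.5, 0.0],
    "cell_B": [1.0, 0.2, 0.4, 0.1],
    "cell_C": [0.8, 0.0, 0.6, 0.1],
}
# Hypothetical held-out cell type, measured at the SAME four loci.
held_out = [0.9, 0.1, 0.5, 0.2]

baseline = average_activity_baseline(train)
baseline_mse = mean_squared_error(baseline, held_out)
print("baseline prediction:", baseline)
print("baseline MSE on held-out cell type:", baseline_mse)
```

A model evaluated on the same loci would need to beat this baseline clearly before its score says anything about cell-type-specific prediction.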
Affiliation(s)
- Jacob Schreiber
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA
- Ritambhara Singh
  - Department of Genome Science, University of Washington, Seattle, USA
  - Current Affiliation: Department of Computer Science, and Center for Computational Molecular Biology, Brown University, Providence, 02906, RI, United States
- Jeffrey Bilmes
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, USA
- William Stafford Noble
  - Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, USA
  - Department of Genome Science, University of Washington, Seattle, USA
12
Babenko V, Babenko R, Orlov Y. Analyzing a putative enhancer of optic disc morphology. BMC Genet 2020; 21:73. [PMID: 33092545 PMCID: PMC7583307 DOI: 10.1186/s12863-020-00873-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/08/2020] [Accepted: 06/23/2020] [Indexed: 01/06/2023]
Abstract
Background Genome-wide association studies have identified the CDC7-TGFBR3 intergenic region on chromosome 1 to be strongly associated with optic disc area size. The mechanism of its function remained unclear until new data on eQTL markers emerged from the Genotype-Tissue Expression project. The target region was found to contain a strong silencer of the distal (800 kb) Transcription Factor (TF) gene GFI1 (Growth Factor Independent Transcription Repressor 1) specifically in neuroendocrine cells (pituitary gland). GFI1 has also been reported to be involved in the development of sensory neurons and hematopoiesis. Therefore, GFI1, being a developmental gene, is likely to affect optic disc area size by altering the expression of the associated genes via long-range interactions. Results Distribution of haplotypes in the putative enhancer region has been assessed using the data on four continental supergroups generated by the 1000 Genomes Project. The East Asian (EAS) populations were shown to manifest a highly homogeneous unimodal haplotype distribution pattern within the region, with the major haplotype occurring at a frequency of 0.9. Another, European-specific haplotype was observed at a frequency of 0.21. The major haplotype appears to be involved in silencing GFI1 repressor gene expression, which might be the cause of the increased optic disc area characteristic of the EAS populations. The enhancer/eQTL region overlaps an AluJo element, which implies that this particular regulatory element is primate-specific and confined to few tissues. Conclusion Population-specific distribution of GFI1 enhancer alleles may predispose certain ethnic groups to glaucoma.
Affiliation(s)
- Vladimir Babenko
  - Institute of Cytology and Genetics, Lavrentyeva 10, Novosibirsk, 630090, Russia
  - Novosibirsk State University, Pirogova Str 2, Novosibirsk, 630090, Russia
- Roman Babenko
  - Institute of Cytology and Genetics, Lavrentyeva 10, Novosibirsk, 630090, Russia
  - Novosibirsk State University, Pirogova Str 2, Novosibirsk, 630090, Russia
- Yuri Orlov
  - Institute of Cytology and Genetics, Lavrentyeva 10, Novosibirsk, 630090, Russia
  - Novosibirsk State University, Pirogova Str 2, Novosibirsk, 630090, Russia
  - I.M. Sechenov First Moscow State Medical University of the Ministry of Health of the Russian Federation (Sechenov University), Trubetskaya 8-2, Moscow, 119991, Russia
13
Tripodi IJ, Chowdhury M, Gruca M, Dowell RD. Combining signal and sequence to detect RNA polymerase initiation in ATAC-seq data. PLoS One 2020; 15:e0232332. [PMID: 32353042 PMCID: PMC7192442 DOI: 10.1371/journal.pone.0232332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/13/2019] [Accepted: 04/13/2020] [Indexed: 01/12/2023]
Abstract
The assay for transposase-accessible chromatin followed by sequencing (ATAC-seq) is an inexpensive protocol for measuring open chromatin regions. ATAC-seq is also relatively simple and requires fewer cells than many other high-throughput sequencing protocols. Therefore, it is tractable in numerous settings where other high throughput assays are challenging to impossible. Hence it is important to understand the limits of what can be inferred from ATAC-seq data. In this work, we leverage ATAC-seq to predict the presence of nascent transcription. Nascent transcription assays are the current gold standard for identifying regions of active transcription, including markers for functional transcription factor (TF) binding. We combine mapped short reads from ATAC-seq with the underlying peak sequence, to determine regions of active transcription genome-wide. We show that a hybrid signal/sequence representation classified using recurrent neural networks (RNNs) can identify these regions across different cell types.
Affiliation(s)
- Ignacio J. Tripodi
  - Computer Science, University of Colorado, Boulder, Colorado, United States of America
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado, United States of America
- Murad Chowdhury
  - Computer Science, University of Colorado, Boulder, Colorado, United States of America
- Margaret Gruca
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado, United States of America
- Robin D. Dowell
  - Computer Science, University of Colorado, Boulder, Colorado, United States of America
  - BioFrontiers Institute, University of Colorado, Boulder, Colorado, United States of America
  - Molecular, Cellular and Developmental Biology, University of Colorado, Boulder, Colorado, United States of America
14
Manduchi E, Orzechowski PR, Ritchie MD, Moore JH. Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies. BioData Min 2019; 12:14. [PMID: 31320928 PMCID: PMC6617598 DOI: 10.1186/s13040-019-0201-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/02/2019] [Accepted: 06/14/2019] [Indexed: 01/03/2023]
Abstract
Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is, individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up.
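The ensemble step mentioned in this abstract, deriving a union list of top SNPs from several approaches, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the authors' workflow: the approach names, SNP identifiers, and scores are hypothetical, a higher score is assumed to mean a stronger association, and the paper's similarity assessment and selection of representative approaches are not shown.

```python
def top_k(scores, k):
    """Return the k SNP ids with the highest association score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def union_of_top_snps(approaches, k):
    """Map every SNP in the union of top-k lists to the approaches that chose it."""
    support = {}
    for name, scores in approaches.items():
        for snp in top_k(scores, k):
            support.setdefault(snp, []).append(name)
    return support


# Hypothetical scores (higher = stronger association) from three approaches.
approaches = {
    "additive_logistic": {"rs1": 9.1, "rs2": 7.4, "rs3": 2.0, "rs4": 1.1},
    "dominant_logistic": {"rs1": 8.0, "rs2": 1.5, "rs3": 6.2, "rs4": 0.9},
    "random_forest": {"rs1": 0.30, "rs2": 0.05, "rs3": 0.10, "rs4": 0.40},
}

support = union_of_top_snps(approaches, k=2)
for snp, names in sorted(support.items()):
    print(snp, "selected by", len(names), "approach(es):", ", ".join(names))
```

Keeping the list of supporting approaches per SNP makes it easy to separate SNPs found by many approaches from those found by only one, which is the distinction the abstract draws.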
Affiliation(s)
- Elisabetta Manduchi
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA USA
- Patryk R Orzechowski
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA USA
- Marylyn D Ritchie
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA USA
  - Department of Genetics, University of Pennsylvania, Philadelphia, PA USA
- Jason H Moore
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA USA