1
Jones BM, Webb AE, Geib SM, Sim S, Schweizer RM, Branstetter MG, Evans JD, Kocher SD. Repeated Shifts in Sociality Are Associated With Fine-tuning of Highly Conserved and Lineage-Specific Enhancers in a Socially Flexible Bee. Mol Biol Evol 2024; 41:msae229. [PMID: 39487572 PMCID: PMC11568387 DOI: 10.1093/molbev/msae229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2024] [Revised: 10/21/2024] [Accepted: 10/28/2024] [Indexed: 11/04/2024] Open
Abstract
Comparative genomic studies of social insects suggest that changes in gene regulation are associated with evolutionary transitions in social behavior, but the activity of predicted regulatory regions has not been tested empirically. We used self-transcribing active regulatory region sequencing, a high-throughput enhancer discovery tool, to identify and measure the activity of enhancers in the socially variable sweat bee, Lasioglossum albipes. We identified over 36,000 enhancers in the L. albipes genome from 3 social and 3 solitary populations. Many enhancers were identified in only a subset of L. albipes populations, revealing rapid divergence in regulatory regions within this species. Population-specific enhancers were often proximal to the same genes across populations, suggesting compensatory gains and losses of regulatory regions may preserve gene activity. We also identified 1,182 enhancers with significant differences in activity between social and solitary populations, some of which are conserved regulatory regions across species of bees. These results indicate that social trait variation in L. albipes is associated with the fine-tuning of ancient enhancers as well as lineage-specific regulatory changes. Combining enhancer activity with population genetic data revealed variants associated with differences in enhancer activity and identified a subset of differential enhancers with signatures of selection associated with social behavior. Together, these results provide the first empirical map of enhancers in a socially flexible bee and highlight links between cis-regulatory variation and the evolution of social behavior.
Affiliation(s)
- Beryl M Jones
- Department of Ecology and Evolutionary Biology, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Department of Entomology, University of Kentucky, Lexington, KY 40508, USA
- Andrew E Webb
- Department of Ecology and Evolutionary Biology, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Scott M Geib
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Tropical Pest Genetics and Molecular Biology Research Unit, Hilo, HI 96720, USA
- Sheina Sim
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Tropical Pest Genetics and Molecular Biology Research Unit, Hilo, HI 96720, USA
- Rena M Schweizer
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT 84322, USA
- Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA
- Michael G Branstetter
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Pollinating Insects Research Unit, Utah State University, Logan, UT 84322, USA
- Jay D Evans
- U.S. Department of Agriculture, Agricultural Research Service (USDA-ARS), Bee Research Laboratory BARC-E, Beltsville, MD 20705, USA
- Sarah D Kocher
- Department of Ecology and Evolutionary Biology, Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
2
Bond ML, Quiroga-Barber IY, D’Costa S, Wu Y, Bell JL, McAfee JC, Kramer NE, Lee S, Patrucco M, Phanstiel DH, Won H. Deciphering the functional impact of Alzheimer's Disease-associated variants in resting and proinflammatory immune cells. medRxiv: the preprint server for health sciences 2024:2024.09.13.24313654. [PMID: 39371155 PMCID: PMC11451667 DOI: 10.1101/2024.09.13.24313654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
Genome-wide association studies have identified loci associated with Alzheimer's Disease (AD), but identifying the exact causal variants and genes at each locus is challenging due to linkage disequilibrium and their largely non-coding nature. To address this, we performed a massively parallel reporter assay of 3,576 AD-associated variants in THP-1 macrophages in both resting and proinflammatory states and identified 47 expression-modulating variants (emVars). To understand the endogenous chromatin context of emVars, we built an activity-by-contact model using epigenomic maps of macrophage inflammation and inferred condition-specific enhancer-promoter pairs. Intersection of emVars with enhancer-promoter pairs and microglia expression quantitative trait loci allowed us to connect 39 emVars to 76 putative AD risk genes enriched for AD-associated molecular signatures. Overall, systematic characterization of AD-associated variants enhances our understanding of the regulatory mechanisms underlying AD pathogenesis.
Affiliation(s)
- Marielle L. Bond
- Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
- Susan D’Costa
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Yijia Wu
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
- Jessica L. Bell
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
- Jessica C. McAfee
- Curriculum in Genetics & Molecular Biology, University of North Carolina at Chapel Hill
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
- Nicole E. Kramer
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
- Sool Lee
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
- Mary Patrucco
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
- Douglas H. Phanstiel
- Thurston Arthritis Research Center, University of North Carolina at Chapel Hill
- Department of Cell Biology & Physiology, University of North Carolina at Chapel Hill
- Hyejung Won
- Department of Genetics, University of North Carolina at Chapel Hill
- Neuroscience Center, University of North Carolina at Chapel Hill
3
Kwait R, Pinsky ML, Gignoux‐Wolfsohn S, Eskew EA, Kerwin K, Maslo B. Impact of putatively beneficial genomic loci on gene expression in little brown bats (Myotis lucifugus, Le Conte, 1831) affected by white-nose syndrome. Evol Appl 2024; 17:e13748. [PMID: 39310794 PMCID: PMC11413065 DOI: 10.1111/eva.13748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 06/06/2024] [Accepted: 06/19/2024] [Indexed: 09/25/2024] Open
Abstract
Genome-wide scans for selection have become a popular tool for investigating evolutionary responses in wildlife to emerging diseases. However, genome scans are susceptible to false positives and do little to demonstrate specific mechanisms by which loci impact survival. Linking putatively resistant genotypes to observable phenotypes increases confidence in genome scan results and provides evidence of survival mechanisms that can guide conservation and management efforts. Here we used an expression quantitative trait loci (eQTL) analysis to uncover relationships between gene expression and alleles associated with the survival of little brown bats (Myotis lucifugus) despite infection with the causative agent of white-nose syndrome. We found that 25 of the 63 single-nucleotide polymorphisms (SNPs) associated with survival were related to gene expression in wing tissue. The differentially expressed genes have functional annotations associated with the innate immune system, metabolism, circadian rhythms, and the cellular response to stress. In addition, we observed differential expression of multiple genes with survival implications related to loci in linkage disequilibrium with focal SNPs. Together, these findings support the selective function of these loci and suggest that part of the mechanism driving survival may be the alteration of immune and other responses in epithelial tissue.
Affiliation(s)
- Robert Kwait
- Department of Ecology, Evolution and Natural Resources, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Malin L. Pinsky
- Department of Ecology, Evolution and Natural Resources, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, California, USA
- Evan A. Eskew
- Institute for Interdisciplinary Data Sciences, University of Idaho, Moscow, Idaho, USA
- Kathleen Kerwin
- Department of Ecology, Evolution and Natural Resources, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
- Brooke Maslo
- Department of Ecology, Evolution and Natural Resources, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
4
Huang C, Cheng Y, Hu Y, Fang L, Si Z, Chen J, Cao Y, Guan X, Zhang T. Dynamic patterns of gene expressional and regulatory variations in cotton heterosis. Frontiers in Plant Science 2024; 15:1450963. [PMID: 39166253 PMCID: PMC11333441 DOI: 10.3389/fpls.2024.1450963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 07/24/2024] [Indexed: 08/22/2024]
Abstract
Purpose: Although the application of heterosis has significantly increased crop yield over the past century, the mechanisms underlying this phenomenon remain obscure. Here, we applied transcriptome sequencing to unravel the impacts of parental expression differences and transcriptomic reprogramming in cotton heterosis. Methods: A high-quality transcriptomic atlas covering 15 developmental stages and tissues was constructed for XZM2, an elite hybrid of upland cotton (Gossypium hirsutum L.), and its parental lines, CRI12 and J8891. This atlas allowed us to identify gene expression differences between the parents and to characterize the transcriptomic reprogramming that occurs in the hybrid. Results: Our analysis revealed abundant gene expression differences between the parents, with pronounced tissue specificity; a total of 1,112 genes exhibited single-parent expression in at least one tissue. It also illuminated transcriptomic reprogramming in the hybrid XZM2, which included both additive and non-additive expression patterns. Coexpression networks between parents and hybrid, constructed via weighted gene coexpression network analysis, identified modules closely associated with fiber development. In particular, key regulatory hub genes involved in fiber development showed high-parent dominant or overdominant patterns in the hybrid, potentially driving the emergence of heterosis. Finally, high-depth resequencing data were generated and allele-specific expression patterns examined in the hybrid, enabling the dissection of cis- and trans-regulatory contributions to the observed expression differences. Conclusion: Parental transcriptional differences and transcriptomic reprogramming in the hybrid, especially the non-additive upregulation of key genes, play an important role in shaping heterosis. Collectively, these findings provide new insights into the molecular basis of heterosis in cotton.
Affiliation(s)
- Chujun Huang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yu Cheng
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yan Hu
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
- Lei Fang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
- Zhanfeng Si
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Jinwen Chen
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Yiwen Cao
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
- Xueying Guan
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
- Tianzhen Zhang
- Zhejiang Provincial Key Laboratory of Crop Genetic Resources, Institute of Crop Science, Plant Precision Breeding Academy, College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
5
Hua K, Wu C, Lin C, Chen C. E2F1 promotes cell migration in hepatocellular carcinoma via FNDC3B. FEBS Open Bio 2024; 14:687-694. [PMID: 38403291 PMCID: PMC10988749 DOI: 10.1002/2211-5463.13783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Revised: 01/23/2024] [Accepted: 02/16/2024] [Indexed: 02/27/2024] Open
Abstract
FNDC3B (fibronectin type III domain containing 3B) is highly expressed in hepatocellular carcinoma (HCC) and other cancer types, and fusion genes involving FNDC3B have been identified in HCC and leukemia. Growing evidence suggests the significance of FNDC3B in tumorigenesis, particularly in cell migration and tumor metastasis. However, its regulatory mechanisms remain elusive. In this study, we employed bioinformatic, gene regulation, and protein-DNA interaction screening to investigate the transcription factors (TFs) involved in regulating FNDC3B. Initially, 338 candidate TFs were selected based on previous chromatin immunoprecipitation (ChIP)-seq experiments available in public domain databases. Through TF knockdown screening and ChIP coupled with Droplet Digital PCR assays, we identified that E2F1 (E2F transcription factor 1) is crucial for the activation of FNDC3B. Overexpression or knockdown of E2F1 significantly impacts the expression of FNDC3B. In conclusion, our study elucidated the mechanistic link between FNDC3B and E2F1. These findings contribute to a better understanding of FNDC3B in tumorigenesis and provide insights into potential therapeutic targets for cancer treatment.
Affiliation(s)
- Kate Hua
- Cancer Progression Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chen‐Tang Wu
- Cancer Progression Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chin‐Hui Lin
- Cancer Progression Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chian‐Feng Chen
- Cancer Progression Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
6
Vaknin I, Willinger O, Mandl J, Heuberger H, Ben-Ami D, Zeng Y, Goldberg S, Orenstein Y, Amit R. A universal system for boosting gene expression in eukaryotic cell-lines. Nat Commun 2024; 15:2394. [PMID: 38493141 PMCID: PMC10944472 DOI: 10.1038/s41467-024-46573-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 03/04/2024] [Indexed: 03/18/2024] Open
Abstract
We demonstrate a transcriptional regulatory design algorithm that can boost expression in yeast and mammalian cell lines. The system consists of a simplified transcriptional architecture composed of a minimal core promoter and a synthetic upstream regulatory region (sURS) composed of up to three motifs selected from a list of 41 motifs conserved in the eukaryotic lineage. The sURS system was first characterized using an oligo-library containing 189,990 variants. We validate the resultant expression model using a set of 43 unseen sURS designs. The validation sURS experiments indicate that a generic set of grammar rules for boosting and attenuation may exist in yeast cells. Finally, we demonstrate that this generic set of grammar rules functions similarly in mammalian CHO-K1 and HeLa cells. Consequently, our work provides a design algorithm for boosting the expression of promoters used for expressing industrially relevant proteins in yeast and mammalian cell lines.
Affiliation(s)
- Inbal Vaknin
- Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Or Willinger
- Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Jonathan Mandl
- Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- Hadar Heuberger
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Dan Ben-Ami
- School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva, Israel
- Yi Zeng
- Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Sarah Goldberg
- Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- Yaron Orenstein
- Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Roee Amit
- Department of Biotechnology and Food Engineering, Technion, Haifa, Israel
- The Russell Berrie Nanotechnology Institute, Technion, Haifa, Israel
7
Jiang Y, Ye Y, Zhang X, Yu Y, Huang L, Bao X, Xu X. Identification and characterization of CHD4-associated eRNA as a novel modulator of fetal hemoglobin levels in β-thalassemia. Biochem Biophys Res Commun 2024; 701:149555. [PMID: 38325179 DOI: 10.1016/j.bbrc.2024.149555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 01/18/2024] [Accepted: 01/18/2024] [Indexed: 02/09/2024]
Abstract
Fetal-to-adult hemoglobin switching is controlled by programmed silencing of γ-globin, and re-activation of fetal hemoglobin (HbF) is an effective strategy for ameliorating the clinical severity of β-thalassemia and sickle cell disease. The identification of enhancer RNAs (eRNAs) related to the fetal (α2γ2) to adult (α2β2) hemoglobin switch remains incomplete. In this study, the transcriptomes of GYPA+ cells from six β-thalassemia patients with extreme HbF levels were sequenced to identify differences in patterns of noncoding RNA expression. Notably, an enhancer upstream of CHD4, an HbF-related core subunit of the NuRD complex, was differentially transcribed. Using the public FANTOM5 database, we found a significant positive correlation for the eRNA-CHD4 enhancer-gene interaction. Specifically, eRNA-CHD4 expression was significantly higher in both CD34+ HSPCs and HUDEP-2 cells than in K562 cells, which commonly express high levels of HbF, suggesting a correlation between eRNA and HbF expression. Furthermore, prediction of transcription factor binding sites of cis-eQTLs and the CHD4 genomic region revealed a putative interaction site between rs73264846 and ZNF410, a known transcription factor regulating HbF expression. Moreover, in vitro validation showed that inhibition of the eRNA could reduce HBG expression in HUDEP-2 cells. Taken together, the findings of this study demonstrate that a distal enhancer contributes to stage-specific silencing of γ-globin genes through direct modulation of CHD4 expression and provide insights into the epigenetic mechanisms of NuRD-mediated hemoglobin switching.
Affiliation(s)
- Yida Jiang
- Innovation Center for Diagnostics and Treatment of Thalassemia, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, China; Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Key Chip Laboratory, Guangzhou, Guangdong, China
- Yuhua Ye
- Innovation Center for Diagnostics and Treatment of Thalassemia, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, China; Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Key Chip Laboratory, Guangzhou, Guangdong, China
- Xinhua Zhang
- Department of Hematology, 923rd Hospital of the People's Liberation Army, Nanning, Guangxi, China
- Yanping Yu
- Department of Pediatric, 923rd Hospital of the People's Liberation Army, Nanning, Guangxi, China
- Liping Huang
- Department of Pediatric, 923rd Hospital of the People's Liberation Army, Nanning, Guangxi, China
- Xiuqin Bao
- Medical Genetic Center, Guangdong Women and Children Hospital, Guangzhou, Guangdong, China
- Xiangmin Xu
- Innovation Center for Diagnostics and Treatment of Thalassemia, Nanfang Hospital, Southern Medical University, Guangzhou, Guangdong, China; Department of Medical Genetics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, Guangdong, China; Guangdong Key Chip Laboratory, Guangzhou, Guangdong, China
8
Bai J, Wei X. Identification of teleost tnnc1a enhancers for specific pan-cardiac transcription. bioRxiv: the preprint server for biology 2024:2024.02.26.582099. [PMID: 38464177 PMCID: PMC10925198 DOI: 10.1101/2024.02.26.582099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Troponin C regulates muscle contraction by forming the troponin complex with troponin I and troponin T. Different muscle types express different troponin C genes, but the mechanisms of this differential transcription are not fully understood. Expression of the zebrafish tnnc1a gene is restricted to cardiac muscle. Here we identify the enhancers and promoters of the zebrafish and medaka tnnc1a genes, including intronic enhancers in both zebrafish and medaka and an upstream enhancer in medaka. The intronic and upstream enhancers are likely functionally redundant. The GFP transgenic reporter driven by these enhancers is expressed more strongly in the ventricle than in the atrium, recapitulating the expression pattern of the endogenous zebrafish tnnc1a gene. Our study identifies a new set of enhancers for cardiac-specific transgenic expression in zebrafish. These enhancers can serve as tools for future identification of transcription factor networks that drive cardiac-specific gene transcription.
9
Zhu X, Huang Q, Huang L, Luo J, Li Q, Kong D, Deng B, Gu Y, Wang X, Li C, Kong S, Zhang Y. MAE-seq refines regulatory elements across the genome. Nucleic Acids Res 2024; 52:e9. [PMID: 38038259 PMCID: PMC10810209 DOI: 10.1093/nar/gkad1129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 10/23/2023] [Accepted: 11/10/2023] [Indexed: 12/02/2023] Open
Abstract
Proper cell fate determination relies on precise spatial and temporal genome-wide cooperation between regulatory elements (REs) and their targeted genes. However, the lengths of REs defined using different methods vary, which indicates that there is sequence redundancy and that the context of the genome may be unintelligible. We developed a method called MAE-seq (Massive Active Enhancers by Sequencing) to experimentally identify functional REs at a 25-bp scale. In this study, MAE-seq was used to identify 626,879, 541,617 and 554,826 25-bp enhancers in mouse embryonic stem cells (mESCs), C2C12 and HEK 293T cells, respectively. In mESCs, for example, the 626,879 active enhancers were identified using ∼1.6 trillion 25-bp DNA fragments and screening 12 billion cells. Comparative analysis revealed that most of the histone modification datasets were annotated by MAE-seq loci. Furthermore, 33.85% (212,195) of the identified enhancers were de novo ones with no epigenetic modification. Intriguingly, distinct chromatin states dictate the requirement for dissimilar cofactors in governing novel and known enhancers. Validation results show that these 25-bp sequences can act as functional units that show expression patterns identical or similar to those of the previously defined larger elements. Enhanced resolution facilitated the identification of numerous cell-specific enhancers and their accurate annotation as super enhancers. Moreover, we characterized novel elements capable of augmenting gene activity. Integration with high-resolution Hi-C data suggests that over 55.64% of novel elements may have a distal association with different targeted genes. For example, we found that the Cdh1 gene interacts with one novel and two known REs in mESCs. The biological effects of these interactions were investigated using CRISPR-Cas9, revealing their role in coordinating Cdh1 gene expression and mESC proliferation.
Our study presents an experimental approach to refine the REs at 25-bp resolution, advancing the precision of genome annotation and unveiling the underlying genome context. This novel approach not only advances our understanding of gene regulation but also opens avenues for comprehensive exploration of the genomic landscape.
Affiliation(s)
- Xiusheng Zhu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qitong Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Department of Animal Sciences, Wageningen University & Research, Wageningen, 6708PB, Netherlands
- Lei Huang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Jing Luo
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Qing Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Dashuai Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Biao Deng
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yi Gu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Xueyan Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Chenying Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Siyuan Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Yubo Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, 518120, China
- Kunpeng Institute of Modern Agriculture at Foshan, Foshan, 528225, China
10
Chai K, Chen S, Wang P, Kong W, Ma X, Zhang X. Multiomics Analysis Reveals the Genetic Basis of Volatile Terpenoid Formation in Oolong Tea. Journal of Agricultural and Food Chemistry 2023; 71:19888-19899. [PMID: 38048088 DOI: 10.1021/acs.jafc.3c06762] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/05/2023]
Abstract
Oolong tea has gained great popularity in China due to its pleasant floral and fruity aromas. Although numerous studies have investigated the aroma differences across various tea cultivars, the genetic mechanism is unclear. This study performed multiomics analysis of three varieties suitable for oolong tea and three others with different processing suitability. Our analysis revealed that oolong tea varieties contained higher levels of cadinane sesquiterpenoids. PanTFBS was developed to identify variants of transcription factor binding sites (TFBSs). We found that the CsDCS gene had two TFBS variants in the promoter sequence and a single nucleotide polymorphism (SNP) in the coding sequence. Integrating data on genetic variations, gene expression, and protein-binding sites indicated that CsDCS might be a pivotal gene involved in the biosynthesis of cadinane sesquiterpenoids. These findings advance our understanding of the genetic factors involved in the aroma formation of oolong tea and offer insights into the enhancement of tea aroma.
Affiliation(s)
- Kun Chai
- College of Life Science, Center for Genomics and Biotechnology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Shuai Chen
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Pengjie Wang
- College of Horticulture, Northwest A&F University, Yangling 712100, Shaanxi, China
- Weilong Kong
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Xiaokai Ma
- College of Life Science, Center for Genomics and Biotechnology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Xingtan Zhang
- National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
11
Puccio G, Ingraffia R, Giambalvo D, Frenda AS, Harkess A, Sunseri F, Mercati F. Exploring the genetic landscape of nitrogen uptake in durum wheat: genome-wide characterization and expression profiling of NPF and NRT2 gene families. Frontiers in Plant Science 2023; 14:1302337. [PMID: 38023895 PMCID: PMC10665861 DOI: 10.3389/fpls.2023.1302337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 10/25/2023] [Indexed: 12/01/2023]
Abstract
Nitrate uptake by plants primarily relies on two gene families: Nitrate transporter 1/peptide transporter (NPF) and Nitrate transporter 2 (NRT2). Here, we extensively characterized the NPF and NRT2 families in the durum wheat genome, revealing 211 NPF and 20 NRT2 genes. The two families share many Cis Regulatory Elements (CREs) and Transcription Factor binding sites, highlighting a partially overlapping regulatory system and suggesting a coordinated response for nitrate transport and utilization. Analyzing RNA-seq data from 9 tissues and 20 cultivars, we explored expression profiles and co-expression relationships of both gene families. We observed a strong correlation between nucleotide variation and gene expression within the NRT2 gene family, implicating a shared selection mechanism operating on both coding and regulatory regions. Furthermore, NPF genes showed highly tissue-specific expression profiles, while NRT2s were mainly divided into two co-expression modules, one expressed in roots (NAR2/NRT3 dependent) and the other induced in anthers and/or ovaries during maturation. Our evidence confirmed that the majority of these genes were retained after small-scale duplication events, suggesting a neo- or sub-functionalization of many NPFs and NRT2s. Altogether, these findings indicate that the expansion of these gene families in durum wheat could provide valuable genetic variability useful to identify NUE-related and candidate genes for future breeding programs in the context of low-impact and sustainable agriculture.
Affiliation(s)
- Guglielmo Puccio
- Department of Agricultural, Food and Forestry Sciences, University of Palermo, Palermo, Italy
- Institute of Biosciences and BioResources (IBBR), National Research Council, Palermo, Italy
- Rosolino Ingraffia
- Department of Agricultural, Food and Forestry Sciences, University of Palermo, Palermo, Italy
- Dario Giambalvo
- Department of Agricultural, Food and Forestry Sciences, University of Palermo, Palermo, Italy
- Alfonso S. Frenda
- Department of Agricultural, Food and Forestry Sciences, University of Palermo, Palermo, Italy
- Alex Harkess
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States
- Francesco Sunseri
- Institute of Biosciences and BioResources (IBBR), National Research Council, Palermo, Italy
- Department Agraria, University Mediterranea of Reggio Calabria, Reggio Calabria, Italy
- Francesco Mercati
- Institute of Biosciences and BioResources (IBBR), National Research Council, Palermo, Italy
12
Fang Z, Ford AJ, Hu T, Zhang N, Mantalaris A, Coskun AF. Subcellular spatially resolved gene neighborhood networks in single cells. Cell Reports Methods 2023; 3:100476. [PMID: 37323566 PMCID: PMC10261906 DOI: 10.1016/j.crmeth.2023.100476] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/30/2022] [Revised: 02/18/2023] [Accepted: 04/18/2023] [Indexed: 06/17/2023]
Abstract
Image-based spatial omics methods such as fluorescence in situ hybridization (FISH) generate molecular profiles of single cells at single-molecule resolution. Current spatial transcriptomics methods focus on the distribution of single genes. However, the spatial proximity of RNA transcripts can play an important role in cellular function. We demonstrate a spatially resolved gene neighborhood network (spaGNN) pipeline for the analysis of subcellular gene proximity relationships. In spaGNN, machine-learning-based clustering of subcellular spatial transcriptomics data yields subcellular density classes of multiplexed transcript features. The nearest-neighbor analysis produces heterogeneous gene proximity maps in distinct subcellular regions. We illustrate the cell-type-distinguishing capability of spaGNN using multiplexed error-robust FISH data of fibroblast and U2-OS cells and sequential FISH data of mesenchymal stem cells (MSCs), revealing tissue-source-specific MSC transcriptomics and spatial distribution characteristics. Overall, the spaGNN approach expands the spatial features that can be used for cell-type classification tasks.
Affiliation(s)
- Zhou Fang
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Machine Learning Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
- Adam J. Ford
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Thomas Hu
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
- Nicholas Zhang
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
- Athanasios Mantalaris
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Ahmet F. Coskun
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
- Interdisciplinary Bioengineering Graduate Program, Georgia Institute of Technology, Atlanta, GA, USA
- Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, GA 30332, USA
13
Quan L, Chu X, Sun X, Wu T, Lyu Q. How deepBICS Quantifies Intensities of Transcription Factor-DNA Binding and Facilitates Prediction of Single Nucleotide Variant Pathogenicity With a Deep Learning Model Trained on ChIP-Seq Data Sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1594-1599. [PMID: 35471887 DOI: 10.1109/tcbb.2022.3170343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The binding of DNA sequences to cell type-specific transcription factors is essential for regulating gene expression in all organisms. Many variants occurring in these binding regions play crucial roles in human disease by disrupting the cis-regulation of gene expression. We first implemented a sequence-based deep learning model called deepBICS to quantify the intensity of transcription factor-DNA binding. The experimental results not only showed the superiority of deepBICS on ChIP-seq data sets but also suggested that deepBICS, used as a language model, could help classify disease-related and neutral variants. We then built a language model-based method called deepBICS4SNV to predict the pathogenicity of single nucleotide variants. The good performance of deepBICS4SNV on 2 tests related to Mendelian disorders and viral diseases shows that the sequence contextual information derived from language models can improve prediction accuracy and generalization capability.
14
Morova T, Ding Y, Huang CCF, Sar F, Schwarz T, Giambartolomei C, Baca S, Grishin D, Hach F, Gusev A, Freedman M, Pasaniuc B, Lack N. Optimized high-throughput screening of non-coding variants identified from genome-wide association studies. Nucleic Acids Res 2022; 51:e18. [PMID: 36546757 PMCID: PMC9943666 DOI: 10.1093/nar/gkac1198] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/26/2022] [Revised: 11/19/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022] Open
Abstract
The vast majority of disease-associated single nucleotide polymorphisms (SNPs) identified from genome-wide association studies (GWAS) are localized in non-coding regions. A significant fraction of these variants impact transcription factor binding to enhancer elements and alter gene expression. To functionally interrogate the activity of such variants we developed snpSTARRseq, a high-throughput experimental method that can interrogate the functional impact of hundreds to thousands of non-coding variants on enhancer activity. snpSTARRseq dramatically improves signal-to-noise by utilizing a novel sequencing and bioinformatic approach that increases both insert size and the number of variants tested per locus. Using this strategy, we interrogated known prostate cancer (PCa) risk-associated loci and demonstrated that 35% of them harbor SNPs that significantly altered enhancer activity. Combining these results with chromosomal looping data we could identify interacting genes and provide a mechanism of action for 20 PCa GWAS risk regions. When benchmarked against orthogonal methods, snpSTARRseq showed a strong correlation with in vivo experimental allelic-imbalance studies, whereas there was no correlation with predictive in silico approaches. Overall, snpSTARRseq provides an integrated experimental and computational framework to functionally test non-coding genetic variants.
Affiliation(s)
- Tunc Morova
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Funda Sar
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Tommer Schwarz
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Claudia Giambartolomei
- Central RNA Lab, Istituto Italiano di Tecnologia, Genova 16163, Italy
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Sylvan C Baca
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Dennis Grishin
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Department of Urologic Science, University of British Columbia, Vancouver, BC V5Z 1M9, Canada
- Alexander Gusev
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Matthew L Freedman
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
- The Center for Cancer Genome Discovery, Dana Farber Cancer Institute, Boston, MA 02215, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nathan A Lack
- To whom correspondence should be addressed. Tel: +1 604 875 4411;
15
Zhao Y. TFSyntax: a database of transcription factors binding syntax in mammalian genomes. Nucleic Acids Res 2022; 51:D306-D314. [PMID: 36200824 PMCID: PMC9825613 DOI: 10.1093/nar/gkac849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Revised: 09/10/2022] [Accepted: 09/21/2022] [Indexed: 01/29/2023] Open
Abstract
In mammals, transcription factors (TFs) drive gene expression by binding to regulatory elements in a cooperative manner. Deciphering the rules of such cooperation is crucial to obtain a full understanding of cellular homeostasis and development. Although this is a long-standing topic, there is no comprehensive database for biologists to access the syntax of TF binding sites. Here we present TFSyntax (https://tfsyntax.zhaopage.com), a database focusing on the arrangement of TF binding sites. TFSyntax maps the binding motifs of 1299 human TFs and 890 mouse TFs across 382 cells and tissues, representing the most comprehensive TF binding map to date. In addition to location, TFSyntax defines motif positional preference, density and colocalization within accessible elements. Through a series of functional modules built on a web interface, users can freely search, browse, analyze, and download data of interest. With comprehensive characterization of TF binding syntax across distinct tissues and cell types, TFSyntax represents a valuable resource and platform for studying the mechanism of transcriptional regulation and exploring how regulatory DNA variants cause disease.
Affiliation(s)
- Yongbing Zhao
- To whom correspondence should be addressed. Tel: +1 301 480 5852;
16
Candido-Ferreira IL, Lukoseviciute M, Sauka-Spengler T. Multi-layered transcriptional control of cranial neural crest development. Semin Cell Dev Biol 2022; 138:1-14. [PMID: 35941042 DOI: 10.1016/j.semcdb.2022.07.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2022] [Revised: 07/23/2022] [Accepted: 07/23/2022] [Indexed: 11/28/2022]
Abstract
The neural crest (NC) is an emblematic population of embryonic stem-like cells with remarkable migratory ability. These distinctive attributes have inspired the curiosity of developmental biologists for over 150 years; however, only recently have the regulatory mechanisms controlling the complex features of the NC started to be elucidated at genomic scales. Regulatory control of NC development is achieved through combinatorial transcription factor binding and recruitment of associated transcriptional complexes to distal cis-regulatory elements. Together, they regulate when, where and to what extent transcriptional programmes are actively deployed, ultimately shaping ontogenetic processes. Here, we discuss how transcriptional networks control NC ontogeny, with a special emphasis on the molecular mechanisms underlying specification of the cephalic NC. We also cover emerging properties of transcriptional regulation revealed in diverse developmental systems, such as the role of three-dimensional conformation of chromatin, and how they are involved in the regulation of NC ontogeny. Finally, we highlight how advances in deciphering the NC transcriptional network have afforded new insights into the molecular basis of human diseases.
Affiliation(s)
- Ivan L Candido-Ferreira
- University of Oxford, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, Oxford OX3 9DS, UK
- Martyna Lukoseviciute
- University of Oxford, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, Oxford OX3 9DS, UK
- Tatjana Sauka-Spengler
- University of Oxford, MRC Weatherall Institute of Molecular Medicine, Radcliffe Department of Medicine, Oxford OX3 9DS, UK.
17
Qu J, Yang F, Zhu T, Wang Y, Fang W, Ding Y, Zhao X, Qi X, Xie Q, Chen M, Xu Q, Xie Y, Sun Y, Chen D. A reference single-cell regulomic and transcriptomic map of cynomolgus monkeys. Nat Commun 2022; 13:4069. [PMID: 35831300 PMCID: PMC9279386 DOI: 10.1038/s41467-022-31770-x] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/26/2022] [Accepted: 07/01/2022] [Indexed: 12/24/2022] Open
Abstract
Non-human primates are attractive laboratory animal models that accurately reflect both developmental and pathological features of humans. Here we present a compendium of cell types across multiple organs in cynomolgus monkeys (Macaca fascicularis) using both single-cell chromatin accessibility and RNA sequencing data. The integrated cell map enables in-depth dissection and comparison of molecular dynamics, cell-type compositions and cellular heterogeneity across multiple tissues and organs. Using single-cell transcriptomic data, we infer pseudotime cell trajectories and cell-cell communications to uncover key molecular signatures underlying their cellular processes. Furthermore, we identify various cell-specific cis-regulatory elements and construct organ-specific gene regulatory networks at the single-cell level. Finally, we perform comparative analyses of single-cell landscapes among mouse, monkey and human. We show that the cynomolgus monkey has a strikingly higher degree of similarity to human than mouse does in terms of immune-associated gene expression patterns and cellular communications. Taken together, our study provides a valuable resource for non-human primate cell biology.
Non-human primates are attractive laboratory animal models that can accurately reflect some developmental and pathological features of humans. Here the authors chart a reference cell map of cynomolgus monkeys using both scATAC-seq and scRNA-seq data across multiple organs, providing insights into the molecular dynamics and cellular heterogeneity of this organism.
Affiliation(s)
- Jiao Qu
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China
- Fa Yang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China
- Tao Zhu
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China
- Yingshuo Wang
- The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, 310052, Hangzhou, China
- Wen Fang
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China
- Yan Ding
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China
- Xue Zhao
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China
- Xianjia Qi
- Shanghai XuRan Biotechnology Co., Ltd., 1088 Zhongchun Road, 201109, Shanghai, China
- Qiangmin Xie
- The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, 310052, Hangzhou, China
- Ming Chen
- College of Life Sciences, Zhejiang University, 310058, Hangzhou, China
- Qiang Xu
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China
- Yicheng Xie
- The Children's Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, 310052, Hangzhou, China.
- Yang Sun
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China.
- Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, 210023, Nanjing, China.
- Dijun Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, 210023, Nanjing, China.
18
Gao Y, Chen Y, Feng H, Zhang Y, Yue Z. RicENN: Prediction of Rice Enhancers with Neural Network Based on DNA Sequences. Interdiscip Sci 2022; 14:555-565. [PMID: 35190950 DOI: 10.1007/s12539-022-00503-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/01/2021] [Revised: 01/07/2022] [Accepted: 01/18/2022] [Indexed: 01/22/2023]
Abstract
Enhancers are the primary cis-elements of transcriptional regulation and play a vital role in gene expression at different stages of plant growth and development. Because enhancers vary widely in location and are scattered freely across non-coding regions of the genome, identifying them is a crucial but challenging task in understanding the biological mechanisms of model plants. Recently, neural network models have gained popularity for predicting the function of genomic elements. Although several computational models have shown great advantages in tackling this challenge, a dedicated study of the identification of rice enhancers from DNA sequences is still lacking. We present RicENN, a novel deep learning framework capable of accurately identifying rice enhancers, integrating convolutional neural networks (CNNs), bi-directional recurrent neural networks (RNNs), and attention mechanisms. A combined-feature representation method was designed to extract sequence features from original DNA sequences using six types of autocorrelation encodings. Moreover, an ablation study verified that the integrated model achieves the best performance. Finally, our deep learning framework achieved reliable prediction of rice enhancers. The results show that RicENN outperforms available alternative approaches in rice, achieving an area under the receiver operating characteristic curve (AUROC) and an area under the precision-recall curve (AUPRC) of 0.960 and 0.960 on cross-validation, and 0.879 and 0.877 in independent tests, respectively. This study develops a hybrid model that combines the merits of different neural network architectures, shows the potential of deep learning for biological sequence analysis, and contributes to the acceleration of functional genomic studies of rice. RicENN and its code are freely accessible at http://bioinfor.aielab.cc/RicENN/.
Affiliation(s)
- Yujia Gao
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Yiqiong Chen
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Haisong Feng
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Youhua Zhang
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
- Zhenyu Yue
- School of Information and Computer, Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
19
Bhogale S, Sinha S. Thermodynamics-based modeling reveals regulatory effects of indirect transcription factor-DNA binding. iScience 2022; 25:104152. [PMID: 35465052 PMCID: PMC9018382 DOI: 10.1016/j.isci.2022.104152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 12/28/2021] [Accepted: 03/21/2022] [Indexed: 11/30/2022] Open
Abstract
Transcription factors (TFs) influence gene expression by binding to DNA, yet experimental data suggest that they also frequently bind regulatory DNA indirectly by interacting with other DNA-bound proteins. Here, we used a data modeling approach to test if such indirect binding by TFs plays a significant role in gene regulation. We first incorporated regulatory function of indirectly bound TFs into a thermodynamics-based model for predicting enhancer-driven expression from its sequence. We then fit the new model to a rich data set comprising hundreds of enhancers and their regulatory activities during mesoderm specification in Drosophila embryogenesis and showed that the newly incorporated mechanism results in significantly better agreement with data. In the process, we derived the first sequence-level model of this extensively characterized regulatory program. We further showed that allowing indirect binding of a TF explains its localization at enhancers more accurately than with direct binding only. Our model also provided a simple explanation of how a TF may switch between activating and repressive roles depending on context.
- Inclusion of indirect DNA binding of transcription factor improves enhancer function prediction
- Context specific activating or repressive roles of TFs
- Indirect binding improves fits to experimental TF-DNA binding data
- Role of Tinman depends on its DNA-binding mode (direct or indirect)
20
Quan L, Sun X, Wu J, Mei J, Huang L, He R, Nie L, Chen Y, Lyu Q. Learning Useful Representations of DNA Sequences From ChIP-Seq Datasets for Exploring Transcription Factor Binding Specificities. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:998-1008. [PMID: 32976105 DOI: 10.1109/tcbb.2020.3026787] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
Deep learning has been successfully applied to surprisingly different domains. Researchers and practitioners are employing trained deep learning models to enrich our knowledge. Transcription factors (TFs) are essential for regulating gene expression in all organisms by binding to specific DNA sequences. Here, we designed a deep learning model named SemanticCS (Semantic ChIP-seq) to predict TF binding specificities. We trained our learning model on an ensemble of ChIP-seq datasets (Multi-TF-cell) to learn useful intermediate features across multiple TFs and cells. To interpret these feature vectors, visualization analysis was used. Our results indicate that these learned representations can be used to train shallow machines for other tasks. Using diverse experimental data and evaluation metrics, we show that SemanticCS outperforms other popular methods. In addition, from experimental data, SemanticCS can help to identify the substitutions that cause regulatory abnormalities and to evaluate the effect of substitutions on the binding affinity for the RXR transcription factor. The online server for SemanticCS is freely available at http://qianglab.scst.suda.edu.cn/semanticCS/.
21
Galouzis CC, Furlong EEM. Regulating specificity in enhancer-promoter communication. Curr Opin Cell Biol 2022; 75:102065. [PMID: 35240372 DOI: 10.1016/j.ceb.2022.01.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/20/2021] [Revised: 01/23/2022] [Accepted: 01/25/2022] [Indexed: 12/14/2022]
Abstract
Enhancers are cis-regulatory elements that can activate transcription remotely to regulate a specific pattern of a gene's expression. Genes typically have many enhancers that are often intermingled in the loci of other genes. To regulate expression, enhancers must therefore activate their correct promoter while ignoring others that may be in closer linear proximity. In this review, we discuss mechanisms by which enhancers engage with promoters, including recent findings on the role of cohesin and the Mediator complex, and how this specificity in enhancer-promoter communication is encoded. Genetic dissection of model loci, in addition to more recent findings using genome-wide approaches, highlight the core promoter sequence, its accessibility, cofactor-promoter preference, in addition to the surrounding genomic context, as key components.
Affiliation(s)
- Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, D-69117, Heidelberg, Germany.
22
Datta RR, Rister J. The power of the (imperfect) palindrome: Sequence-specific roles of palindromic motifs in gene regulation. Bioessays 2022; 44:e2100191. [PMID: 35195290 PMCID: PMC8957550 DOI: 10.1002/bies.202100191] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/10/2021] [Revised: 02/01/2022] [Accepted: 02/03/2022] [Indexed: 12/22/2022]
Abstract
In human languages, a palindrome reads the same forward as backward (e.g., 'madam'). In regulatory DNA, a palindrome is an inverted sequence repeat that allows a transcription factor to bind as a homodimer or as a heterodimer with another type of transcription factor. Regulatory palindromes are typically imperfect, that is, the repeated sequences differ in at least one base pair, but the functional significance of this asymmetry remains poorly understood. Here, we review the use of imperfect palindromes in Drosophila photoreceptor differentiation and mammalian steroid receptor signaling. Moreover, we discuss mechanistic explanations for the predominance of imperfect palindromes over perfect palindromes in these two gene regulatory contexts. Lastly, we propose to elucidate whether specific imperfectly palindromic variants have specific regulatory functions in steroid receptor signaling and whether such variants can help predict transcriptional outcomes as well as the response of individual patients to drug treatments.
Affiliation(s)
- Rhea R Datta
- Department of Biology, Hamilton College, Clinton, New York, USA
| | - Jens Rister
- Department of Biology, University of Massachusetts Boston, Integrated Sciences Complex, Boston, Massachusetts, USA
| |
|
23
|
Heller IS, Guenther CA, Meireles AM, Talbot WS, Kingsley DM. Characterization of mouse Bmp5 regulatory injury element in zebrafish wound models. Bone 2022; 155:116263. [PMID: 34826632 PMCID: PMC9007314 DOI: 10.1016/j.bone.2021.116263] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 11/17/2021] [Accepted: 11/18/2021] [Indexed: 11/21/2022]
Abstract
Many key signaling molecules used to build tissues during embryonic development are re-activated at injury sites to stimulate tissue regeneration and repair. Bone morphogenetic proteins provide a classic example, but the mechanisms that lead to reactivation of BMPs following injury are still unknown. Previous studies have mapped a large "injury response element" (IRE) in the mouse Bmp5 gene that drives gene expression following bone fractures and other types of injury. Here we show that the large mouse IRE region is also activated in both zebrafish tail resection and mechanosensory hair cell injury models. Using the ability to test multiple constructs and image temporal and spatial dynamics following injury responses, we have narrowed the original size of the mouse IRE region by over 100-fold and identified a small 142 bp minimal enhancer that is rapidly induced in both mesenchymal and epithelial tissues after injury. These studies identify a small sequence that responds to evolutionarily conserved local signals in wounded tissues and suggest candidate pathways that contribute to BMP reactivation after injury.
Affiliation(s)
- Ian S Heller
- Department of Developmental Biology, Stanford University School of Medicine, United States of America
| | - Catherine A Guenther
- Department of Developmental Biology, Stanford University School of Medicine, United States of America; Howard Hughes Medical Institute, Stanford University School of Medicine, United States of America
| | - Ana M Meireles
- Department of Developmental Biology, Stanford University School of Medicine, United States of America
| | - William S Talbot
- Department of Developmental Biology, Stanford University School of Medicine, United States of America
| | - David M Kingsley
- Department of Developmental Biology, Stanford University School of Medicine, United States of America; Howard Hughes Medical Institute, Stanford University School of Medicine, United States of America.
| |
|
24
|
Trading bits in the readout from a genetic network. Proc Natl Acad Sci U S A 2021; 118:2109011118. [PMID: 34772813 DOI: 10.1073/pnas.2109011118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 10/04/2021] [Indexed: 11/18/2022] Open
Abstract
In the regulation of gene expression, information of relevance to the organism is represented by the concentrations of transcription factor molecules. To extract this information the cell must effectively "measure" these concentrations, but there are physical limits to the precision of these measurements. We use the gap gene network in the early fly embryo as an example of the tradeoff between the precision of concentration measurements and the transmission of relevant information. For thresholded measurements we find that lower thresholds are more important, and fine tuning is not required for near-optimal information transmission. We then consider general sensors, constrained only by a limit on their information capacity, and find that thresholded sensors can approach true information theoretic optima. The information theoretic approach allows us to identify the optimal sensor for the entire gap gene network and to argue that the physical limitations of sensing necessitate the observed multiplicity of enhancer elements, with sensitivities to combinations rather than single transcription factors.
|
25
|
Eukaryotic Genomes Show Strong Evolutionary Conservation of k-mer Composition and Correlation Contributions between Introns and Intergenic Regions. Genes (Basel) 2021; 12:genes12101571. [PMID: 34680967 PMCID: PMC8536142 DOI: 10.3390/genes12101571] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/13/2021] [Revised: 09/24/2021] [Accepted: 09/29/2021] [Indexed: 01/22/2023] Open
Abstract
Several strongly conserved DNA sequence patterns in and between introns and intergenic regions (IIRs) consisting of short tandem repeats (STRs) with repeat lengths <3 bp have already been described in the kingdom of Animalia. In this work, we expanded the search and analysis of conserved DNA sequence patterns to a wider range of eukaryotic genomes. Our aims were to confirm the conservation of these patterns, to support the hypothesis on their functional constraints and/or the identification of unknown patterns. We pairwise compared genomic DNA sequences of genes, exons, CDS, introns and intergenic regions of 34 Embryophyta (land plants), 30 Protista and 29 Fungi using established k-mer-based (alignment-free) comparison methods. Additionally, the results were compared with values derived for Animalia in former studies. We confirmed strong correlations between the sequence structures of IIRs spanning the entire domain of Eukaryotes. We found that the high correlations within introns, within intergenic regions and between the two are a result of conserved abundances of STRs with repeat units ≤2 bp (e.g., (AT)n). For some sequence patterns and their inverse complementary sequences, we found a violation of equal distribution on complementary DNA strands in a subset of genomes. Looking at mismatches within the identified STR patterns, we found specific preferences for certain nucleotides that were stable across all four phylogenetic kingdoms. We conclude that all of these conserved patterns between IIRs indicate a shared function of these sequence structures related to STRs.
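To illustrate what an alignment-free, k-mer-based comparison of two sequences can look like, here is a minimal sketch. It is a generic illustration, not the method used in the paper: the function names `kmer_profile` and `pearson`, the choice of Pearson correlation, and the toy sequences are all assumptions.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq: str, k: int) -> list[float]:
    """Relative frequency of every possible DNA k-mer, in lexicographic
    order (AA, AC, ...), counted over overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # windows with other symbols are ignored
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(vx * vy)


# Toy sequences (assumed): two AT-repeat-rich stretches and one GC-rich stretch.
intron_like = "ATATATATGCATATATAT"
intergenic_like = "TATATATACGTATATATA"
gc_rich = "GCGCGGCCGCGCCGGCGC"

print(pearson(kmer_profile(intron_like, 2), kmer_profile(intergenic_like, 2)))
print(pearson(kmer_profile(intron_like, 2), kmer_profile(gc_rich, 2)))
```

In this toy setting, the two sequences dominated by (AT)n repeats share most of their dinucleotide mass and therefore correlate strongly, while the GC-rich sequence does not; this is the kind of signal the abstract attributes to conserved short tandem repeats.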
|
26
|
Liu Q, Mishra M, Saxena AS, Wu H, Qiu Y, Zhang X, You X, Ding S, Miyamoto MM. Balancing selection maintains ancient polymorphisms at conserved enhancers for the olfactory receptor genes of a Chinese marine fish. Mol Ecol 2021; 30:4023-4038. [PMID: 34107131 DOI: 10.1111/mec.16016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2020] [Revised: 05/10/2021] [Accepted: 06/01/2021] [Indexed: 12/22/2022]
Abstract
The study of balancing selection, as a selective force maintaining adaptive genetic variation in gene pools longer than expected by drift, is currently experiencing renewed interest due to the increased availability of new data, methods of analysis, and case studies. In this investigation, evidence of balancing selection operating on conserved enhancers of the olfactory receptor (OR) genes is presented for the Chinese sleeper (Bostrychus sinensis), a coastal marine fish that is emerging as a model species for evolutionary studies in the Northwest Pacific marginal seas. Coupled with tests for Gene Ontology enrichment and transcription factor binding, population genomic data allow for the identification of an OR cluster in the sleeper with a downstream flanking region containing three enhancers that are conserved with human and other fish species. Phylogenetic and population genetic analyses indicate that the enhancers are under balancing selection as evidenced by their translineage polymorphisms, excess common alleles, and increased within-group diversities. Age comparisons between the translineage polymorphisms and most recent common ancestors of neutral genealogies substantiate that the former are old, and thus, due to ancient balancing selection. The survival and reproduction of vertebrates depend on their sense of smell, and thereby, on their ORs. In addition to locus duplication and allelic variation of structural genes, this study highlights a third mechanism by which receptor diversity can be achieved for detecting and responding to the huge variety of environmental odorants (i.e., by balancing selection acting on OR gene expression through their enhancer variability).
Affiliation(s)
- Qiaohong Liu
- Xiamen Key Laboratory of Urban Sea Ecological Conservation and Restoration, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China.,Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
| | - Mrinal Mishra
- Department of Biology, University of Florida, Gainesville, FL, USA
| | - Ayush S Saxena
- Department of Biology, University of Florida, Gainesville, FL, USA
| | - Haohao Wu
- Xiamen Key Laboratory of Urban Sea Ecological Conservation and Restoration, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China.,Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
| | - Ying Qiu
- Shenzhen Key Laboratory of Marine Genomics, Guangdong Provincial Key Laboratory of Molecular Breeding in Marine Economic Animals, BGI Academy of Sciences, BGI Marine, Shenzhen, China
| | - Xinhui Zhang
- Shenzhen Key Laboratory of Marine Genomics, Guangdong Provincial Key Laboratory of Molecular Breeding in Marine Economic Animals, BGI Academy of Sciences, BGI Marine, Shenzhen, China
| | - Xinxin You
- Shenzhen Key Laboratory of Marine Genomics, Guangdong Provincial Key Laboratory of Molecular Breeding in Marine Economic Animals, BGI Academy of Sciences, BGI Marine, Shenzhen, China
| | - Shaoxiong Ding
- Xiamen Key Laboratory of Urban Sea Ecological Conservation and Restoration, College of Ocean and Earth Sciences, Xiamen University, Xiamen, China.,Function Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
| | | |
|
27
|
Ni P, Su Z. Accurate prediction of cis-regulatory modules reveals a prevalent regulatory genome of humans. NAR Genom Bioinform 2021; 3:lqab052. [PMID: 34159315 PMCID: PMC8210889 DOI: 10.1093/nargab/lqab052] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/26/2021] [Revised: 05/01/2021] [Accepted: 06/14/2021] [Indexed: 02/07/2023] Open
Abstract
cis-regulatory modules (CRMs) formed by clusters of transcription factor (TF) binding sites (TFBSs) are as important as coding sequences in specifying phenotypes of humans. It is essential to categorize all CRMs and constituent TFBSs in the genome. In contrast to most existing methods that predict CRMs in specific cell types using epigenetic marks, we predict a largely cell type agnostic but more comprehensive map of CRMs and constituent TFBSs in the genome by integrating all available TF ChIP-seq datasets. Our method is able to partition 77.47% of genome regions covered by the 6092 available datasets into a CRM candidate (CRMC) set (56.84%) and a non-CRMC set (43.16%). Intriguingly, the predicted CRMCs are under strong evolutionary constraints, while the non-CRMCs are largely selectively neutral, strongly suggesting that the CRMCs are likely cis-regulatory, while the non-CRMCs are not. Our predicted CRMs are under stronger evolutionary constraints than three state-of-the-art predictions (GeneHancer, EnhancerAtlas and ENCODE phase 3) and substantially outperform them for recalling VISTA enhancers and non-coding ClinVar variants. We estimated that the human genome might encode about 1.47M CRMs and 68M TFBSs, comprising about 55% and 22% of the genome, respectively; we predicted 80% of each. Therefore, the cis-regulatory genome appears to be more prevalent than originally thought.
Affiliation(s)
- Pengyu Ni
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
| | - Zhengchang Su
- Department of Bioinformatics and Genomics, the University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA
| |
|
28
|
Prosdocimi F, de Farias ST. Life and living beings under the perspective of organic macrocodes. Biosystems 2021; 206:104445. [PMID: 34033908 DOI: 10.1016/j.biosystems.2021.104445] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/30/2021] [Revised: 05/17/2021] [Accepted: 05/18/2021] [Indexed: 11/16/2022]
Abstract
A powerful and concise concept of life is crucial for studies aiming to understand the characteristics that emerged from an inorganic world. Among biologists, the most accepted argument defines life under a top-down strategy by looking into the shared characteristics observed in all cellular organisms. This is often done by highlighting (i) autonomy and (ii) evolutionary capacity as fundamental characteristics observed in all cellular organisms. In the present work, we adopt the framework of code biology, considering that biology started with the emergence of the first organic code by self-organization. We reinforce that the conceptual structure of life should be reallocated from the ontology class of Matter to its sister class of Process. During the emergence and early evolution of biological systems, biological codes changed from open systems of "naked" molecules (in the progenote era) to closed, encapsulated systems (in the organismic era). Living beings appeared at the very moment when nucleic acids with coding properties became encapsulated. This led to the origin of viruses and, then, to the origin of cells. In this context, we propose that the single character that makes a clear distinction between the abiotic and the biotic world is the capacity to process organic codes. Thus, life appears with the self-assembly of a genetic code and evolves by the emergence of other overlapping codes. Once life has been clearly conceptualized, we go further to conceptualize organisms, parents, lineages, and species in terms of code biology.
Affiliation(s)
- Francisco Prosdocimi
- Laboratório de Biologia Teórica e de Sistemas, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil.
| | - Sávio Torres de Farias
- Laboratório de Genética Evolutiva Paulo Leminski, Centro de Ciências Exatas e da Natureza, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil; Network of Researchers on the Chemical Evolution of Life (NoRCEL), Leeds, LS7 3RB, UK.
| |
|
29
|
Liu L, Zhang G, He S, Hu X. TSPTFBS: a docker image for Trans-Species Prediction of Transcription Factor Binding Sites in Plants. Bioinformatics 2021; 37:260-262. [PMID: 33416862 DOI: 10.1093/bioinformatics/btaa1100] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/07/2020] [Revised: 12/18/2020] [Accepted: 12/29/2020] [Indexed: 01/19/2023] Open
Abstract
MOTIVATION: Both the lack or limitation of experimental data on transcription factor binding sites (TFBSs) in plants and the independent evolution of plant TFs have caused computational approaches for identifying plant TFBSs to lag behind comparable research in humans. Observing that TFs are highly conserved among plant species, here we first employ a deep convolutional neural network (DeepCNN) to build 265 Arabidopsis TFBS prediction models based on available DAP-seq (DNA affinity purification sequencing) datasets, and then transfer them to homologous TFs in other plants. RESULTS: DeepCNN not only achieves greater success on Arabidopsis TFBS predictions than gkm-SVM and MEME, but has also learned the known motif for most Arabidopsis TFs, as well as cooperative TF motifs supported by PPI (protein-protein interaction) evidence, which provides its biological interpretability. Following the idea of transfer learning, trans-species prediction performance on ten TFs from three other plants (Oryza sativa, Zea mays and Glycine max) demonstrates the feasibility of the current strategy. AVAILABILITY AND IMPLEMENTATION: The 265 trained Arabidopsis TFBS prediction models were packaged in a Docker image named TSPTFBS, which is freely available on DockerHub at https://hub.docker.com/r/vanadiummm/tsptfbs. Source code and documentation are available on GitHub at: https://github.com/liulifenyf/TSPTFBS.
Affiliation(s)
- Lifen Liu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, Hubei, P.R. of China
| | - Ge Zhang
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, Hubei, P.R. of China
| | - Shoupeng He
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, Hubei, P.R. of China
| | - Xuehai Hu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, Hubei, P.R. of China
| |
|
30
|
Kolmykov S, Yevshin I, Kulyashov M, Sharipov R, Kondrakhin Y, Makeev VJ, Kulakovskiy IV, Kel A, Kolpakov F. GTRD: an integrated view of transcription regulation. Nucleic Acids Res 2021; 49:D104-D111. [PMID: 33231677 PMCID: PMC7778956 DOI: 10.1093/nar/gkaa1057] [Citation(s) in RCA: 135] [Impact Index Per Article: 33.8] [Received: 09/15/2020] [Revised: 10/18/2020] [Accepted: 11/03/2020] [Indexed: 12/24/2022] Open
Abstract
The Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org/) contains uniformly annotated and processed NGS data related to gene transcription regulation: ChIP-seq, ChIP-exo, DNase-seq, MNase-seq, ATAC-seq and RNA-seq. With the latest release, the database has reached a new level of data integration. All cell types (cell lines and tissues) presented in the GTRD were arranged into a dictionary and linked with different ontologies (BRENDA, Cell Ontology, Uberon, Cellosaurus and Experimental Factor Ontology) and with related experiments in specialized databases on transcription regulation (FANTOM5, ENCODE and GTEx). The updated version of the GTRD provides an integrated view of transcription regulation through a dedicated web interface with advanced browsing and search capabilities, an integrated genome browser, and table reports by cell types, transcription factors, and genes of interest.
Affiliation(s)
- Semyon Kolmykov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Federal Research Center Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russian Federation
| | - Ivan Yevshin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
| | - Mikhail Kulyashov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Novosibirsk State University, Novosibirsk 630090, Russian Federation
| | - Ruslan Sharipov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
- Novosibirsk State University, Novosibirsk 630090, Russian Federation
| | - Yury Kondrakhin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
| | - Vsevolod J Makeev
- Vavilov Institute of General Genetics RAS, Moscow 119991, Russian Federation
- Moscow Institute of Physics and Technology (State University), Dolgoprudny 141700, Russian Federation
- NRC «Kurchatov Institute» - GOSNIIGENETIKA, Kurchatov Genomic Center, Moscow 123182, Russian Federation
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russian Federation
| | - Ivan V Kulakovskiy
- Vavilov Institute of General Genetics RAS, Moscow 119991, Russian Federation
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russian Federation
- Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russian Federation
| | - Alexander Kel
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- geneXplain GmbH, 38302 Wolfenbüttel, Germany
- Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk 630090, Russian Federation
| | - Fedor Kolpakov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
- Federal Research Center for Information and Computational Technologies, Novosibirsk 630090, Russian Federation
| |
|
31
|
Suzuki A, Guerrini MM, Yamamoto K. Functional genomics of autoimmune diseases. Ann Rheum Dis 2021; 80:689-697. [PMID: 33408079 DOI: 10.1136/annrheumdis-2019-216794] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/30/2020] [Accepted: 12/06/2020] [Indexed: 12/22/2022]
Abstract
For more than a decade, genome-wide association studies have been applied to autoimmune diseases and have expanded our understanding of their pathogenesis. Genetic risk factors associated with diseases and traits are essentially causative. However, elucidating the biological mechanism of disease from genetic factors is challenging. In fact, it is difficult to identify the causal variant among multiple variants located on the same haplotype or linkage disequilibrium block, and thus the responsible genes remain elusive. Recently, multiple studies have revealed that the majority of risk variants are located in the non-coding region of the genome and are most likely to regulate gene expression, acting as quantitative trait loci. Enhancers, promoters and long non-coding RNAs appear to be the main target mechanisms of the risk variants. In this review, we discuss functional genetics approaches to address these puzzles.
Affiliation(s)
- Akari Suzuki
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| | - Matteo Maurizio Guerrini
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| | - Kazuhiko Yamamoto
- Laboratory for Autoimmune Diseases, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
| |
|
32
|
Makashov AA, Myasnikova EM, Spirov AV. Fuzzy Linguistic Modeling of the Regulation of Drosophila Segmentation Genes. Biophysics (Nagoya-shi) 2021. [DOI: 10.1134/s0006350921010073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
|
33
|
Lee D, Shi M, Moran J, Wall M, Zhang J, Liu J, Fitzgerald D, Kyono Y, Ma L, White KP, Gerstein M. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions. Genome Biol 2020; 21:298. [PMID: 33292397 PMCID: PMC7722316 DOI: 10.1186/s13059-020-02194-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 08/02/2019] [Accepted: 11/04/2020] [Indexed: 12/11/2022] Open
Abstract
STARR-seq technology has employed progressively more complex genomic libraries and increased sequencing depths. An issue with the increased complexity and depth is that the coverage in STARR-seq experiments is non-uniform, overdispersed, and often confounded by sequencing biases, such as GC content. Furthermore, STARR-seq readout is confounded by RNA secondary structure and thermodynamic stability. To address these potential confounders, we developed a negative binomial regression framework for uniformly processing STARR-seq data, called STARRPeaker. Moreover, to aid our effort, we generated whole-genome STARR-seq data from the HepG2 and K562 human cell lines and applied STARRPeaker to comprehensively and unbiasedly call enhancers in them.
Affiliation(s)
- Donghoon Lee
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.,Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.,Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Manman Shi
- Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA.,Tempus Labs, Inc., Chicago, IL, 60654, USA
| | - Jennifer Moran
- Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA.,Tempus Labs, Inc., Chicago, IL, 60654, USA
| | - Martha Wall
- Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA.,Tempus Labs, Inc., Chicago, IL, 60654, USA
| | - Jing Zhang
- School of Information and Computer Sciences, University of California, Irvine, CA, 92697, USA
| | - Jason Liu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA
| | - Dominic Fitzgerald
- Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA
| | - Yasuhiro Kyono
- Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA.,Tempus Labs, Inc., Chicago, IL, 60654, USA
| | - Lijia Ma
- Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA.,School of Life Sciences, Westlake University, Hangzhou, 310024, Zhejiang, China
| | - Kevin P White
- Institute for Genomics and System Biology, University of Chicago, Chicago, IL, 60637, USA. .,Tempus Labs, Inc., Chicago, IL, 60654, USA.
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA. .,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA. .,Department of Computer Science, Yale University, New Haven, CT, 06520, USA. .,Department of Statistics and Data Science, Yale University, New Haven, CT, 06520, USA.
| |
|
34
|
Chen L, Capra JA. Learning and interpreting the gene regulatory grammar in a deep learning framework. PLoS Comput Biol 2020; 16:e1008334. [PMID: 33137083 PMCID: PMC7660921 DOI: 10.1371/journal.pcbi.1008334] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 12/16/2019] [Revised: 11/12/2020] [Accepted: 09/12/2020] [Indexed: 12/12/2022] Open
Abstract
Deep neural networks (DNNs) have achieved state-of-the-art performance in identifying gene regulatory sequences, but they have provided limited insight into the biology of regulatory elements due to the difficulty of interpreting the complex features they learn. Several models of how combinatorial binding of transcription factors, i.e. the regulatory grammar, drives enhancer activity have been proposed, ranging from the flexible TF billboard model to the stringent enhanceosome model. However, there is limited knowledge of the prevalence of these (or other) sequence architectures across enhancers. Here we perform several hypothesis-driven analyses to explore the ability of DNNs to learn the regulatory grammar of enhancers. We created synthetic datasets based on existing hypotheses about combinatorial transcription factor binding site (TFBS) patterns, including homotypic clusters, heterotypic clusters, and enhanceosomes, from real TF binding motifs from diverse TF families. We then trained deep residual neural networks (ResNets) to model the sequences under a range of scenarios that reflect real-world multi-label regulatory sequence prediction tasks. We developed a gradient-based unsupervised clustering method to extract the patterns learned by the ResNet models. We demonstrated that simulated regulatory grammars are best learned in the penultimate layer of the ResNets, and the proposed method can accurately retrieve the regulatory grammar even when there is heterogeneity in the enhancer categories and a large fraction of TFBS outside of the regulatory grammar. However, we also identify common scenarios where ResNets fail to learn simulated regulatory grammars. Finally, we applied the proposed method to mouse developmental enhancers and were able to identify the components of a known heterotypic TF cluster. 
Our results provide a framework for interpreting the regulatory rules learned by ResNets, and they demonstrate that the ability and efficiency of ResNets in learning the regulatory grammar depends on the nature of the prediction task.
Affiliation(s)
- Ling Chen
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America
| | - John A. Capra
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, United States of America
- Vanderbilt Genetics Institute and Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States of America
- Department of Computer Science, Vanderbilt University, Nashville, TN, United States of America
| |
|
35
|
eRNAs and Superenhancer lncRNAs Are Functional in Human Prostate Cancer. DISEASE MARKERS 2020; 2020:8847986. [PMID: 33029258 PMCID: PMC7532396 DOI: 10.1155/2020/8847986] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/23/2020] [Revised: 07/27/2020] [Accepted: 08/14/2020] [Indexed: 01/06/2023]
Abstract
Prostate cancer (PCa) is one of the most commonly diagnosed cancers in males worldwide. lncRNAs (long noncoding RNAs) play a significant role in the occurrence and development of PCa. eRNAs (enhancer RNAs) and SE-lncRNAs (superenhancer lncRNAs) are important classes of lncRNAs, but their role in PCa remains largely unclear. In this work, we identified 681 eRNAs and 292 SE-lncRNAs that were differentially expressed in PCa using a microarray. We also found that eRNAs transcribed from active open chromatin had significantly higher expression than those from active closed chromatin, and SE-lncRNAs had slightly higher expression than eRNAs. Next, we constructed a transcriptional regulation network in which eRNA-related enhancers and their target genes shared the same TF-binding motifs. Further, we investigated whether CTCF played a role in mediating the transcriptional regulation network. eRNAs, especially those that regulate androgen response genes, may be candidates for prognostic biomarkers and therapy targets. Our work provides a new perspective for developing medical treatments and therapies for prostate cancer.
|
36
|
Co-option of the lineage-specific LAVA retrotransposon in the gibbon genome. Proc Natl Acad Sci U S A 2020; 117:19328-19338. [PMID: 32690705 DOI: 10.1073/pnas.2006038117] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/11/2023] Open
Abstract
Co-option of transposable elements (TEs) to become part of existing or new enhancers is an important mechanism for evolution of gene regulation. However, contributions of lineage-specific TE insertions to recent regulatory adaptations remain poorly understood. Gibbons present a suitable model to study these contributions as they have evolved a lineage-specific TE called LAVA (LINE-AluSz-VNTR-Alu LIKE), which is still active in the gibbon genome. The LAVA retrotransposon is thought to have played a role in the emergence of the highly rearranged structure of the gibbon genome by disrupting transcription of cell cycle genes. In this study, we investigated whether LAVA may have also contributed to the evolution of gene regulation by adopting enhancer function. We characterized fixed and polymorphic LAVA insertions across multiple gibbons and found 96 LAVA elements overlapping enhancer chromatin states. Moreover, LAVA was enriched in multiple transcription factor binding motifs, was bound by an important transcription factor (PU.1), and was associated with higher levels of gene expression in cis. We found gibbon-specific signatures of purifying/positive selection at 27 LAVA insertions. Two of these insertions were fixed in the gibbon lineage and overlapped with enhancer chromatin states, representing putative co-opted LAVA enhancers. These putative enhancers were located within genes encoding SETD2 and RAD9A, two proteins that facilitate accurate repair of DNA double-strand breaks and prevent chromosomal rearrangement mutations. Co-option of LAVA in these genes may have influenced regulation of processes that preserve genome integrity. Our findings highlight the importance of considering lineage-specific TEs in studying evolution of gene regulatory elements.
|
37
|
Modular Organization of Cis-regulatory Control Information of Neurotransmitter Pathway Genes in Caenorhabditis elegans. Genetics 2020; 215:665-681. [PMID: 32444379 DOI: 10.1534/genetics.120.303206] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 03/24/2020] [Accepted: 05/20/2020] [Indexed: 11/18/2022] Open
Abstract
We explore here the cis-regulatory logic that dictates gene expression in specific cell types in the nervous system. We focus on a set of eight genes involved in the synthesis, transport, and breakdown of three neurotransmitter systems: acetylcholine (unc-17/VAChT, cha-1/ChAT, cho-1/ChT, and ace-2/AChE), glutamate (eat-4/VGluT), and γ-aminobutyric acid (unc-25/GAD, unc-46/LAMP, and unc-47/VGAT). These genes are specifically expressed in defined subsets of cells in the nervous system. Through transgenic reporter gene assays, we find that the cellular specificity of expression of all of these genes is controlled in a modular manner through distinct cis-regulatory elements, corroborating the previously inferred piecemeal nature of specification of neurotransmitter identity. This modularity provides the mechanistic basis for the phenomenon of "phenotypic convergence," in which distinct regulatory pathways can generate similar phenotypic outcomes (i.e., the acquisition of a specific neurotransmitter identity) in different neuron classes. We also identify cases of enhancer pleiotropy, in which the same cis-regulatory element is utilized to control gene expression in distinct neuron types. We engineered a cis-regulatory allele of the vesicular acetylcholine transporter, unc-17/VAChT, to assess the functional contribution of a "shadowed" enhancer. We observed a selective loss of unc-17/VAChT expression in one cholinergic pharyngeal pacemaker motor neuron class and a behavioral phenotype that matches microsurgical removal of this neuron. Our analysis illustrates the value of understanding cis-regulatory information to manipulate gene expression and control animal behavior.
|
38
|
Yevshin I, Sharipov R, Kolmykov S, Kondrakhin Y, Kolpakov F. GTRD: a database on gene transcription regulation-2019 update. Nucleic Acids Res 2020; 47:D100-D105. [PMID: 30445619 PMCID: PMC6323985 DOI: 10.1093/nar/gky1128] [Citation(s) in RCA: 152] [Impact Index Per Article: 30.4] [Received: 09/15/2018] [Accepted: 10/26/2018] [Indexed: 01/16/2023] Open
Abstract
The current version of the Gene Transcription Regulation Database (GTRD; http://gtrd.biouml.org) contains information about: (i) transcription factor binding sites (TFBSs) and transcription coactivators identified by ChIP-seq experiments for Homo sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Arabidopsis thaliana; (ii) regions of open chromatin and TFBSs (DNase footprints) identified by DNase-seq; (iii) unmappable regions where TFBSs cannot be identified due to repeats; (iv) potential TFBSs for both human and mouse using position weight matrices from the HOCOMOCO database. Raw ChIP-seq and DNase-seq data were obtained from ENCODE and SRA, and uniformly processed. ChIP-seq peaks were called using four different methods: MACS, SISSRs, GEM and PICS. Moreover, peaks for the same factor and peak calling method, albeit using different experiment conditions (cell line, treatment, etc.), were merged into clusters. To reduce noise, such clusters for different peak calling methods were merged into meta-clusters; these were considered to be non-redundant TFBS sets. Moreover, extended quality control was applied to all ChIP-seq data. Web interface to access GTRD was developed using the BioUML platform. It provides browsing and displaying information, advanced search possibilities and an integrated genome browser.
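The cluster-merging step described in this abstract can be illustrated with a minimal sketch. It assumes peaks are half-open `(start, end)` intervals on one chromosome and that "merging" means joining intervals that overlap; the abstract does not state GTRD's actual merge criterion, and the function name `merge_peaks` is hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (start, end) on a single chromosome; assumed layout


def merge_peaks(peaks: List[Peak]) -> List[Peak]:
    """Merge overlapping peaks into clusters.

    Assumption: two peaks belong to the same cluster when they overlap.
    This is an illustrative rule, not GTRD's documented procedure.
    """
    merged: List[Peak] = []
    for start, end in sorted(peaks):
        if merged and start <= merged[-1][1]:
            # Overlaps the current cluster: extend its right edge if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new cluster.
            merged.append((start, end))
    return merged


# Peaks for one factor from different experiments (made-up coordinates).
clusters = merge_peaks([(150, 250), (100, 200), (400, 500)])
```

Applying the same function to the clusters produced by several peak callers would give a rough analogue of the meta-clusters mentioned above, again under the overlap assumption.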
Affiliation(s)
- Ivan Yevshin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation
| | - Ruslan Sharipov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation.,Institute of Computational Technologies SB RAS, Novosibirsk 630090, Russian Federation.,Novosibirsk State University, Novosibirsk 630090, Russian Federation
| | - Semyon Kolmykov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation.,Institute of Cytology and Genetics SB RAS, Novosibirsk 630090, Russian Federation
| | - Yury Kondrakhin
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation.,Institute of Computational Technologies SB RAS, Novosibirsk 630090, Russian Federation
| | - Fedor Kolpakov
- BIOSOFT.RU, LLC, Novosibirsk 630090, Russian Federation.,Institute of Computational Technologies SB RAS, Novosibirsk 630090, Russian Federation
| |
|
39
|
The 3D Genome Shapes the Regulatory Code of Developmental Genes. J Mol Biol 2020; 432:712-723. [DOI: 10.1016/j.jmb.2019.10.017] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/28/2019] [Revised: 10/11/2019] [Accepted: 10/24/2019] [Indexed: 02/06/2023]
|
40
|
Niu X, Yang K, Zhang G, Yang Z, Hu X. A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions. Front Genet 2020; 10:1305. [PMID: 31969903 PMCID: PMC6960260 DOI: 10.3389/fgene.2019.01305] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/06/2019] [Accepted: 11/26/2019] [Indexed: 01/22/2023] Open
Abstract
Deciphering the code of cis-regulatory elements (CREs) is one of the core issues of today’s biology. Enhancers are distal CREs and play significant roles in gene transcriptional regulation. Although identification of enhancer locations across the whole genome [discriminative enhancer predictions (DEP)] is necessary, it is more important to predict in which specific cell or tissue types they will be activated and functional [tissue-specific enhancer predictions (TSEP)]. Although existing deep learning models have achieved great success in DEP, they cannot be directly employed in TSEP because a specific cell or tissue type has only a limited number of available enhancer samples for training. Here, we first adopted a reported deep learning architecture and then developed a novel training strategy named “pretraining-retraining strategy” (PRS) for TSEP by decomposing the whole training process into two successive stages: a pretraining stage is designed to train with the whole enhancer data for performing DEP, and a retraining stage is then designed to train with tissue-specific enhancer samples based on the trained pretraining model for making TSEP. As a result, PRS is found to be valid for DEP with an AUC of 0.922 and a GM (geometric mean) of 0.696, when tested on a larger-scale FANTOM5 enhancer dataset via five-fold cross-validation. Interestingly, based on the trained pretraining model, a new finding is that only twenty additional epochs are needed to complete the retraining process when testing 23 specific tissues or cell lines. For TSEP tasks, PRS achieved a mean GM of 0.806, which is significantly higher than the 0.528 of gkm-SVM, an existing mainstream method for CRE predictions. Notably, PRS is further proven superior to two other state-of-the-art methods: DEEP and BiRen. In summary, PRS has employed useful ideas from the domain of transfer learning and is a reliable method for TSEPs.
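The GM metric reported in this abstract can be sketched as follows. The abstract only expands GM as "geometric mean"; the sketch assumes the common convention for imbalanced classification, namely the geometric mean of sensitivity and specificity, and the function name `geometric_mean_score` is hypothetical.

```python
import math
from typing import Sequence


def geometric_mean_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity and specificity for binary labels (0/1).

    Assumption: this is the definition of GM intended by the abstract,
    which does not spell it out.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    return math.sqrt(sensitivity * specificity)


# Toy labels: 3 of 4 enhancers recovered, 2 of 4 non-enhancers rejected.
gm = geometric_mean_score([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 1])
```

Unlike plain accuracy, this score drops toward zero when either class is predicted poorly, which is why it is often preferred when positive examples are scarce.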
Affiliation(s)
- Xiaohui Niu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
| | - Kun Yang
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
| | - Ge Zhang
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
| | - Zhiquan Yang
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
| | - Xuehai Hu
- College of Informatics, Hubei Key Laboratory of Agricultural Bioinformatics, Huazhong Agricultural University, Wuhan, China
| |
|
41
|
Identification and Characterization of Cis-Regulatory Elements for Photoreceptor-Type-Specific Transcription in Zebrafish. Methods Mol Biol 2020; 2092:123-145. [PMID: 31786786 DOI: 10.1007/978-1-0716-0175-4_10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/06/2022]
Abstract
Tissue-specific or cell-type-specific transcription of protein-coding genes is controlled by both trans-regulatory elements (TREs) and cis-regulatory elements (CREs). However, it is challenging to identify TREs and CREs, which are unknown for most genes. Here, we describe a protocol for identifying two types of transcription-activating CREs, core promoters and enhancers, of zebrafish photoreceptor type-specific genes. This protocol is composed of three phases: bioinformatic prediction, experimental validation, and characterization of the CREs. To better illustrate the principles and logic of this protocol, we exemplify it with the discovery of the core promoter and enhancer of the mpp5b apical polarity gene (also known as ponli), whose red, green, and blue (RGB) cone-specific transcription requires its enhancer, a member of the rainbow enhancer family. While exemplified with an RGB-cone-specific gene, this protocol is general and can be used to identify the core promoters and enhancers of other protein-coding genes.
|
42
|
Pataskar A, Vanderlinden W, Emmerig J, Singh A, Lipfert J, Tiwari VK. Deciphering the Gene Regulatory Landscape Encoded in DNA Biophysical Features. iScience 2019; 21:638-649. [PMID: 31731201 PMCID: PMC6889597 DOI: 10.1016/j.isci.2019.10.055] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/11/2019] [Revised: 10/20/2019] [Accepted: 10/24/2019] [Indexed: 01/24/2023] Open
Abstract
Gene regulation in higher organisms involves a sophisticated interplay between genetic and epigenetic mechanisms. Despite advances, the logic in selective usage of certain genomic regions as regulatory elements remains unclear. Here we show that the inherent biophysical properties of the DNA encode epigenetic state and the underlying regulatory potential. We find that the propeller twist (ProT) level is indicative of genomic location of the regulatory elements, their strength, the affinity landscape of transcription factors, and distribution in the nuclear 3D space. We experimentally show that ProT levels confer increased DNA flexibility and surface accessibility, and thus potentially prime usage of high ProT regions as regulatory elements. ProT levels also correlate with occurrence and phenotypic consequences of mutations. Interestingly, cell-fate switches involve a transient usage of low ProT regulatory elements. Altogether, our work provides unprecedented insights into the gene regulatory landscape encoded in the DNA biophysical features. Highlights: DNA shape features encode genomic surface accessibility and flexibility. High ProT is a deterministic feature of enhancers. ProT levels correlate with nuclear organization of epigenetic states. Cell-fate switches involve a transient usage of low ProT regulatory elements.
Affiliation(s)
- Abhijeet Pataskar
- Netherlands Cancer Institute, Amsterdam, the Netherlands; Former Address: Institute of Molecular Biology, 55128 Mainz, Germany
| | - Willem Vanderlinden
- Department of Physics and Center for NanoScience, LMU Munich, 80799 Munich, Germany
| | - Johannes Emmerig
- Department of Physics and Center for NanoScience, LMU Munich, 80799 Munich, Germany
| | - Aditi Singh
- Wellcome-Wolfson Institute for Experimental Medicine, School of Medicine, Dentistry & Biomedical Science, Queens University Belfast, Belfast BT9 7BL, UK
| | - Jan Lipfert
- Department of Physics and Center for NanoScience, LMU Munich, 80799 Munich, Germany
| | - Vijay K Tiwari
- Wellcome-Wolfson Institute for Experimental Medicine, School of Medicine, Dentistry & Biomedical Science, Queens University Belfast, Belfast BT9 7BL, UK; Former Address: Institute of Molecular Biology, 55128 Mainz, Germany.
| |
|
43
|
Lineage specific conservation of cis-regulatory elements in Cytokinin Response Factors. Sci Rep 2019; 9:13387. [PMID: 31527685 PMCID: PMC6746799 DOI: 10.1038/s41598-019-49741-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/11/2019] [Accepted: 08/31/2019] [Indexed: 01/17/2023] Open
Abstract
Expression patterns of genes are controlled by short regions of DNA in promoter regions known as cis-regulatory elements. How expression patterns change due to alterations in cis-regulatory elements in the context of gene duplication is not well studied in plants. Over 300 promoter sequences from a small, well-conserved family of plant transcription factors known as Cytokinin Response Factors (CRFs) were examined for conserved motifs across several known clades present in Angiosperms. General CRF and lineage-specific motifs were identified. Once identified, significantly enriched motifs were then compared to known transcription factor binding sites to elucidate potential functional roles. Additionally, the presence of similar motifs shows that levels of conservation exist between different CRFs across land plants, likely occurring through processes of neo- or sub-functionalization. Furthermore, significant patterns of motif conservation are seen within and between CRF clades, suggesting cis-regulatory regions have been conserved throughout CRF evolution.
|
44
|
Broad Heterochromatic Domains Open in Gonocyte Development Prior to De Novo DNA Methylation. Dev Cell 2019; 51:21-34.e5. [PMID: 31474564 DOI: 10.1016/j.devcel.2019.07.023] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 09/26/2018] [Revised: 03/28/2019] [Accepted: 07/24/2019] [Indexed: 02/03/2023]
Abstract
Facultative heterochromatin forms and reorganizes in response to external stimuli. However, how the initial establishment of such a chromatin state is regulated in cell-cycle-arrested cells remains unexplored. Mouse gonocytes are arrested male germ cells, at which stage the genome-wide DNA methylome forms. Here, we discovered transiently accessible heterochromatin domains of several megabases in size in gonocytes and named them differentially accessible domains (DADs). Open DADs formed in gene desert and gene cluster regions, primarily at transposons, with the reprogramming of histone marks, suggesting DADs as facultative heterochromatin. De novo DNA methylation took place with two waves in gonocytes: the first region specific and the second genome-wide. DADs were resistant to the first wave and their opening preceded the second wave. In addition, the higher-order chromosome architecture was reorganized with less defined chromosome compartments in gonocytes. These findings suggest that multiple layers of chromatin reprogramming facilitate de novo DNA methylation.
|
45
|
Vuilleumier R, Lian T, Flibotte S, Khan ZN, Fuchs A, Pyrowolakis G, Allan DW. Retrograde BMP signaling activates neuronal gene expression through widespread deployment of a conserved BMP-responsive cis-regulatory activation element. Nucleic Acids Res 2019; 47:679-699. [PMID: 30476189 PMCID: PMC6344883 DOI: 10.1093/nar/gky1135] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 02/19/2018] [Accepted: 10/25/2018] [Indexed: 12/29/2022] Open
Abstract
Retrograde Bone Morphogenetic Protein (BMP) signaling in neurons is essential for the differentiation and synaptic function of many neuronal subtypes. BMP signaling regulates these processes via Smad transcription factor activity, yet the scope and nature of Smad-dependent gene regulation in neurons are mostly unknown. Here, we applied a computational approach to predict Smad-binding cis-regulatory BMP-Activating Elements (BMP-AEs) in Drosophila, followed by transgenic in vivo reporter analysis to test their neuronal subtype enhancer activity in the larval central nervous system (CNS). We identified 34 BMP-AE-containing genomic fragments that are responsive to BMP signaling in neurons, and showed that the embedded BMP-AEs are required for this activity. RNA-seq analysis identified BMP-responsive genes in the CNS and revealed that BMP-AEs selectively enrich near BMP-activated genes. These data suggest that functional BMP-AEs control nearby BMP-activated genes, which we validated experimentally. Finally, we demonstrated that the BMP-AE motif mediates a conserved Smad-responsive function in the Drosophila and vertebrate CNS. Our results provide evidence that BMP signaling controls neuronal function by directly coordinating the expression of a battery of genes through widespread deployment of a conserved Smad-responsive cis-regulatory motif.
Affiliation(s)
- Robin Vuilleumier
- Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
| | - Tianshun Lian
- Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
| | - Stephane Flibotte
- Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
| | - Zaynah N Khan
- Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
| | - Alisa Fuchs
- BIOSS, Centre for Biological Signaling Studies and Institute for Biology I, Faculty of Biology, Albert-Ludwigs University of Freiburg, Freiburg, Germany.,Max-Planck Institute for Molecular Genetics, Berlin, Germany
| | - George Pyrowolakis
- BIOSS, Centre for Biological Signaling Studies and Institute for Biology I, Faculty of Biology, Albert-Ludwigs University of Freiburg, Freiburg, Germany
| | - Douglas W Allan
- Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, British Columbia, Canada
| |
|
46
|
Park J, Estrada J, Johnson G, Vincent BJ, Ricci-Tam C, Bragdon MDJ, Shulgina Y, Cha A, Wunderlich Z, Gunawardena J, DePace AH. Dissecting the sharp response of a canonical developmental enhancer reveals multiple sources of cooperativity. eLife 2019; 8:e41266. [PMID: 31223115 PMCID: PMC6588347 DOI: 10.7554/elife.41266] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 08/20/2018] [Accepted: 03/04/2019] [Indexed: 12/19/2022] Open
Abstract
Developmental enhancers integrate graded concentrations of transcription factors (TFs) to create sharp gene expression boundaries. Here we examine the hunchback P2 (HbP2) enhancer which drives a sharp expression pattern in the Drosophila blastoderm embryo in response to the transcriptional activator Bicoid (Bcd). We systematically interrogate cis and trans factors that influence the shape and position of expression driven by HbP2, and find that the prevailing model, based on pairwise cooperative binding of Bcd to HbP2 is not adequate. We demonstrate that other proteins, such as pioneer factors, Mediator and histone modifiers influence the shape and position of the HbP2 expression pattern. Comparing our results to theory reveals how higher-order cooperativity and energy expenditure impact boundary location and sharpness. Our results emphasize that the bacterial view of transcription regulation, where pairwise interactions between regulatory proteins dominate, must be reexamined in animals, where multiple molecular mechanisms collaborate to shape the gene regulatory function.
Affiliation(s)
- Jeehae Park
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | - Javier Estrada
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | - Gemma Johnson
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | - Ben J Vincent
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | - Chiara Ricci-Tam
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | - Meghan DJ Bragdon
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | | | - Anna Cha
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | - Zeba Wunderlich
- Department of Systems Biology, Harvard Medical School, Boston, United States
| | | | - Angela H DePace
- Department of Systems Biology, Harvard Medical School, Boston, United States
| |
|
47
|
Coons LA, Burkholder AB, Hewitt SC, McDonnell DP, Korach KS. Decoding the Inversion Symmetry Underlying Transcription Factor DNA-Binding Specificity and Functionality in the Genome. iScience 2019; 15:552-591. [PMID: 31152742 PMCID: PMC6542189 DOI: 10.1016/j.isci.2019.04.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/04/2018] [Revised: 11/26/2018] [Accepted: 12/04/2018] [Indexed: 12/13/2022] Open
Abstract
Understanding why a transcription factor (TF) binds to a specific DNA element in the genome and whether that binding event affects transcriptional output remains a great challenge. In this study, we demonstrate that TF binding in the genome follows inversion symmetry (IS). In addition, the specific DNA elements where TFs bind in the genome are determined by internal IS within the DNA element. These DNA-binding rules quantitatively define how TFs select the appropriate regulatory targets from a large number of similar DNA elements in the genome to elicit specific transcriptional and cellular responses. Importantly, we also demonstrate that these DNA-binding rules extend to DNA elements that do not support transcriptional activity. That is, the DNA-binding rules are obeyed, but the retention time of the TF at these non-functional DNA elements is not long enough to initiate and/or maintain transcription. We further demonstrate that IS is universal within the genome. Thus, IS is the DNA code that TFs use to interact with the genome and dictates (in conjunction with known DNA sequence constraints) which of those interactions are functionally active.
Affiliation(s)
- Laurel A Coons
- Receptor Biology Section, Reproductive and Developmental Biology Laboratory, National Institute of Environmental Health Sciences/National Institutes of Health, 111 T.W. Alexander Dr., Research Triangle Park, NC 27709, USA; Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, NC 27710, USA
| | - Adam B Burkholder
- Integrative Bioinformatics, National Institute of Environmental Health Sciences/National Institutes of Health, Research Triangle Park, NC 27709, USA
| | - Sylvia C Hewitt
- Receptor Biology Section, Reproductive and Developmental Biology Laboratory, National Institute of Environmental Health Sciences/National Institutes of Health, 111 T.W. Alexander Dr., Research Triangle Park, NC 27709, USA
| | - Donald P McDonnell
- Department of Pharmacology and Cancer Biology, Duke University School of Medicine, Durham, NC 27710, USA
| | - Kenneth S Korach
- Receptor Biology Section, Reproductive and Developmental Biology Laboratory, National Institute of Environmental Health Sciences/National Institutes of Health, 111 T.W. Alexander Dr., Research Triangle Park, NC 27709, USA.
| |
|
48
|
Wu C, Chen J, Liu Y, Hu X. Improved Prediction of Regulatory Element Using Hybrid Abelian Complexity Features with DNA Sequences. Int J Mol Sci 2019; 20:ijms20071704. [PMID: 30959806 PMCID: PMC6480087 DOI: 10.3390/ijms20071704] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/26/2019] [Revised: 04/01/2019] [Accepted: 04/02/2019] [Indexed: 12/14/2022] Open
Abstract
Deciphering the code of cis-regulatory elements (CREs) is one of the core issues of current biology. As an important category of CRE, enhancers play crucial roles in regulating gene transcription at a distance. Further, the disruption of an enhancer can cause abnormal transcription and, thus, trigger human diseases, which means that its accurate identification is currently of broad interest. Here, we introduce an innovative concept, i.e., abelian complexity function (ACF), which is a more complex extension of the classic subword complexity function, for a new coding of DNA sequences. After feature selection by an upper bound estimation and integration with DNA composition features, we developed an enhancer prediction model with hybrid abelian complexity features (HACF). Compared with existing methods, HACF shows consistently superior performance on three sources of enhancer datasets. We tested the generalization ability of HACF by scanning human chromosome 22 to validate previously reported super-enhancers. Meanwhile, we identified novel candidate enhancers that are supported by enhancer-related ENCODE ChIP-seq signals. In summary, HACF improves current enhancer prediction and may be beneficial for further prioritization of functional noncoding variants.
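The two complexity functions contrasted in this abstract can be sketched using their standard combinatorics-on-words definitions applied to a finite DNA string: subword complexity counts distinct substrings of length n, while abelian complexity counts distinct letter-composition (Parikh) vectors among those substrings. This is only an illustration of the definitions; the paper's feature selection and hybrid model are not reproduced, and the function names are hypothetical.

```python
def subword_complexity(seq: str, n: int) -> int:
    """Number of distinct substrings of length n in seq."""
    return len({seq[i:i + n] for i in range(len(seq) - n + 1)})


def abelian_complexity(seq: str, n: int, alphabet: str = "ACGT") -> int:
    """Number of distinct letter-count vectors among length-n substrings.

    Two substrings that are rearrangements of each other (e.g. "AC" and
    "CA") share one vector, so this value never exceeds the subword
    complexity for the same n.
    """
    vectors = set()
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        vectors.add(tuple(window.count(base) for base in alphabet))
    return len(vectors)


# "ACCA" has length-2 substrings AC, CC, CA: three distinct strings,
# but AC and CA have the same composition, leaving two vectors.
example = (subword_complexity("ACCA", 2), abelian_complexity("ACCA", 2))
```

Evaluating `abelian_complexity` over a range of n values yields a numeric profile of a sequence, which is one plausible way such a function could feed a classifier.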
Affiliation(s)
- Chengchao Wu
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
| | - Jin Chen
- College of Science, Huazhong Agricultural University, Wuhan 430070, China.
| | - Yunxia Liu
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
| | - Xuehai Hu
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
| |
|
49
|
Mejía-Guerra MK, Buckler ES. A k-mer grammar analysis to uncover maize regulatory architecture. BMC Plant Biol 2019; 19:103. [PMID: 30876396 PMCID: PMC6419808 DOI: 10.1186/s12870-019-1693-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/25/2018] [Accepted: 02/21/2019] [Indexed: 05/06/2023]
Abstract
BACKGROUND: Only a small percentage of the genome sequence is involved in regulation of gene expression, but to biochemically identify this portion is expensive and laborious. In species like maize, with diverse intergenic regions and lots of repetitive elements, this is an especially challenging problem that limits the use of the data from one line to the other. While regulatory regions are rare, they do have characteristic chromatin contexts and sequence organization (the grammar) with which they can be identified. RESULTS: We developed a computational framework to exploit this sequence arrangement. The models learn to classify regulatory regions based on sequence features - k-mers. To do this, we borrowed two approaches from the field of natural language processing: (1) "bag-of-words" which is commonly used for differentially weighting key words in tasks like sentiment analyses, and (2) a vector-space model using word2vec (vector-k-mers), that captures semantic and linguistic relationships between words. We built "bag-of-k-mers" and "vector-k-mers" models that distinguish between regulatory and non-regulatory regions with an average accuracy above 90%. Our "bag-of-k-mers" achieved higher overall accuracy, while the "vector-k-mers" models were more useful in highlighting key groups of sequences within the regulatory regions. CONCLUSIONS: These models now provide powerful tools to annotate regulatory regions in other maize lines beyond the reference, at low cost and with high accuracy.
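The "bag-of-k-mers" representation named in this abstract can be illustrated with a minimal sketch: a sequence is reduced to counts of its overlapping k-mers, ignoring their order, in the same way a bag-of-words model reduces a text to word counts. The choice of k, the vocabulary, and the function names are assumptions for illustration; the paper's weighting scheme and classifier are not shown.

```python
from collections import Counter
from typing import List, Sequence


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a DNA sequence (order is discarded)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def bag_of_kmers_vector(sequence: str, vocabulary: Sequence[str], k: int = 3) -> List[int]:
    """Turn a sequence into a fixed-length count vector over a k-mer vocabulary."""
    counts = kmer_counts(sequence, k)
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Hypothetical three-entry vocabulary; a real model would use many more k-mers.
features = bag_of_kmers_vector("ACGTACGT", ["ACG", "GTA", "TTT"], k=3)
```

Vectors of this kind can be passed to any standard classifier; the word2vec-based "vector-k-mers" approach differs in that it learns dense embeddings from k-mer context rather than using raw counts.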
Affiliation(s)
| | - Edward S. Buckler
- Institute for Genomic Diversity, Cornell University, 175 Biotechnology Building, Ithaca, 14853 NY USA
- USDA-ARS, Research Geneticist, USDA ARS Robert Holley Center, Ithaca, 14853 NY USA
- Department of Plant Breeding and Genetics, Cornell University, 159 Biotechnology Building, Ithaca, 14853 NY USA
| |
|
50
|
Osman NM, Kitapci TH, Vlaho S, Wunderlich Z, Nuzhdin SV. Inference of Transcription Factor Regulation Patterns Using Gene Expression Covariation in Natural Populations of Drosophila melanogaster. Biophysics (Nagoya-shi) 2019; 63:43-51. [PMID: 30739944 DOI: 10.1134/s0006350918010128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
Gene regulatory networks control the complex programs that drive development. Deciphering the connections between transcription factors (TFs) and target genes is challenging, in part because TFs bind to thousands of places in the genome but control expression through a subset of these binding events. We hypothesize that we can combine natural variation of expression levels and predictions of TF binding sites to identify TF targets. We gather RNA-seq data from 71 genetically distinct F1 Drosophila melanogaster embryos and calculate the correlations between TF and potential target genes' expression levels, which we call "regulatory strength." To separate direct and indirect TF targets, we hypothesize that direct TF targets will have a preponderance of binding sites in their upstream regions. Using 14 TFs active during embryogenesis, we find that 12 TFs showed a significant correlation between their binding strength and regulatory strength on downstream targets, and 10 TFs showed a significant correlation between the number of binding sites and the regulatory effect on target genes. The general roles, e.g. bicoid's role as an activator, and the particular interactions we observed between our TFs, e.g. twist's role as a repressor of sloppy paired and odd paired, generally coincide with the literature.
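The "regulatory strength" described in this abstract is a correlation between a TF's expression and a candidate target's expression across individuals. A minimal sketch follows, assuming Pearson correlation (the abstract does not name the correlation measure) and using made-up expression values; the function name `regulatory_strength` is hypothetical.

```python
import math
from typing import Sequence


def regulatory_strength(tf_expr: Sequence[float], target_expr: Sequence[float]) -> float:
    """Pearson correlation between TF and target expression across samples.

    Assumption: Pearson is used here for illustration only. Positive values
    are consistent with activation, negative values with repression.
    """
    n = len(tf_expr)
    mean_tf = sum(tf_expr) / n
    mean_target = sum(target_expr) / n
    cov = sum((a - mean_tf) * (b - mean_target) for a, b in zip(tf_expr, target_expr))
    sd_tf = math.sqrt(sum((a - mean_tf) ** 2 for a in tf_expr))
    sd_target = math.sqrt(sum((b - mean_target) ** 2 for b in target_expr))
    return cov / (sd_tf * sd_target)


# Made-up expression levels across four embryos.
activated = regulatory_strength([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
repressed = regulatory_strength([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

Correlation alone cannot separate direct from indirect targets, which is why the abstract pairs it with counts of predicted binding sites upstream of each gene.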
Affiliation(s)
- Noha M Osman
- University of Southern California, Los Angeles, CA.,National Research Centre, Dokki, Giza, Egypt
| | | | - Srna Vlaho
- University of Southern California, Los Angeles, CA
| | | | - Sergey V Nuzhdin
- University of Southern California, Los Angeles, CA.,Saint Petersburg Polytechnical University, St Petersburg, Russia
| |
|