1
Li Z, Zhang Y, Peng B, Qin S, Zhang Q, Chen Y, Chen C, Bao Y, Zhu Y, Hong Y, Liu B, Liu Q, Xu L, Chen X, Ma X, Wang H, Xie L, Yao Y, Deng B, Li J, De B, Chen Y, Wang J, Li T, Liu R, Tang Z, Cao J, Zuo E, Mei C, Zhu F, Shao C, Wang G, Sun T, Wang N, Liu G, Ni JQ, Liu Y. A novel interpretable deep learning-based computational framework designed synthetic enhancers with broad cross-species activity. Nucleic Acids Res 2024; 52:13447-13468. [PMID: 39420601 PMCID: PMC11602155 DOI: 10.1093/nar/gkae912] [Received: 02/21/2024] [Revised: 09/25/2024] [Accepted: 10/03/2024] [Indexed: 10/19/2024]
Abstract
Enhancers play a critical role in dynamically regulating spatiotemporal gene expression and establishing cell identity, underscoring the significance of designing them with specific properties for applications in biosynthetic engineering and gene therapy. Despite numerous high-throughput methods facilitating genome-wide enhancer identification, deciphering the sequence determinants of their activity remains challenging. Here, we present the DREAM (DNA cis-Regulatory Elements with controllable Activity design platforM) framework, a novel deep learning-based approach for synthetic enhancer design. Proficient in uncovering subtle and intricate patterns within extensive enhancer screening data, DREAM achieves state-of-the-art sequence-based enhancer activity prediction and highlights critical sequence features associated with strong enhancer activity. Leveraging DREAM, we have engineered enhancers that surpass the potency of the strongest enhancer within the Drosophila genome by approximately 3.6-fold. Remarkably, these synthetic enhancers exhibited conserved functionality across species that diverged more than a billion years ago, indicating that DREAM was able to learn highly conserved enhancer regulatory grammar. Additionally, we designed silencers and cell line-specific enhancers using DREAM, demonstrating its versatility. Overall, our study not only introduces an interpretable approach for enhancer design but also lays out a general framework applicable to the design of other types of cis-regulatory elements.
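Sequence-based activity predictors of this kind usually take DNA as a one-hot matrix. The sketch below shows that common encoding; the A/C/G/T column order and the all-zero rows for ambiguous bases are assumptions for illustration, not DREAM's documented input format.

```python
import numpy as np

BASES = "ACGT"  # assumed column order

def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as a (len(seq), 4) float32 array with columns A, C, G, T.

    Bases outside ACGT (e.g. N) are left as all-zero rows; this is an assumed convention.
    """
    col_of = {base: i for i, base in enumerate(BASES)}
    mat = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        col = col_of.get(base)
        if col is not None:
            mat[pos, col] = 1.0
    return mat

print(one_hot("ACGTN"))
```

The resulting (length, 4) array is the kind of input a convolutional or attention-based model would consume; a batch would stack several such arrays of equal length.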
Affiliation(s)
- Zhaohong Li
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Yuanyuan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Bo Peng
- Gene Regulatory Lab, School of Basic Medical Sciences, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- State Key Laboratory of Molecular Oncology, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- Shenghua Qin
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Qian Zhang
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, NO.1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Yun Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Choulin Chen
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Yongzhou Bao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Yuqi Zhu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
- Yi Hong
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
- Binghua Liu
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
- Qian Liu
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
- Lingna Xu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Xi Chen
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Xinhao Ma
- College of Grassland Agriculture, National Beef Cattle Improvement Center, College of Animal Science and Technology, Northwest A&F University, NO. 3 Taicheng Road, Yangling District, Yangling, Shaanxi 712100, China
- Hongyan Wang
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
- Long Xie
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Yilong Yao
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
- Biao Deng
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Jiaying Li
- Department of Ophthalmology, Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Dongjiaomin lane No1, Dongcheng District, Beijing 100101, China
- Baojun De
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
- Yuting Chen
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
- Jing Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Tian Li
- College of JUNCAO Science and Ecology, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University (FAFU), NO.15 Shangxiadian Road, Cangshan District, Fuzhou 0350002, China
- Ranran Liu
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Yuanmingyuan West Road NO. 2, Haidian District, Beijing 100193, China
- Zhonglin Tang
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
- Junwei Cao
- College of Life Sciences, Inner Mongolia Autonomous Region Key Laboratory of Biomanufacturing, Inner Mongolia Agricultural University, NO. 306 Zhaowuda Road, Saihan District, Hohhot 010018, China
- Erwei Zuo
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Chugang Mei
- College of Grassland Agriculture, National Beef Cattle Improvement Center, College of Animal Science and Technology, Northwest A&F University, NO. 3 Taicheng Road, Yangling District, Yangling, Shaanxi 712100, China
- Fangjie Zhu
- College of JUNCAO Science and Ecology, Haixia Institute of Science and Technology, National Engineering Research Center of JUNCAO, Fujian Agriculture and Forestry University (FAFU), NO.15 Shangxiadian Road, Cangshan District, Fuzhou 0350002, China
- Changwei Shao
- State Key Laboratory of Maricultural Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, NO.106 Nanjing Road, Shinan District, Qingdao, Shandong 266071, China
- Guirong Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Tongjun Sun
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, NO. 7 Pengfei Road, Dapeng District, Shenzhen 518124, China
- Ningli Wang
- Department of Ophthalmology, Beijing Institute of Ophthalmology, Beijing Tongren Eye Center, Beijing Tongren Hospital, Capital Medical University, Dongjiaomin lane No1, Dongcheng District, Beijing 100101, China
- Gang Liu
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, NO.1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Jian-Quan Ni
- Gene Regulatory Lab, School of Basic Medical Sciences, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- State Key Laboratory of Molecular Oncology, Tsinghua University, NO. 30 Shuangqing road, Haidian district, Beijing 100084, China
- SXMU-Tsinghua Collaborative Innovation Center for Frontier Medicine, Shanxi Medical University, NO. 56 Xinjian South Road, Yingze District, Taiyuan 030001, China
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Buxin Road NO. 97, Dapeng District, Shenzhen 518124, China
- Green Healthy Aquaculture Research Center, Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Building 26 Lihe Technology Park, Auxiliary Road of Xinxi Avenue South, Nanhai District, Foshan 528226, China
2
Wang FZ, Niyogi KK. Towards targeted engineering of promoters via deletion of repressive cis-regulatory elements. New Phytol 2024. [PMID: 39540713 DOI: 10.1111/nph.20280] [Indexed: 11/16/2024]
Affiliation(s)
- Flora Zhiqi Wang
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- Krishna K Niyogi
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, 94720, USA
- Innovative Genomics Institute, University of California, Berkeley, CA, 94720, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, 94720, USA
- Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
3
Holub AS, Choudury SG, Andrianova EP, Dresden CE, Camacho RU, Zhulin IB, Husbands AY. START domains generate paralog-specific regulons from a single network architecture. Nat Commun 2024; 15:9861. [PMID: 39543118 PMCID: PMC11564692 DOI: 10.1038/s41467-024-54269-z] [Received: 02/27/2024] [Accepted: 11/01/2024] [Indexed: 11/17/2024]
Abstract
Functional divergence of transcription factors (TFs) has driven cellular and organismal complexity throughout evolution, but its mechanistic drivers remain poorly understood. Here we test for new mechanisms using CORONA (CNA) and PHABULOSA (PHB), two functionally diverged paralogs in the CLASS III HOMEODOMAIN LEUCINE ZIPPER (HD-ZIPIII) family of TFs. We show that virtually all genes bound by PHB (~99%) are also bound by CNA, ruling out occupation of distinct sets of genes as a mechanism of functional divergence. Further, genes bound and regulated by both paralogs are almost always regulated in the same direction, ruling out opposite regulation of shared targets as a mechanistic driver. Functional divergence of CNA and PHB instead results from differential usage of shared binding sites, with hundreds of uniquely regulated genes emerging from a commonly bound genetic network. Regulation of a given gene by CNA or PHB is thus a function of whether a bound site is considered 'responsive' versus 'non-responsive' by each paralog. Discrimination between responsive and non-responsive sites is controlled, at least in part, by their lipid-binding START domain. This suggests a model in which HD-ZIPIII TFs use information integrated by their START domain to generate paralog-specific transcriptional outcomes from a shared network architecture. Taken together, our study identifies a mechanism of HD-ZIPIII TF paralog divergence and proposes the ubiquitously distributed START evolutionary module as a driver of functional divergence.
Affiliation(s)
- Ashton S Holub
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Molecular Genetics, The Ohio State University, Columbus, OH, 43215, USA
- Sarah G Choudury
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Courtney E Dresden
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Molecular, Cellular, and Developmental Biology, The Ohio State University, Columbus, OH, 43215, USA
- Ricardo Urquidi Camacho
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Igor B Zhulin
- Department of Microbiology, The Ohio State University, Columbus, OH, 43215, USA
- Aman Y Husbands
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, 19104, USA
4
Jyoti, Ritu, Gupta S, Shankar R. Comprehensive analysis of computational approaches in plant transcription factors binding regions discovery. Heliyon 2024; 10:e39140. [PMID: 39640721 PMCID: PMC11620080 DOI: 10.1016/j.heliyon.2024.e39140] [Received: 06/11/2024] [Revised: 08/23/2024] [Accepted: 10/08/2024] [Indexed: 12/07/2024]
Abstract
Transcription factors (TFs) are regulatory proteins that bind to specific DNA regions, known as transcription factor binding regions (TFBRs), to regulate the rate of transcription. The identification of TFBRs has been made possible by a number of experimental and computational techniques established during the past few years. TFBR identification involves identifying peaks in the binding data, followed by identifying motif characteristics. Using the same binding data, computational models have been built to identify such binding regions, which could save the time and resources spent on binding experiments. These computational approaches depend heavily on what they learn from and how they learn it. Existing approaches are heavily skewed toward human TFBR discovery and have largely overlooked plants, whose regulatory genomic setup is drastically different. Here, we provide a comprehensive study of the current state of plant-specific TFBR discovery algorithms. While doing so, we encountered several software issues that rendered the tools unusable for researchers. We fixed them and have provided the corrected scripts for these tools. We expect this study to serve as a guide to the approaches taken by software tools for plant-specific TFBR discovery and to the care needed when applying them, especially in cross-species applications. The corrected scripts of these software tools are available at https://github.com/SCBB-LAB/Comparative-analysis-of-plant-TFBS-software.
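The motif-characterization step described above is often done with a position weight matrix (PWM). The following is a minimal sketch under stated assumptions: a hypothetical 4-bp count matrix, a uniform 0.25 background, a pseudocount of 0.5, forward strand only, and an arbitrary score threshold. It is not the implementation of any tool reviewed in the study.

```python
import numpy as np

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn a (motif_len, 4) count matrix into log2 odds scores against a uniform background."""
    counts = np.asarray(counts, dtype=float) + pseudocount
    probs = counts / counts.sum(axis=1, keepdims=True)
    return np.log2(probs / background)

def scan(seq, pwm, threshold):
    """Return (start, score) for every forward-strand window scoring >= threshold."""
    width = pwm.shape[0]
    idx = {b: i for i, b in enumerate(BASES)}
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width].upper()
        if any(b not in idx for b in window):
            continue  # skip windows containing N or other ambiguous bases
        score = sum(pwm[i, idx[b]] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, float(score)))
    return hits

# Hypothetical motif counts: rows are motif positions, columns are A, C, G, T
counts = [[8, 0, 0, 0], [0, 8, 0, 0], [0, 0, 8, 0], [0, 0, 0, 8]]
pwm = log_odds_pwm(counts)
print(scan("TTACGTAA", pwm, threshold=5.0))
```

Here only the exact `ACGT` window at position 2 clears the threshold; a real pipeline would also scan the reverse complement and calibrate the threshold against background sequences.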
Affiliation(s)
- Jyoti
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC Supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, (HP), 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Ritu
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC Supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, (HP), 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Sagar Gupta
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC Supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, (HP), 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
- Ravi Shankar
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC Supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, (HP), 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh, 201002, India
5
Li J, Rohs R. Deep DNAshape webserver: prediction and real-time visualization of DNA shape considering extended k-mers. Nucleic Acids Res 2024; 52:W7-W12. [PMID: 38801070 PMCID: PMC11223853 DOI: 10.1093/nar/gkae433] [Received: 02/26/2024] [Revised: 04/30/2024] [Accepted: 05/08/2024] [Indexed: 05/29/2024]
Abstract
Sequence-dependent DNA shape plays an important role in understanding protein-DNA binding mechanisms. High-throughput prediction of DNA shape features has become a valuable tool in the study of protein-DNA recognition, transcription factor-DNA binding specificity, and gene regulation. However, our widely used webserver, DNAshape, relies on statistically summarized pentamer query tables to query DNA shape features. These query tables do not consider flanking regions longer than two base pairs, and acquiring a query table for hexamers or higher-order k-mers is still unrealistic because sufficient statistical coverage cannot yet be achieved in molecular simulations or structural biology experiments. A recent deep-learning method, Deep DNAshape, trained on limited simulation data, can predict DNA shape features at the core of a DNA fragment while considering flanking regions of up to seven base pairs. However, Deep DNAshape is rather complicated to install and, unlike the pentamer-based DNAshape webserver, must be run locally, creating a barrier for users. Here, we present the Deep DNAshape webserver, which combines the benefits of both methods while being accurate, fast, and accessible to all users. Additional improvements to the webserver include real-time detection of user input, interactive visualization tools, and different modes of analysis. URL: https://deepdnashape.usc.edu.
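To make the pentamer limitation concrete: a query-table approach assigns each base the value stored for the 5-mer centred on it, so nothing beyond ±2 bp can influence the result, and the two terminal bases at each end receive no value. The sketch below assumes a plain dictionary as the table and fills it with toy numbers (the GC count of each pentamer) purely for illustration; these are not real DNAshape values, and the function is hypothetical, not the webserver's API.

```python
from itertools import product
from typing import Dict, List, Optional

def pentamer_shape(seq: str, table: Dict[str, float]) -> List[Optional[float]]:
    """Assign each position the table value of the pentamer centred on it.

    The first two and last two positions lack a full +/-2 bp context, so they get None,
    as does any pentamer missing from the table (e.g. one containing N).
    """
    seq = seq.upper()
    out: List[Optional[float]] = [None] * len(seq)
    for centre in range(2, len(seq) - 2):
        out[centre] = table.get(seq[centre - 2:centre + 3])
    return out

# Toy table for illustration only: value = GC count of the pentamer (NOT a real shape feature)
toy_table = {"".join(p): float(sum(b in "GC" for b in p)) for p in product("ACGT", repeat=5)}
print(pentamer_shape("AACGTTA", toy_table))
```

Because the lookup key is only the 5-mer, two sequences that share a pentamer always receive the same value at its centre regardless of their wider context, which is the gap a model considering longer flanks is meant to close.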
Affiliation(s)
- Jinsen Li
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Remo Rohs
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
- Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Thomas Lord Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
6
Gupta S, Kesarwani V, Bhati U, Jyoti, Shankar R. PTFSpot: deep co-learning on transcription factors and their binding regions attains impeccable universality in plants. Brief Bioinform 2024; 25:bbae324. [PMID: 39013383 PMCID: PMC11250369 DOI: 10.1093/bib/bbae324] [Received: 03/19/2024] [Revised: 06/07/2024] [Accepted: 06/19/2024] [Indexed: 07/18/2024]
Abstract
Unlike in animals, variability in transcription factors (TFs) and their binding regions (TFBRs) across plant species is a major problem that most existing TFBR-finding software fails to tackle, leaving these tools of little use. This limitation has resulted in the underdevelopment of plant regulatory research and the rampant use of Arabidopsis-like model species, generating misleading results. Here, we report a transformer-based deep-learning approach, PTFSpot, which learns from the co-variability of TF structures and their binding regions to build a universal TF-DNA interaction model that detects TFBRs free of the limitations of TF- and species-specific models. In a series of extensive benchmarking studies on multiple experimentally validated datasets, it outperformed existing software by more than 30% and consistently delivered >90% accuracy, even for species and TF families never encountered during model building. PTFSpot makes it possible to accurately annotate TFBRs across any plant genome, even in the total absence of TF information, free from the bottlenecks of species- and TF-specific models.
Affiliation(s)
- Sagar Gupta
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Veerbhan Kesarwani
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Umesh Bhati
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Jyoti
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
- Ravi Shankar
- Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology, (HiCHiCoB, A BIC supported by DBT, India), Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur, Himachal Pradesh 176061, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh 201002, India
7
Chen W, Miao C, Zhang Z, Fung CSH, Wang R, Chen Y, Qian Y, Cheng L, Yip KY, Tsui SKW, Cao Q. Commonly used software tools produce conflicting and overly-optimistic AUPRC values. Genome Biol 2024; 25:118. [PMID: 38741205 PMCID: PMC11089773 DOI: 10.1186/s13059-024-03266-y] [Received: 02/07/2024] [Accepted: 04/30/2024] [Indexed: 05/16/2024]
Abstract
The precision-recall curve (PRC) and the area under the precision-recall curve (AUPRC) are useful for quantifying classification performance. They are commonly used in situations with imbalanced classes, such as cancer diagnosis and cell type annotation. We evaluate 10 popular tools for plotting PRC and computing AUPRC, which were collectively used in more than 3000 published studies. We find the AUPRC values computed by the tools rank classifiers differently and some tools produce overly-optimistic results.
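One way tools can disagree is in how they integrate the PR curve. The sketch below, written with NumPy only, computes step-wise average precision and a trapezoidal (linearly interpolated) area for the same toy predictions. The labels and scores are made up, and the two functions are simplified illustrations, not reimplementations of any specific tool evaluated in the study. For reference, scikit-learn's `average_precision_score` uses the step-wise sum, while applying `auc` to the output of `precision_recall_curve` uses the trapezoidal rule.

```python
import numpy as np

def pr_points(y_true, scores):
    """Precision and recall at each distinct score threshold, highest score first."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")
    y_sorted, s_sorted = y_true[order], scores[order]
    tp = np.cumsum(y_sorted)
    fp = np.cumsum(1.0 - y_sorted)
    # Keep only the last index of each tied-score block so ties form a single threshold
    last = np.r_[np.nonzero(np.diff(s_sorted))[0], len(s_sorted) - 1]
    precision = tp[last] / (tp[last] + fp[last])
    recall = tp[last] / y_true.sum()
    return precision, recall

def average_precision(y_true, scores):
    """Step-wise AUPRC: sum over thresholds of (R_k - R_{k-1}) * P_k, with R_0 = 0."""
    p, r = pr_points(y_true, scores)
    return float(np.sum(np.diff(np.r_[0.0, r]) * p))

def trapezoid_auprc(y_true, scores):
    """Linear interpolation between PR points, anchored at (recall=0, precision=1)."""
    p, r = pr_points(y_true, scores)
    r = np.r_[0.0, r]
    p = np.r_[1.0, p]
    return float(np.sum(np.diff(r) * (p[1:] + p[:-1]) / 2.0))

y = [1, 0, 1, 0, 0, 1]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(average_precision(y, s), trapezoid_auprc(y, s))
```

On this toy input the step-wise value is 13/18 (about 0.722) and the trapezoidal value is about 0.678. Which estimate is larger depends on the shape of the curve, so the ordering seen here should not be generalized.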
Affiliation(s)
- Wenyu Chen
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Chen Miao
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Zhenghao Zhang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Cathy Sin-Hang Fung
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Ran Wang
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Yizhen Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Yan Qian
- The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Lixin Cheng
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medical College of Jinan University, Shenzhen, China
- Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Stephen Kwok-Wing Tsui
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Qin Cao
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
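The abstract above rests on the point that "AUPRC" is not a single fixed computation. As a minimal, hedged illustration (not the ten tools or the data evaluated in the study), the Python sketch below scores one toy ranking in two ways: step-wise average precision, the sum of (R_n - R_{n-1}) * P_n that scikit-learn documents for `average_precision_score`, and trapezoidal integration of the PR curve after anchoring it at recall 0, precision 1, which is an assumption standing in for how some plotting tools close the curve. The scores, labels and function names are hypothetical.

```python
from typing import List, Tuple


def pr_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (recall, precision) at each distinct score threshold, highest first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores share one threshold, so consume the whole tie group at once.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points: List[Tuple[float, float]]) -> float:
    """Step-wise sum of (R_n - R_{n-1}) * P_n, with no interpolation between points."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def trapezoid_auprc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the PR curve, anchored at (recall=0, precision=1).

    Linear interpolation between PR points can overestimate the area.
    """
    curve = [(0.0, 1.0)] + points
    area = 0.0
    for (r0, p0), (r1, p1) in zip(curve, curve[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


if __name__ == "__main__":
    # Hypothetical toy data: one positive tied with three negatives at the top score.
    scores = [0.9, 0.9, 0.9, 0.9, 0.5, 0.3]
    labels = [1, 0, 0, 0, 1, 1]
    pts = pr_points(scores, labels)
    print(f"AP = {average_precision(pts):.4f}")       # AP = 0.3833
    print(f"trapezoid = {trapezoid_auprc(pts):.4f}")  # trapezoid = 0.4667
```

On this toy ranking, the gap comes from the artificial (0, 1) starting point combined with the low precision of the tied block at the top of the ranking. scikit-learn's own documentation notes that trapezoidal integration of the PR curve can be too optimistic, so an AUPRC value is only comparable across tools when the integration convention is stated.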
8
Chen N, Yu J, Liu Z, Meng L, Li X, Wong KC. Discovering DNA shape motifs with multiple DNA shape features: generalization, methods, and validation. Nucleic Acids Res 2024; 52:4137-4150. [PMID: 38572749 PMCID: PMC11077088 DOI: 10.1093/nar/gkae210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 03/06/2024] [Accepted: 03/12/2024] [Indexed: 04/05/2024]
Abstract
DNA motifs are crucial patterns in gene regulation. DNA-binding proteins (DBPs), including transcription factors, can bind to specific DNA motifs to regulate gene expression and other cellular activities. Past studies suggest that DNA shape features could be subtly involved in DNA-DBP interactions. Therefore, the shape motif annotations based on intrinsic DNA topology can deepen the understanding of DNA-DBP binding. Nevertheless, high-throughput tools for DNA shape motif discovery that incorporate multiple features altogether remain insufficient. To address this gap, we propose a series of methods to discover non-redundant DNA shape motifs with the generalization to multiple motifs in multiple shape features. Specifically, an existing Gibbs sampling method is generalized to multiple DNA motif discovery with multiple shape features. Meanwhile, an expectation-maximization (EM) method and a hybrid method coupling EM with Gibbs sampling are proposed and developed with promising performance, convergence capability, and efficiency. The discovered DNA shape motif instances reveal insights into low-signal ChIP-seq peak summits, complementing the existing sequence motif discovery works. Additionally, our modelling captures the potential interplays across multiple DNA shape features. We provide a valuable platform of tools for DNA shape motif discovery. An R package is built for open accessibility and long-lasting impact: https://zenodo.org/doi/10.5281/zenodo.10558980.
Affiliation(s)
- Nanjun Chen
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Jixiang Yu
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Zhe Liu
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Lingkuan Meng
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun City, Jilin Province, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Hong Kong Institute of Data Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China
9
Chen W, Miao C, Zhang Z, Fung CSH, Wang R, Chen Y, Qian Y, Cheng L, Yip KY, Tsui SKW, Cao Q. Commonly used software tools produce conflicting and overly-optimistic AUPRC values. bioRxiv 2024:2024.02.02.578654. [PMID: 38370825 PMCID: PMC10871236 DOI: 10.1101/2024.02.02.578654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
The precision-recall curve (PRC) and the area under it (AUPRC) are useful for quantifying classification performance. They are commonly used in situations with imbalanced classes, such as cancer diagnosis and cell type annotation. We evaluated 10 popular tools for plotting PRC and computing AUPRC, which were collectively used in >3,000 published studies. We found the AUPRC values computed by the tools rank classifiers differently and some tools produce overly-optimistic results.
Affiliation(s)
- Wenyu Chen
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Chen Miao
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Zhenghao Zhang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Cathy Sin-Hang Fung
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Ran Wang
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Yizhen Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Yan Qian
- The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Lixin Cheng
- Shenzhen People’s Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen, China
- Kevin Y. Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA
- Stephen Kwok-Wing Tsui
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Qin Cao
- School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China
10
Zhang L, Liu L, Li H, He J, Chao H, Yan S, Yin Y, Zhao W, Li M. 3D genome structural variations play important roles in regulating seed oil content of Brassica napus. Plant Commun 2024; 5:100666. [PMID: 37496273 PMCID: PMC10811347 DOI: 10.1016/j.xplc.2023.100666] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/06/2023] [Revised: 07/01/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023]
Abstract
Dissecting the complex regulatory mechanism of seed oil content (SOC) is one of the main research goals in Brassica napus. Increasing evidence suggests that genome architecture is linked to multiple biological functions. However, the effect of genome architecture on SOC regulation remains unclear. Here, we used high-throughput chromatin conformation capture to characterize differences in the three-dimensional (3D) landscape of genome architecture of seeds from two B. napus lines, N53-2 (with high SOC) and Ken-C8 (with low SOC). Bioinformatics analysis demonstrated that differentially accessible regions and differentially expressed genes between N53-2 and Ken-C8 were preferentially enriched in regions with quantitative trait loci (QTLs)/associated genomic regions (AGRs) for SOC. A multi-omics analysis demonstrated that expression of SOC-related genes was tightly correlated with genome structural variations in QTLs/AGRs of B. napus. The candidate gene BnaA09g48250D, which showed structural variation in a QTL/AGR on chrA09, was identified by fine-mapping of a KN double-haploid population derived from hybridization of N53-2 and Ken-C8. Overexpression and knockout of BnaA09g48250D led to significant increases and decreases in SOC, respectively, in the transgenic lines. Taken together, our results reveal the 3D genome architecture of B. napus seeds and the roles of genome structural variations in SOC regulation, enriching our understanding of the molecular mechanisms of SOC regulation from the perspective of spatial chromatin structure.
Affiliation(s)
- Libin Zhang
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Wuhan 430074, China
- Lin Liu
- Wuhan Frasergen Bioinformatics Co., Ltd., Wuhan 430075, China
- Huaixin Li
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Wuhan 430074, China
- Jianjie He
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Wuhan 430074, China
- Hongbo Chao
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
- Shuxiang Yan
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Wuhan 430074, China
- Yontai Yin
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Wuhan 430074, China
- Weiguo Zhao
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Wuhan 430074, China
- Maoteng Li
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Bioinformatics and Molecular Imaging Key Laboratory, Wuhan 430074, China
11
Kalsan M, Jabeen A, Ahmad S. Incorporating Sequence-Dependent DNA Shape and Dynamics into Transcriptome Data Analysis. Methods Mol Biol 2024; 2812:317-343. [PMID: 39068371 DOI: 10.1007/978-1-0716-3886-6_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
Differentially expressed genes in a cellular context may be co-regulated by the same transcription factor. However, in the absence of concurrent transcription factor binding data, such interactions are difficult to detect, especially at the single-cell expression level. Motif enrichment in such genes can be used to gain insight into differential expression caused by shared upstream transcription factors (TFs). However, it is now established that many genes are co-regulated by the same TF due to a shared DNA shape or sequence-dependent conformational dynamics instead of a sequence motif. In this work, we demonstrate how, starting from gene expression data, such DNA shape and dynamics signatures can potentially be detected using publicly available tools, including DynaSeq, developed in our group for predicting the sequence-dependent components of these DNA shape features.
Affiliation(s)
- Manisha Kalsan
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Almas Jabeen
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
12
Back G, Walther D. Predictions of DNA mechanical properties at a genomic scale reveal potentially new functional roles of DNA flexibility. NAR Genom Bioinform 2023; 5:lqad097. [PMID: 37954573 PMCID: PMC10632188 DOI: 10.1093/nargab/lqad097] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/15/2023] [Revised: 09/28/2023] [Accepted: 10/25/2023] [Indexed: 11/14/2023]
Abstract
Mechanical properties of DNA have been implicated in many of its biological functions. Recently, a new high-throughput method, called loop-seq, which allows measuring the intrinsic bendability of DNA fragments, has been developed. Using loop-seq data, we created a deep learning model to explore the biological significance of local DNA flexibility in a range of different species from different kingdoms. Consistently, we observed a characteristic and largely dinucleotide-composition-driven change of local flexibility near transcription start sites. In the presence of a TATA-box, a pronounced peak of high flexibility can be observed. Furthermore, depending on the transcription factor investigated, flanking-sequence-dependent DNA flexibility was identified as a potential factor influencing DNA binding. Compared to randomized genomic sequences, depending on species and taxa, actual genomic sequences were observed both with increased and lowered flexibility. Furthermore, in Arabidopsis thaliana, mutation rates, both de novo and fixed, were found to be associated with relatively rigid sequence regions. Our study presents a range of significant correlations between characteristic DNA mechanical properties and genomic features, the significance of which with regard to detailed molecular relevance awaits further theoretical and experimental exploration.
Affiliation(s)
- Georg Back
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
- Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Potsdam-Golm 14476, Germany
13
Jores T, Hamm M, Cuperus JT, Queitsch C. Frontiers and techniques in plant gene regulation. Curr Opin Plant Biol 2023; 75:102403. [PMID: 37331209 DOI: 10.1016/j.pbi.2023.102403] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/30/2023] [Revised: 05/12/2023] [Accepted: 05/19/2023] [Indexed: 06/20/2023]
Abstract
Understanding plant gene regulation has been a priority for generations of plant scientists. However, due to its complex nature, the regulatory code governing plant gene expression has yet to be deciphered comprehensively. Recently developed methods, often relying on next-generation sequencing technology and state-of-the-art computational approaches, have started to further our understanding of the gene regulatory logic used by plants. In this review, we discuss these methods and the insights into the regulatory code of plants that they can yield.
Affiliation(s)
- Tobias Jores
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Morgan Hamm
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Josh T Cuperus
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
14
Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023; 24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023]
Abstract
Enhancers are genomic DNA elements controlling spatiotemporal gene expression. Their flexible organization and functional redundancies make deciphering their sequence-function relationships challenging. This article provides an overview of the current understanding of enhancer organization and evolution, with an emphasis on factors that influence these relationships. Technological advancements, particularly in machine learning and synthetic biology, are discussed in light of how they provide new ways to understand this complexity. Exciting opportunities lie ahead as we continue to unravel the intricacies of enhancer function.
Affiliation(s)
- Gabrielle D Smith
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Wan Hern Ching
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- Paola Cornejo-Páramo
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
- Emily S Wong
- Victor Chang Cardiac Research Institute, 405 Liverpool Street, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Kensington, NSW, Australia
15
Li M, Yao T, Lin W, Hinckley WE, Galli M, Muchero W, Gallavotti A, Chen JG, Huang SSC. Double DAP-seq uncovered synergistic DNA binding of interacting bZIP transcription factors. Nat Commun 2023; 14:2600. [PMID: 37147307 PMCID: PMC10163045 DOI: 10.1038/s41467-023-38096-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Accepted: 04/15/2023] [Indexed: 05/07/2023]
Abstract
Many eukaryotic transcription factors (TFs) form homodimer or heterodimer complexes to regulate gene expression. Dimerization of BASIC LEUCINE ZIPPER (bZIP) TFs is critical for their functions, but the molecular mechanism underlying the DNA binding and functional specificity of homo- versus heterodimers remains elusive. To address this gap, we present the double DNA Affinity Purification-sequencing (dDAP-seq) technique that maps heterodimer binding sites on endogenous genomic DNA. Using dDAP-seq we profile twenty pairs of C/S1 bZIP heterodimers and S1 homodimers in Arabidopsis and show that heterodimerization significantly expands the DNA binding preferences of these TFs. Analysis of dDAP-seq binding sites reveals the function of bZIP9 in abscisic acid response and the role of bZIP53 heterodimer-specific binding in seed maturation. The C/S1 heterodimers show distinct preferences for the ACGT elements recognized by plant bZIPs and motifs resembling the yeast GCN4 cis-elements. This study demonstrates the potential of dDAP-seq in deciphering the DNA binding specificities of interacting TFs that are key for combinatorial gene regulation.
Affiliation(s)
- Miaomiao Li
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Tao Yao
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Wanru Lin
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Will E Hinckley
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
- Mary Galli
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, 08854-8020, USA
- Wellington Muchero
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Andrea Gallavotti
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, 08854-8020, USA
- Jin-Gui Chen
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37831, USA
- Shao-Shan Carol Huang
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, 10003, USA
16
Boumpas P, Merabet S, Carnesecchi J. Integrating transcription and splicing into cell fate: Transcription factors on the block. Wiley Interdiscip Rev RNA 2023; 14:e1752. [PMID: 35899407 DOI: 10.1002/wrna.1752] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/31/2022] [Revised: 06/22/2022] [Accepted: 07/01/2022] [Indexed: 11/10/2022]
Abstract
Transcription factors (TFs) are present in all life forms and conserved across great evolutionary distances in eukaryotes. From yeast to complex multicellular organisms, they are pivotal players in cell fate decisions by orchestrating gene expression at diverse molecular layers. Notably, TFs fine-tune gene expression by coordinating RNA fate at both the expression and splicing levels. They regulate alternative splicing, an essential mechanism for cell plasticity, allowing the production of many mRNA and protein isoforms in precise cell and tissue contexts. Despite this apparent role in splicing, how TFs integrate transcription and splicing to ultimately orchestrate diverse cell functions and cell fate decisions remains puzzling. We depict substantial studies in various model organisms underlining the key role of TFs in alternative splicing for promoting tissue-specific functions and cell fate. Furthermore, we emphasize recent advances describing the molecular link between the transcriptional and splicing activities of TFs. As TFs can bind DNA and/or RNA to regulate transcription and splicing, we further discuss their flexibility and compatibility for DNA and RNA substrates. Finally, we propose several models integrating transcription and splicing activities of TFs in the coordination and diversification of cell and tissue identities. This article is categorized under: RNA Processing > Splicing Regulation/Alternative Splicing; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; RNA Processing > Splicing Mechanisms.
Affiliation(s)
- Panagiotis Boumpas
- Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, Lyon, France
- Samir Merabet
- Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, Lyon, France
- Julie Carnesecchi
- Institut de Génomique Fonctionnelle de Lyon, UMR5242, Ecole Normale Supérieure de Lyon, Centre National de la Recherche Scientifique, Université Claude Bernard-Lyon 1, Lyon, France
17
Yasmeen E, Wang J, Riaz M, Zhang L, Zuo K. Designing artificial synthetic promoters for accurate, smart, and versatile gene expression in plants. Plant Commun 2023:100558. [PMID: 36760129 PMCID: PMC10363483 DOI: 10.1016/j.xplc.2023.100558] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 09/20/2022] [Revised: 01/30/2023] [Accepted: 02/06/2023] [Indexed: 06/18/2023]
Abstract
With the development of high-throughput biology techniques and artificial intelligence, it has become increasingly feasible to design and construct artificial biological parts, modules, circuits, and even whole systems. To overcome the limitations of native promoters in controlling gene expression, artificial promoter design aims to synthesize short, inducible, and conditionally controlled promoters to coordinate the expression of multiple genes in diverse plant metabolic and signaling pathways. Synthetic promoters are versatile and can drive gene expression accurately with smart responses; they show potential for enhancing desirable traits in crops, thereby improving crop yield, nutritional quality, and food security. This review first illustrates the importance of synthetic promoters, then introduces promoter architecture and thoroughly summarizes advances in synthetic promoter construction. Restrictions to the development of synthetic promoters and future applications of such promoters in synthetic plant biology and crop improvement are also discussed.
Affiliation(s)
- Erum Yasmeen
- Single Cell Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Jin Wang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing, China
- Muhammad Riaz
- Single Cell Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Lida Zhang
- Single Cell Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Kaijing Zuo
- Single Cell Research Center, School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
18
Zrimec J, Zelezniak A, Gruden K. Toward learning the principles of plant gene regulation. Trends Plant Sci 2022; 27:1206-1208. [PMID: 36100536 DOI: 10.1016/j.tplants.2022.08.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Revised: 08/09/2022] [Accepted: 08/17/2022] [Indexed: 06/15/2023]
Abstract
Advanced machine learning (ML) algorithms produce highly accurate models of gene expression, uncovering novel regulatory features in nucleotide sequences involving multiple cis-regulatory regions across whole genes and structural properties. These broaden our understanding of gene regulation and point to new principles to test and adopt in the field of plant science.
Affiliation(s)
- Jan Zrimec
- Department of Biotechnology and Systems Biology, National Institute of Biology, Večna pot 111, 1000 Ljubljana, Slovenia
- Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, 412 96, Gothenburg, Sweden
- Kristina Gruden
- Department of Biotechnology and Systems Biology, National Institute of Biology, Večna pot 111, 1000 Ljubljana, Slovenia
19
Yan W, Li Z, Pian C, Wu Y. PlantBind: an attention-based multi-label neural network for predicting plant transcription factor binding sites. Brief Bioinform 2022; 23:6713513. [PMID: 36155619 DOI: 10.1093/bib/bbac425] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/18/2022] [Revised: 08/29/2022] [Accepted: 08/31/2022] [Indexed: 12/14/2022]
Abstract
Identification of transcription factor binding sites (TFBSs) is essential to understanding gene regulation. Designing computational models for accurate prediction of TFBSs is crucial because it is not feasible to experimentally assay all transcription factors (TFs) in all sequenced eukaryotic genomes. Although many methods have been proposed for the identification of TFBSs in humans, methods designed for plants are comparatively underdeveloped. Here, we present PlantBind, a method for integrated prediction and interpretation of TFBSs based on DNA sequences and DNA shape profiles. Built on an attention-based multi-label deep learning framework, PlantBind not only simultaneously predicts the potential binding sites of 315 TFs, but also identifies the motifs bound by transcription factors. During the training process, this model revealed a strong similarity among TF family members with respect to target binding sequences. Trans-species prediction performance using four Zea mays TFs demonstrated the suitability of this model for transfer learning. Overall, this study provides an effective solution for identifying plant TFBSs, which will promote greater understanding of transcriptional regulatory mechanisms in plants.
Affiliation(s)
- Zutan Li
- Nanjing Agricultural University
- Cong Pian
- College of Sciences at Nanjing Agricultural University
- Yufeng Wu
- State Key Laboratory for Crop Genetics and Germplasm Enhancement, Bioinformatics Center, College of Agriculture, Academy for Advanced Interdisciplinary Studies at Nanjing Agricultural University
20
Hajheidari M, Huang SSC. Elucidating the biology of transcription factor-DNA interaction for accurate identification of cis-regulatory elements. Curr Opin Plant Biol 2022; 68:102232. [PMID: 35679803 PMCID: PMC10103634 DOI: 10.1016/j.pbi.2022.102232] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 03/28/2022] [Revised: 04/26/2022] [Accepted: 05/02/2022] [Indexed: 05/03/2023]
Abstract
Transcription factors (TFs) play a critical role in determining cell fate decisions by integrating developmental and environmental signals through binding to specific cis-regulatory modules and regulating spatio-temporal specificity of gene expression patterns. Precise identification of functional TF binding sites in time and space not only will revolutionize our understanding of regulatory networks governing cell fate decisions but is also instrumental to uncover how genetic variations cause morphological diversity or disease. In this review, we discuss recent advances in mapping TF binding sites and characterizing the various parameters underlying the complexity of binding site recognition by TFs.
Affiliation(s)
- Mohsen Hajheidari
- Center for Genomics and Systems Biology, Department of Biology, New York University, 12 Waverly Pl, New York, NY 10003, USA
- Shao-Shan Carol Huang
- Center for Genomics and Systems Biology, Department of Biology, New York University, 12 Waverly Pl, New York, NY 10003, USA
21
Akagi T, Masuda K, Kuwada E, Takeshita K, Kawakatsu T, Ariizumi T, Kubo Y, Ushijima K, Uchida S. Genome-wide cis-decoding for expression design in tomato using cistrome data and explainable deep learning. Plant Cell 2022; 34:2174-2187. [PMID: 35258588 PMCID: PMC9134063 DOI: 10.1093/plcell/koac079] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/01/2021] [Accepted: 02/20/2022] [Indexed: 06/14/2023]
Abstract
In the evolutionary history of plants, variation in cis-regulatory elements (CREs) resulting in diversification of gene expression has played a central role in driving the evolution of lineage-specific traits. However, it is difficult to predict expression behaviors from CRE patterns to properly harness them, mainly because the biological processes are complex. In this study, we used cistrome datasets and explainable convolutional neural network (CNN) frameworks to predict genome-wide expression patterns in tomato (Solanum lycopersicum) fruit from the DNA sequences in gene regulatory regions. By fixing the effects of trans-acting factors using single cell-type spatiotemporal transcriptome data for the response variables, we developed a prediction model for crucial expression patterns in the initiation of tomato fruit ripening. Feature visualization of the CNNs identified nucleotide residues critical to the objective expression pattern in each gene, and their effects were validated experimentally in ripening tomato fruit. This cis-decoding framework will not only contribute to the understanding of the regulatory networks derived from CREs and transcription factor interactions, but also provide a flexible means of designing alleles for optimized expression.
Affiliation(s)
- Taiji Kawakatsu
- Institute of Agrobiological Sciences, National Agriculture and Food Research Organization, Tsukuba, Ibaraki 305-8602, Japan
- Tohru Ariizumi
- Faculty of Life and Environmental Sciences, University of Tsukuba, Tsukuba Plant Innovation Research Center, Tsukuba, Japan
- Yasutaka Kubo
- Graduate School of Environmental and Life Science, Okayama University, Okayama 700-8530, Japan
- Koichiro Ushijima
- Graduate School of Environmental and Life Science, Okayama University, Okayama 700-8530, Japan
- Seiichi Uchida
- Department of Advanced Information Technology, Kyushu University, Fukuoka 819-0395, Japan