1
Patra P, Gao YQ. Structural and dynamical aspect of DNA motif sequence specific binding of AP-1 transcription factor. J Chem Phys 2024; 160:115103. [PMID: 38506297 DOI: 10.1063/5.0196508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Accepted: 02/26/2024] [Indexed: 03/21/2024]
Abstract
Activator protein-1 (AP-1) comprises one of the largest and most evolutionarily conserved families of ubiquitous eukaryotic transcription factors that act as pioneer factors. The diversity of AP-1's DNA binding interactions through a conserved basic-zipper (bZIP) domain calls for an in-depth understanding of how AP-1 achieves its DNA binding selectivity and, consequently, its gene regulation specificity. Here, we address the structural and dynamical aspects of the DNA target recognition process of AP-1 using microsecond-long atomistic simulations based on the structure of the human AP-1 FosB/JunD bZIP-DNA complex. Our results show the unique role of DNA shape features in selective base-specific interactions, characteristic ion population, and solvation properties of DNA grooves in forming the motif sequence-specific AP-1-DNA complex. The TpG step at the two termini of the AP-1 site plays an important role in the structural adjustment of DNA by modifying the helical twist in the AP-1 bound state. We address the role of the intrinsic motion of the bZIP domain, in terms of opening and closing gripper motions of the DNA binding helices, in target site recognition and binding of AP-1 factors. Our observations suggest that binding to the cognate motif in DNA is mainly accompanied by the precise adjustment of the closing gripper motion of the DNA binding helices of the bZIP domain.
Affiliation(s)
- Piya Patra
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518107 Shenzhen, China
- Yi Qin Gao
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, 518107 Shenzhen, China
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, 100871 Beijing, China
  - Biomedical Pioneering Innovation Center, Peking University, 100871 Beijing, China
  - Changping Laboratory, Beijing 102200, China
2
Roldán-Piñero C, Luengo-Márquez J, Assenza S, Pérez R. Systematic Comparison of Atomistic Force Fields for the Mechanical Properties of Double-Stranded DNA. J Chem Theory Comput 2024; 20:2261-2272. [PMID: 38411091 PMCID: PMC10938644 DOI: 10.1021/acs.jctc.3c01089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 02/14/2024] [Accepted: 02/14/2024] [Indexed: 02/28/2024]
Abstract
The response of double-stranded DNA to external mechanical stress plays a central role in its interactions with the protein machinery in the cell. Modern atomistic force fields have been shown to provide highly accurate predictions for the fine structural features of the duplex. In contrast, and despite their pivotal function, less attention has been devoted to the accuracy of the prediction of the elastic parameters. Several reports have addressed the flexibility of double-stranded DNA via all-atom molecular dynamics, yet the collected information is insufficient to have a clear understanding of the relative performance of the various force fields. In this work, we fill this gap by performing a systematic study in which several systems, characterized by different sequence contexts, are simulated with the most popular force fields within the AMBER family, bsc1 and OL15, as well as with CHARMM36. Analysis of our results, together with their comparison with previous work focused on bsc0, allows us to unveil the differences in the predicted rigidity between the newest force fields and suggests a roadmap to test their performance against experiments. In the case of the stretch modulus, we reconcile these differences, showing that a single mapping between sequence-dependent conformation and elasticity via the crookedness parameter captures simultaneously the results of all force fields, supporting the key role of crookedness in the mechanical response of double-stranded DNA.
Affiliation(s)
- Carlos Roldán-Piñero
  - Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Juan Luengo-Márquez
  - Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
  - Instituto Nicolás Cabrera, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Salvatore Assenza
  - Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
  - Instituto Nicolás Cabrera, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
  - Condensed Matter Physics Center (IFIMAC), Universidad Autónoma de Madrid, E-28049 Madrid, Spain
- Rubén Pérez
  - Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid, E-28049 Madrid, Spain
  - Condensed Matter Physics Center (IFIMAC), Universidad Autónoma de Madrid, E-28049 Madrid, Spain
3
Augustijn HE, Roseboom AM, Medema MH, van Wezel GP. Harnessing regulatory networks in Actinobacteria for natural product discovery. J Ind Microbiol Biotechnol 2024; 51:kuae011. [PMID: 38569653 PMCID: PMC10996143 DOI: 10.1093/jimb/kuae011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 04/02/2024] [Indexed: 04/05/2024]
Abstract
Microbes typically live in complex habitats where they need to rapidly adapt to continuously changing growth conditions. To do so, they produce an astonishing array of natural products with diverse structures and functions. Actinobacteria stand out for their prolific production of bioactive molecules, including antibiotics, anticancer agents, antifungals, and immunosuppressants. Attention has been directed especially towards the identification of the compounds they produce and the mining of the large diversity of biosynthetic gene clusters (BGCs) in their genomes. However, the current return on investment in random screening for bioactive compounds is low, while it is hard to predict which of the millions of BGCs should be prioritized. Moreover, many of the BGCs for yet undiscovered natural products are silent or cryptic under laboratory growth conditions. To identify ways to prioritize and activate these BGCs, knowledge regarding the way their expression is controlled is crucial. Intricate regulatory networks control global gene expression in Actinobacteria, governed by a staggering number of up to 1000 transcription factors per strain. This review highlights recent advances in experimental and computational methods for characterizing and predicting transcription factor binding sites and their applications to guide natural product discovery. We propose that regulation-guided genome mining approaches will open new avenues toward eliciting the expression of BGCs, as well as prioritizing subsets of BGCs for expression using synthetic biology approaches.
One-sentence summary: This review provides insights into advances in experimental and computational methods aimed at predicting transcription factor binding sites and their applications to guide natural product discovery.
Affiliation(s)
- Hannah E Augustijn
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Anna M Roseboom
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Marnix H Medema
  - Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Gilles P van Wezel
  - Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
  - Netherlands Institute for Ecology (NIOO-KNAW), Wageningen, The Netherlands
4
Rommelfanger MK, Behrends M, Chen Y, Martinez J, Bens M, Xiong L, Rudolph KL, MacLean AL. Gene regulatory network inference with popInfer reveals dynamic regulation of hematopoietic stem cell quiescence upon diet restriction and aging. bioRxiv 2023:2023.04.18.537360. [PMID: 37131596 PMCID: PMC10153203 DOI: 10.1101/2023.04.18.537360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Inference of gene regulatory networks (GRNs) can reveal cell state transitions from single-cell genomics data. However, obstacles to temporal inference from snapshot data are difficult to overcome. Single-nuclei multiomics data offer means to bridge this gap and derive temporal information from snapshot data using joint measurements of gene expression and chromatin accessibility in the same single cells. We developed popInfer to infer networks that characterize lineage-specific dynamic cell state transitions from joint gene expression and chromatin accessibility data. Benchmarking against alternative methods for GRN inference, we showed that popInfer achieves higher accuracy in the GRNs inferred. popInfer was applied to study single-cell multiomics data characterizing hematopoietic stem cells (HSCs) and the transition from HSC to a multipotent progenitor cell state during murine hematopoiesis across age and dietary conditions. From networks predicted by popInfer, we discovered gene interactions controlling entry to/exit from HSC quiescence that are perturbed in response to diet or aging.
Affiliation(s)
- Megan K. Rommelfanger
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Marthe Behrends
  - Research Group on Stem Cell and Metabolism Aging, Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Jena, Germany
- Yulin Chen
  - Research Group on Stem Cell and Metabolism Aging, Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Jena, Germany
- Jonathan Martinez
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Martin Bens
  - Core Facility Next Generation Sequencing, Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Jena, Germany
- Lingyun Xiong
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Stem Cell Biology and Regenerative Medicine, Broad-CIRM Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA
- K. Lenhard Rudolph
  - Research Group on Stem Cell and Metabolism Aging, Leibniz Institute on Aging, Fritz Lipmann Institute (FLI), Jena, Germany
  - Medical Faculty, Jena University Hospital, Friedrich Schiller University, Jena, Germany
- Adam L. MacLean
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
5
Patra P, Gao YQ. Sequence-Specific Structural Features and Solvation Properties of Transcription Factor Binding DNA Motifs: Insights from Molecular Dynamics Simulation. J Phys Chem B 2022; 126:9187-9206. [PMID: 36322688 DOI: 10.1021/acs.jpcb.2c05749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Sequence-specific recognition of transcription factor (TF) binding motifs in the target site of DNA over the vast amount of non-target DNA is of primary importance for the transcriptional regulation of gene expression by TFs. Binding of TFs to the target site of DNA relies not only on direct contact formation but also on the structural and conformational features of DNA. Recognition of DNA structural features, or shape readout, by proteins is an important factor in TF-DNA interaction. Based on atomistic molecular simulations, we report here the sequence-dependent unique structural features, solvation, and ion-binding properties of biologically relevant AT- and GC-rich human TF binding motifs in DNA. The counterion and water distribution around the motif is found to be sensitive to the motif sequence, and this sensitivity is accompanied by changes in the DNA shape features. The motif sequence affects the electrostatic potential along the grooves, and cytosine methylation alters the DNA shape features. The characteristic solvation properties of TF binding motif DNA fragments imply that the ionic environment and hydration effects are essential to describe TF-DNA interactions.
Affiliation(s)
- Piya Patra
  - Shenzhen Bay Laboratory, Institute of Systems and Physical Biology, Shenzhen 518107, China
- Yi Qin Gao
  - Shenzhen Bay Laboratory, Institute of Systems and Physical Biology, Shenzhen 518107, China
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Biomedical Pioneering Innovation Center, Peking University, Beijing 100871, China
6
Chang L, Mondal A, Perez A. Towards rational computational peptide design. Front Bioinform 2022; 2:1046493. [PMID: 36338806 PMCID: PMC9634169 DOI: 10.3389/fbinf.2022.1046493] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/16/2022] [Accepted: 10/11/2022] [Indexed: 11/16/2022]
Abstract
Peptides are prevalent in biology, mediating as many as 40% of protein-protein interactions, and are involved in other cellular functions such as transport and signaling. Their ability to bind with high specificity makes them promising therapeutic agents with properties intermediate between those of small molecules and large biologics. Beyond their biological role, peptides can be programmed to self-assemble, and they are already being used for functions as diverse as oligonucleotide delivery, tissue regeneration, or as drugs. However, the transient nature of their interactions has limited the number of structures and the knowledge of binding affinities available, and their flexible nature has limited the success of computational pipelines that predict the structures and affinities of these molecules. Fortunately, recent advances in experimental and computational pipelines are creating new opportunities for this field. We are starting to see promising predictions of complex structures and of thermodynamic and kinetic properties. We believe that in the coming years this will lead to robust rational peptide design pipelines with success similar to those applied for small molecule drug discovery.
Affiliation(s)
- Liwei Chang
  - Department of Chemistry, University of Florida, Gainesville, FL, United States
  - Quantum Theory Project, University of Florida, Gainesville, FL, United States
- Arup Mondal
  - Department of Chemistry, University of Florida, Gainesville, FL, United States
  - Quantum Theory Project, University of Florida, Gainesville, FL, United States
- Alberto Perez
  - Department of Chemistry, University of Florida, Gainesville, FL, United States
  - Quantum Theory Project, University of Florida, Gainesville, FL, United States
7
Zhao M, Yuan Z, Wu L, Zhou S, Deng Y. Precise Prediction of Promoter Strength Based on a De Novo Synthetic Promoter Library Coupled with Machine Learning. ACS Synth Biol 2022; 11:92-102. [PMID: 34927418 DOI: 10.1021/acssynbio.1c00117] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 01/24/2023]
Abstract
Promoters are one of the most critical regulatory elements controlling metabolic pathways. However, the fast and accurate prediction of promoter strength remains challenging, leading to time- and labor-consuming promoter construction and characterization processes. This dilemma is caused by the lack of a large promoter library that has gradient strengths, broad dynamic ranges, and clear sequence profiles that can be used to train an artificial intelligence model of promoter strength prediction. To overcome this challenge, we constructed and characterized a mutant library of Trc promoters (Ptrc) using 83 rounds of mutation-construction-screening-characterization engineering cycles. After excluding invalid mutation sites, we established a synthetic promoter library that consisted of 3665 different variants, displaying an intensity range of more than two orders of magnitude. The strongest variant was ∼69-fold stronger than the original Ptrc and 1.52-fold stronger than a 1 mM isopropyl-β-d-thiogalactoside-driven PT7 promoter, with an ∼454-fold difference between the strongest and weakest expression levels. Using this synthetic promoter library, different machine learning models were built and optimized to explore the relationships between promoter sequences and transcriptional strength. Finally, our XGBoost model exhibited optimal performance, and we utilized this approach to precisely predict the strength of artificially designed promoter sequences (R² = 0.88, mean absolute error = 0.15, and Pearson correlation coefficient = 0.94). Our work provides a powerful platform that enables the predictable tuning of promoters to achieve optimal transcriptional strength.
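For readers unfamiliar with the three scores quoted in this abstract, the following is a minimal sketch of how the coefficient of determination, mean absolute error, and Pearson correlation coefficient are conventionally computed from measured and predicted values. It is not the authors' code; the function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, mean absolute error, Pearson r) for paired value lists.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n

    # Residual and total sums of squares for the coefficient of determination.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # Mean absolute error: average magnitude of the prediction error.
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

    # Pearson correlation: covariance scaled by both standard deviations.
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    pearson = cov / math.sqrt(ss_tot * var_pred)

    return r2, mae, pearson


# Made-up measured and predicted promoter strengths (arbitrary units).
measured = [0.2, 0.9, 1.5, 2.1, 2.8]
predicted = [0.3, 0.8, 1.6, 2.0, 3.0]
r2, mae, pearson = regression_metrics(measured, predicted)
print(f"R2={r2:.3f}  MAE={mae:.3f}  Pearson={pearson:.3f}")
```

Note that R² penalizes systematic offsets between prediction and measurement while the Pearson coefficient does not, which is why the two values can differ for the same model.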
Affiliation(s)
- Mei Zhao
  - National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - School of Food and Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, Jiangsu 212013, China
- Zhenqi Yuan
  - School of Artificial Intelligence and Computer Science, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Longtao Wu
  - College of Physics and Optoelectronics, Taiyuan University of Technology, Taiyuan 030024, China
- Shenghu Zhou
  - National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
- Yu Deng
  - National Engineering Laboratory for Cereal Fermentation Technology (NELCF), Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
  - Jiangsu Provincial Research Center for Bioactive Product Processing Technology, Jiangnan University, 1800 Lihu Road, Wuxi, Jiangsu 214122, China
8
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022]
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic unit controlling biological traits. They underpin the basic structures and functions of organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. During the evolution of organisms, a class of genes has persisted in a conserved form across multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, essential genes are those that are irreplaceable for the organism’s life activities. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful to assess the potential risks of disease, and to help determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve the quality of life. Therefore, it is of great theoretical and practical significance to identify important and essential genes.
In this paper, the research status of essential genes and the essential genome database of bacteria are reviewed, the computational prediction method of essential genes based on communication coding theory is expounded, and the significance and practical application value of essential genes are discussed.
Affiliation(s)
- Yuxin Guo
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Dong Chen
  - College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Lihong Wang
  - Beidahuang Industry Group General Hospital, Harbin, China
9
Sielemann J, Wulf D, Schmidt R, Bräutigam A. Local DNA shape is a general principle of transcription factor binding specificity in Arabidopsis thaliana. Nat Commun 2021; 12:6549. [PMID: 34772949 PMCID: PMC8590021 DOI: 10.1038/s41467-021-26819-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/08/2021] [Accepted: 10/21/2021] [Indexed: 11/20/2022]
Abstract
Understanding gene expression will require understanding where regulatory factors bind genomic DNA. The frequently used sequence-based motifs of protein-DNA binding are not predictive, since a genome contains many more binding sites than are actually bound and transcription factors of the same family share similar DNA-binding motifs. Traditionally, these motifs only depict sequence but neglect DNA shape. Since shape may contribute non-linearly and combinatorially to binding, machine learning approaches ought to be able to better predict transcription factor binding. Here we show that a random forest machine learning approach, which incorporates the 3D-shape of DNA, enhances binding prediction for all 216 tested Arabidopsis thaliana transcription factors and improves the resolution of differential binding by transcription factor family members which share the same binding motif. We observed that DNA shape features were individually weighted for each transcription factor, even if they shared the same binding sequence.
Methods to predict transcription factor binding sites typically focus on sequence motifs without considering DNA shape. Here the authors use a random forest machine learning approach that incorporates DNA shape and improves binding site prediction for Arabidopsis thaliana transcription factors.
Affiliation(s)
- Janik Sielemann
  - Computational Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615, Bielefeld, Germany
  - Computational Biology, Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany
  - Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615, Bielefeld, Germany
- Donat Wulf
  - Computational Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615, Bielefeld, Germany
  - Computational Biology, Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany
  - Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615, Bielefeld, Germany
- Romy Schmidt
  - Plant Biotechnology, Bielefeld University, 33615, Bielefeld, Germany
- Andrea Bräutigam
  - Computational Biology, Center for Biotechnology (CeBiTec), Bielefeld University, 33615, Bielefeld, Germany
  - Computational Biology, Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany
  - Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615, Bielefeld, Germany
10
Zrimec J, Buric F, Kokina M, Garcia V, Zelezniak A. Learning the Regulatory Code of Gene Expression. Front Mol Biosci 2021; 8:673363. [PMID: 34179082 PMCID: PMC8223075 DOI: 10.3389/fmolb.2021.673363] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/27/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022]
Abstract
Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
Affiliation(s)
- Jan Zrimec
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Mariia Kokina
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Victor Garcia
  - School of Life Sciences and Facility Management, Zurich University of Applied Sciences, Wädenswil, Switzerland
- Aleksej Zelezniak
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Science for Life Laboratory, Stockholm, Sweden
11
Wang S, Zhang Q, Shen Z, He Y, Chen ZH, Li J, Huang DS. Predicting transcription factor binding sites using DNA shape features based on shared hybrid deep learning architecture. Mol Ther Nucleic Acids 2021; 24:154-163. [PMID: 33767912 PMCID: PMC7972936 DOI: 10.1016/j.omtn.2021.02.014] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 11/19/2020] [Accepted: 02/14/2021] [Indexed: 12/26/2022]
Abstract
The study of transcriptional regulation remains difficult yet fundamental in molecular biology research. Recent research has shown that the double helix structure of DNA plays an important role in improving the accuracy and interpretability of transcription factor binding site (TFBS) prediction. Although several computational methods have been designed to take both DNA sequence and DNA shape features into consideration simultaneously, designing an efficient model remains an open problem. In this paper, we propose a hybrid convolutional recurrent neural network (CNN/RNN) architecture, CRPTS, to predict TFBSs by combining DNA sequence and DNA shape features. The novelty of our proposed method relies on three critical aspects: (1) a shared hybrid CNN and RNN can efficiently extract features from large-scale genomic sequences obtained by high-throughput technology; (2) common patterns are found from DNA sequences and their corresponding DNA shape features; (3) CRPTS can capture local structural information of DNA sequences without completely relying on DNA shape data. A series of comprehensive experiments on 66 in vitro datasets derived from universal protein binding microarrays (uPBMs) shows that CRPTS clearly outperforms the state-of-the-art methods.
Affiliation(s)
- Siguo Wang
  - The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
- Qinhu Zhang
  - The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Tongji University, Siping Road 1239, Shanghai 200092, China
- Zhen Shen
  - School of Computer and Software, Nanyang Institute of Technology, Changjiang Road 80, Nanyang, Henan 473004, China
- Ying He
  - The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
- Zhen-Heng Chen
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- Jianqiang Li
  - College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
- De-Shuang Huang
  - The Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, No. 4800 Caoan Road, Shanghai 201804, China
12
Dantas Machado AC, Cooper BH, Lei X, Di Felice R, Chen L, Rohs R. Landscape of DNA binding signatures of myocyte enhancer factor-2B reveals a unique interplay of base and shape readout. Nucleic Acids Res 2020; 48:8529-8544. [PMID: 32738045 PMCID: PMC7470950 DOI: 10.1093/nar/gkaa642] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/13/2020] [Revised: 07/16/2020] [Accepted: 07/22/2020] [Indexed: 01/08/2023]
Abstract
Myocyte enhancer factor-2B (MEF2B) has the unique capability of binding to its DNA target sites with a degenerate motif, while still functioning as a gene-specific transcriptional regulator. Identifying its DNA targets is crucial given regulatory roles exerted by members of the MEF2 family and MEF2B's involvement in B-cell lymphoma. Analyzing structural data and SELEX-seq experimental results, we deduced the DNA sequence and shape determinants of MEF2B target sites on a high-throughput basis in vitro for wild-type and mutant proteins. Quantitative modeling of MEF2B binding affinities and computational simulations exposed the DNA readout mechanisms of MEF2B. The resulting binding signature of MEF2B revealed distinct intricacies of DNA recognition compared to other transcription factors. MEF2B uses base readout at its half-sites combined with shape readout at the center of its degenerate motif, where A-tract polarity dictates nuances of binding. The predominant role of shape readout at the center of the core motif, with most contacts formed in the minor groove, differs from previously observed protein-DNA readout modes. MEF2B, therefore, represents a unique protein for studies of the role of DNA shape in achieving binding specificity. MEF2B-DNA recognition mechanisms are likely representative for other members of the MEF2 family.
Affiliation(s)
- Ana Carolina Dantas Machado
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Brendon H Cooper
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Xiao Lei
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Rosa Di Felice
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA
- Lin Chen
  - Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
  - Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
- Remo Rohs
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA
  - Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA
  - Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90033, USA
  - Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
13
Malousi A, Andreou AZ, Kouidou S. In silico structural analysis of sequences containing 5-hydroxymethylcytosine reveals its potential as binding regulator for development, ageing and cancer-related transcription factors. Epigenetics 2020; 16:503-518. [PMID: 32752914 DOI: 10.1080/15592294.2020.1805693] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/25/2022]
Abstract
The presence of 5-hydroxymethyl cytosine in DNA has previously been associated with ageing. Using in silico analysis of normal liver samples, we observed that in 5-hydroxymethyl cytosine sequences, DNA methylation is dependent on the co-presence of G-quadruplexes and palindromes. This association exhibits discrete patterns depending on G-quadruplex and palindrome densities. DNase-Seq data show that 5-hydroxymethyl cytosine sequences are common among liver nucleosomes (p < 2.2 × 10⁻¹⁶) and threefold more frequent than nucleosome sequences. Nucleosomes lacking palindromes and potential G-quadruplexes are rare in vivo (1%) and nucleosome occupancy potential decreases with increasing G-quadruplexes. Palindrome distribution is similar to that previously reported in nucleosomes. In low and mixed complexity sequences, 5-hydroxymethyl cytosine is frequently located next to three elements: G-quadruplexes or imperfect G-quadruplexes with CpGs; unstable hairpin loops (TCCCAY6TGGGA), mostly located in antisense strands; or A-/T-rich segments near these motifs. The high frequencies and selective distribution of pentamer sequences (including TCCCA, TGGGA) probably indicate the positive contribution of 5-hydroxymethyl cytosine to stabilizing the formation of structures that are unstable in the absence of this cytosine modification. Common motifs identified across all 5-hydroxymethyl cytosine-containing sequences exhibit high homology to recognition sites of several transcription factor families: homeobox, factors involved in growth, mortality/ageing, cancer, neuronal function, vision, and reproduction. We conclude that cytosine hydroxymethylation could play a role in the recognition of sequences with G-quadruplexes/palindromes by forming epigenetically regulated DNA 'springs' and governing expansions or compressions recognized by different transcription factors or stabilizing nucleosomes. The balance of these epigenetic elements is lost in hepatocellular carcinoma.
Affiliation(s)
- Andigoni Malousi
  - Lab. of Biological Chemistry, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Sofia Kouidou
  - Lab. of Biological Chemistry, Medical School, Aristotle University of Thessaloniki, Thessaloniki, Greece
14
Nagy G, Nagy L. Motif grammar: The basis of the language of gene expression. Comput Struct Biotechnol J 2020; 18:2026-2032. [PMID: 32802274 PMCID: PMC7406977 DOI: 10.1016/j.csbj.2020.07.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/29/2020] [Revised: 07/06/2020] [Accepted: 07/08/2020] [Indexed: 11/21/2022]
Abstract
Collaboration of transcription factors (TFs) and their recognition motifs in DNA is the result of coevolution and forms the basis of gene regulation. However, how these short genomic sequences contribute to setting the level of gene products is not understood in sufficient detail. The biological problem to be solved by the cell is complex, because each gene requires a unique regulatory network in each cellular condition using the same genome. Thus far, only some components of these networks have been uncovered. In this review, we compile the features and principles of the motif grammar, which dictates the characteristics and thus the likelihood of the interactions of the binding TFs and their coregulators. We present how sequence features provide specificity using, as examples, two major TF superfamilies, the bZIP proteins and nuclear receptors. We also discuss the phenomenon of “weak” (low affinity) binding sites, which appear to be components of several important genomic regulatory regions, but paradoxically are barely detectable by currently used approaches. Assembling the complete set of regulatory regions composed of both weak and strong binding sites will yield more comprehensive lists of factors playing roles in gene regulation, thus enabling a deeper understanding of regulatory networks.
Affiliation(s)
- Gergely Nagy
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, HU 4032, Hungary
- Laszlo Nagy
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen, HU 4032, Hungary
  - Johns Hopkins University School of Medicine, Departments of Medicine and Biological Chemistry, Institute for Fundamental Biomedical Research, Johns Hopkins All Children’s Hospital, Saint Petersburg, FL 33701, USA
  - Corresponding author at: Johns Hopkins University School of Medicine, Departments of Medicine and Biological Chemistry, Institute for Fundamental Biomedical Research, Johns Hopkins All Children’s Hospital, Saint Petersburg, FL 33701, USA.
15
Epigenetic competition reveals density-dependent regulation and target site plasticity of phosphorothioate epigenetics in bacteria. Proc Natl Acad Sci U S A 2020; 117:14322-14330. [PMID: 32518115 DOI: 10.1073/pnas.2002933117] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 01/16/2023]
Abstract
Phosphorothioate (PT) DNA modifications, in which a nonbonding phosphate oxygen is replaced with sulfur, represent a widespread, horizontally transferred epigenetic system in prokaryotes and have a highly unusual property of occupying only a small fraction of available consensus sequences in a genome. Using Salmonella enterica as a model, we asked a question of fundamental importance: How do the PT-modifying DndA-E proteins select their GPSAAC/GPSTTC targets? Here, we applied innovative analytical, sequencing, and computational tools to discover a novel behavior for DNA-binding proteins: The Dnd proteins are "parked" at the G6mATC Dam methyltransferase consensus sequence instead of the expected GAAC/GTTC motif, with removal of the 6mA permitting extensive PT modification of GATC sites. This shift in modification sites further revealed a surprising constancy in the density of PT modifications across the genome. Computational analysis showed that GAAC, GTTC, and GATC share common features of DNA shape, which suggests that PT epigenetics are regulated in a density-dependent manner partly by DNA shape-driven target selection in the genome.
16
Kribelbauer JF, Loker RE, Feng S, Rastogi C, Abe N, Rube HT, Bussemaker HJ, Mann RS. Context-Dependent Gene Regulation by Homeodomain Transcription Factor Complexes Revealed by Shape-Readout Deficient Proteins. Mol Cell 2020; 78:152-167.e11. [PMID: 32053778 DOI: 10.1016/j.molcel.2020.01.027] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 09/12/2019] [Revised: 12/01/2019] [Accepted: 01/27/2020] [Indexed: 01/09/2023]
Abstract
Eukaryotic transcription factors (TFs) form complexes with various partner proteins to recognize their genomic target sites. Yet, how the DNA sequence determines which TF complex forms at any given site is poorly understood. Here, we demonstrate that high-throughput in vitro DNA binding assays coupled with unbiased computational analysis provide unprecedented insight into how different DNA sequences select distinct compositions and configurations of homeodomain TF complexes. Using inferred knowledge about minor groove width readout, we design targeted protein mutations that destabilize homeodomain binding both in vitro and in vivo in a complex-specific manner. By performing parallel systematic evolution of ligands by exponential enrichment sequencing (SELEX-seq), chromatin immunoprecipitation sequencing (ChIP-seq), RNA sequencing (RNA-seq), and Hi-C assays, we not only classify the majority of in vivo binding events in terms of complex composition but also infer complex-specific functions by perturbing the gene regulatory network controlled by a single complex.
Affiliation(s)
- Judith F Kribelbauer
  - Department of Biological Sciences, Columbia University, New York, NY 10025, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Ryan E Loker
  - Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Siqian Feng
  - Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- Chaitanya Rastogi
  - Department of Biological Sciences, Columbia University, New York, NY 10025, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Namiko Abe
  - Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA
- H Tomas Rube
  - Department of Biological Sciences, Columbia University, New York, NY 10025, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10025, USA; Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Richard S Mann
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA; Department of Neuroscience, Columbia University, New York, NY 10027, USA
17
Rigden DJ, Fernández XM. The 27th annual Nucleic Acids Research database issue and molecular biology database collection. Nucleic Acids Res 2020; 48:D1-D8. [PMID: 31906604 PMCID: PMC6943072 DOI: 10.1093/nar/gkz1161] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 12/13/2022]
Abstract
The 2020 Nucleic Acids Research Database Issue contains 148 papers spanning molecular biology. They include 59 papers reporting on new databases and 79 covering recent changes to resources previously published in the issue. A further ten papers are updates on databases most recently published elsewhere. This issue contains three breakthrough articles: AntiBodies Chemically Defined (ABCD) curates antibody sequences and their cognate antigens; SCOP returns with a new schema and breaks away from a purely hierarchical structure; while the new Alliance of Genome Resources brings together a number of Model Organism databases to pool knowledge and tools. Major returning nucleic acid databases include miRDB and miRTarBase. Databases for protein sequence analysis include CDD, DisProt and ELM, alongside no fewer than four newcomers covering proteins involved in liquid-liquid phase separation. In metabolism and signaling, Pathway Commons, Reactome and Metabolights all contribute papers. PATRIC and MicroScope update in microbial genomes while human and model organism genomics resources include Ensembl, Ensembl genomes and UCSC Genome Browser. Immune-related proteins are covered by updates from IPD-IMGT/HLA and AFND, as well as newcomers VDJbase and OGRDB. Drug design is catered for by updates from the IUPHAR/BPS Guide to Pharmacology and the Therapeutic Target Database. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been revised, updating 305 entries, adding 65 new resources and eliminating 125 discontinued URLs; so bringing the current total to 1637 databases. It is available at http://www.oxfordjournals.org/nar/database/c/.
Affiliation(s)
- Daniel J Rigden
  - Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
18
Fornes O, Castro-Mondragon JA, Khan A, van der Lee R, Zhang X, Richmond PA, Modi BP, Correard S, Gheorghe M, Baranašić D, Santana-Garcia W, Tan G, Chèneby J, Ballester B, Parcy F, Sandelin A, Lenhard B, Wasserman WW, Mathelier A. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res 2020; 48:D87-D92. [PMID: 31701148 PMCID: PMC7145627 DOI: 10.1093/nar/gkz1001] [Citation(s) in RCA: 803] [Impact Index Per Article: 160.6] [Received: 09/15/2019] [Revised: 10/15/2019] [Accepted: 10/16/2019] [Indexed: 02/07/2023]
Abstract
JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.
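Because profiles are stored as position frequency matrices, a common downstream step is to convert a PFM into log-odds weights before scoring candidate sites. The sketch below illustrates that conversion under stated assumptions: the counts in `TOY_PFM` are made up and are not a real JASPAR profile, and the pseudocount of 1 and the uniform background of 0.25 are illustrative choices rather than settings taken from JASPAR.

```python
import math

# Hypothetical toy PFM: counts of each base at four motif positions.
TOY_PFM = {
    "A": [8, 0, 0, 2],
    "C": [0, 0, 10, 2],
    "G": [2, 10, 0, 2],
    "T": [0, 0, 0, 4],
}


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Turn base counts into log2-odds weights against a uniform background."""
    bases = sorted(pfm)
    width = len(pfm[bases[0]])
    pwm = {b: [] for b in bases}
    for i in range(width):
        # Add the pseudocount to every base so zero counts stay finite.
        total = sum(pfm[b][i] for b in bases) + pseudocount * len(bases)
        for b in bases:
            prob = (pfm[b][i] + pseudocount) / total
            pwm[b].append(math.log2(prob / background))
    return pwm


def score_site(pwm, site):
    """Sum the per-position weights for a site of the same width as the matrix."""
    return sum(pwm[base][i] for i, base in enumerate(site))


def consensus(pwm):
    """Return the highest-weight base at each position."""
    width = len(next(iter(pwm.values())))
    return "".join(max(pwm, key=lambda b: pwm[b][i]) for i in range(width))


toy_pwm = pfm_to_pwm(TOY_PFM)
```

With these helpers, `score_site(toy_pwm, site)` ranks candidate 4-mers, and `consensus(toy_pwm)` recovers the most favoured base at each position of the toy matrix.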
Affiliation(s)
- Oriol Fornes
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Jaime A Castro-Mondragon
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Aziz Khan
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Robin van der Lee
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Xi Zhang
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Phillip A Richmond
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Bhavi P Modi
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Solenne Correard
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Marius Gheorghe
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Damir Baranašić
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, UK
  - Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, UK
- Walter Santana-Garcia
  - Institut de Biologie de l’ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Ge Tan
  - Functional Genomics Centre Zurich, ETH Zurich, Zurich, Switzerland
- François Parcy
  - CNRS, Univ. Grenoble Alpes, CEA, INRA, IRIG-LPCV, 38000 Grenoble, France
- Albin Sandelin
  - The Bioinformatics Centre, Department of Biology and Biotech Research & Innovation Centre, University of Copenhagen, DK2200 Copenhagen N, Denmark
- Boris Lenhard
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, UK
  - Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, UK
  - Sars International Centre for Marine Molecular Biology, University of Bergen, N-5008 Bergen, Norway
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, BC V5Z 4H4, Canada
- Anthony Mathelier
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  - Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital Radiumhospitalet, 0310 Oslo, Norway
19
Kribelbauer JF, Lu XJ, Rohs R, Mann RS, Bussemaker HJ. Toward a Mechanistic Understanding of DNA Methylation Readout by Transcription Factors. J Mol Biol 2019:S0022-2836(19)30617-5. [PMID: 31689433 DOI: 10.1016/j.jmb.2019.10.021] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 08/21/2019] [Revised: 10/23/2019] [Accepted: 10/24/2019] [Indexed: 01/09/2023]
Abstract
Epigenetic DNA modification impacts gene expression, but the underlying molecular mechanisms are only partly understood. Adding a methyl group to a cytosine base locally modifies the structural features of DNA in multiple ways, which may change the interaction with DNA-binding transcription factors (TFs) and trigger a cascade of downstream molecular events. Cells can be probed using various functional genomics assays, but it is difficult to disentangle the confounded effects of DNA modification on TF binding, chromatin accessibility, intranuclear variation in local TF concentration, and rate of transcription. Here we discuss how high-throughput in vitro profiling of protein-DNA interactions has enabled comprehensive characterization and quantification of the methylation sensitivity of TFs. Despite the limited structural data for DNA containing methylated cytosine, automated analysis of structural information in the Protein Data Bank (PDB) shows how 5-methylcytosine (5mC) can be recognized in various ways by amino acid side chains. We discuss how a context-dependent effect of methylation on DNA groove geometry can affect DNA binding by homeodomain proteins and how principled modeling of ChIP-seq data can overcome the confounding that makes the interpretation of in vivo data challenging. The emerging picture is that epigenetic modifications affect TF binding in a highly context-specific manner, with a direction and effect size that depend critically on their position within the TF binding site and the amino acid sequence of the TF. With this improved mechanistic knowledge, we have come closer to understanding how cells use DNA modification to acquire, retain, and change their identity.
Affiliation(s)
- Judith F Kribelbauer
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA; Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Xiang-Jun Lu
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA
- Remo Rohs
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA; Department of Chemistry, University of Southern California, Los Angeles, CA 90089, USA; Department of Physics & Astronomy, University of Southern California, Los Angeles, CA 90089, USA; Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA
- Richard S Mann
  - Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, USA; Department of Systems Biology, Columbia University, New York, NY 10032, USA; Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA; Department of Neuroscience, Columbia University, New York, NY 10027, USA
- Harmen J Bussemaker
  - Department of Biological Sciences, Columbia University, New York, NY 10027, USA; Department of Systems Biology, Columbia University, New York, NY 10032, USA