1
Kock KH, Kimes PK, Gisselbrecht SS, Inukai S, Phanor SK, Anderson JT, Ramakrishnan G, Lipper CH, Song D, Kurland JV, Rogers JM, Jeong R, Blacklow SC, Irizarry RA, Bulyk ML. DNA binding analysis of rare variants in homeodomains reveals homeodomain specificity-determining residues. Nat Commun 2024; 15:3110. [PMID: 38600112 PMCID: PMC11006913 DOI: 10.1038/s41467-024-47396-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/29/2024] [Indexed: 04/12/2024] Open
Abstract
Homeodomains (HDs) are the second largest class of DNA binding domains (DBDs) among eukaryotic sequence-specific transcription factors (TFs) and are the TF structural class with the largest number of disease-associated mutations in the Human Gene Mutation Database (HGMD). Despite numerous structural studies and large-scale analyses of HD DNA binding specificity, HD-DNA recognition is still not fully understood. Here, we analyze 92 human HD mutants, including disease-associated variants and variants of uncertain significance (VUS), for their effects on DNA binding activity. Many of the variants alter DNA binding affinity and/or specificity. Detailed biochemical analysis and structural modeling identify 14 previously unknown specificity-determining positions, 5 of which do not contact DNA. The same missense substitution at analogous positions within different HDs often exhibits different effects on DNA binding activity. Variant effect prediction tools perform moderately well in distinguishing variants with altered DNA binding affinity, but poorly in identifying those with altered binding specificity. Our results highlight the need for biochemical assays of TF coding variants and prioritize dozens of variants for further investigations into their pathogenicity and the development of clinical diagnostics and precision therapies.
Affiliation(s)
- Kian Hong Kock
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
- Patrick K Kimes
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Stephen S Gisselbrecht
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Sachi Inukai
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Sabrina K Phanor
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- James T Anderson
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Gayatri Ramakrishnan
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Boston Bangalore Biosciences Beginnings Program, Harvard University, Cambridge, MA, USA
- Colin H Lipper
  - Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Dongyuan Song
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jesse V Kurland
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Julia M Rogers
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
- Raehoon Jeong
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA, USA
- Stephen C Blacklow
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
  - Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
- Rafael A Irizarry
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
  - Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
2
Mielko Z, Zhang Y, Sahay H, Liu Y, Schaich MA, Schnable B, Morrison AM, Burdinski D, Adar S, Pufall M, Van Houten B, Gordân R, Afek A. UV irradiation remodels the specificity landscape of transcription factors. Proc Natl Acad Sci U S A 2023; 120:e2217422120. [PMID: 36888663 PMCID: PMC10089200 DOI: 10.1073/pnas.2217422120] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/13/2022] [Accepted: 02/09/2023] [Indexed: 03/09/2023] Open
Abstract
Somatic mutations are highly enriched at transcription factor (TF) binding sites, with the strongest trend being observed for ultraviolet light (UV)-induced mutations in melanomas. One of the main mechanisms proposed for this hypermutation pattern is the inefficient repair of UV lesions within TF-binding sites, caused by competition between TFs bound to these lesions and the DNA repair proteins that must recognize the lesions to initiate repair. However, TF binding to UV-irradiated DNA is poorly characterized, and it is unclear whether TFs maintain specificity for their DNA sites after UV exposure. We developed UV-Bind, a high-throughput approach to investigate the impact of UV irradiation on protein-DNA binding specificity. We applied UV-Bind to ten TFs from eight structural families, and found that UV lesions significantly altered the DNA-binding preferences of all the TFs tested. The main effect was a decrease in binding specificity, but the precise effects and their magnitude differ across factors. Importantly, we found that despite the overall reduction in DNA-binding specificity in the presence of UV lesions, TFs can still compete with repair proteins for lesion recognition, in a manner consistent with their specificity for UV-irradiated DNA. In addition, for a subset of TFs, we identified a surprising but reproducible effect at certain nonconsensus DNA sequences, where UV irradiation leads to a high increase in the level of TF binding. These changes in DNA-binding specificity after UV irradiation, at both consensus and nonconsensus sites, have important implications for the regulatory and mutagenic roles of TFs in the cell.
Affiliation(s)
- Zachery Mielko
  - Program in Genetics and Genomics, Duke University School of Medicine, Durham, NC 27708
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27708
  - Department of Computer Science, Duke University, Durham, NC 27708
- Yuning Zhang
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27708
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27708
- Harshit Sahay
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27708
  - Program in Computational Biology and Bioinformatics, Duke University School of Medicine, Durham, NC 27708
- Yiling Liu
  - Center for Genomic and Computational Biology, Duke University School of Medicine, Durham, NC 27708
  - Program in Computational Biology and Bioinformatics, Duke University School of Medicine, Durham, NC 27708
- Matthew A Schaich
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
  - UPMC-Hillman Cancer Center, Pittsburgh, PA 15213
- Brittani Schnable
  - UPMC-Hillman Cancer Center, Pittsburgh, PA 15213
  - Molecular Genetics and Developmental Biology Graduate Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
- Abigail M Morrison
  - Department of Biochemistry and Molecular Biology, Carver College of Medicine, University of Iowa, Iowa City, IA 52242
- Debbie Burdinski
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Sheera Adar
  - Department of Microbiology and Molecular Genetics, The Institute for Medical Research Israel-Canada, The Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Miles Pufall
  - Department of Biochemistry and Molecular Biology, Carver College of Medicine, University of Iowa, Iowa City, IA 52242
  - Holden Comprehensive Cancer Center, University of Iowa, Iowa City, IA 52242
- Bennett Van Houten
  - Program in Computational Biology and Bioinformatics, Duke University School of Medicine, Durham, NC 27708
  - Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213
  - UPMC-Hillman Cancer Center, Pittsburgh, PA 15213
  - Molecular Biophysics and Structural Biology Program, University of Pittsburgh, Pittsburgh, PA 15213
- Raluca Gordân
  - Department of Computer Science, Duke University, Durham, NC 27708
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC 27708
  - Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27708
- Ariel Afek
  - Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
3
Ji YT, Xiu Z, Chen CH, Wang Y, Yang JX, Sui JJ, Jiang SJ, Wang P, Yue SY, Zhang QQ, Jin JL, Wang GS, Wei QQ, Wei B, Wang J, Zhang HL, Zhang QY, Liu J, Liu CJ, Jian JB, Qu CQ. Long read sequencing of Toona sinensis (A. Juss) Roem: A chromosome-level reference genome for the family Meliaceae. Mol Ecol Resour 2021; 21:1243-1255. [PMID: 33421343 DOI: 10.1111/1755-0998.13318] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/06/2020] [Revised: 12/21/2020] [Accepted: 01/05/2021] [Indexed: 11/30/2022]
Abstract
Chinese mahogany (Toona sinensis) is a woody plant that is widely cultivated in China and Malaysia. Toona sinensis is important economically, including as a nutritious food source, as material for traditional Chinese medicine and as a high-quality hardwood. However, the absence of a reference genome has hindered in-depth molecular and evolutionary studies of this plant. In this study, we report a high-quality T. sinensis genome assembly, with scaffolds anchored to 28 chromosomes and a total assembled length of 596 Mb (contig N50 = 1.5 Mb and scaffold N50 = 21.5 Mb). A total of 34,345 genes were predicted in the genome after homology-based and de novo annotation analyses. Evolutionary analysis showed that the genomes of T. sinensis and Populus trichocarpa diverged ~99.1-103.1 million years ago, and the T. sinensis genome underwent a recent genome-wide duplication event at ~7.8 million years and one more ancient whole genome duplication event at ~71.5 million years. These results provide a high-quality chromosome-level reference genome for T. sinensis and confirm its evolutionary position at the genomic level. Such information will offer genomic resources to study the molecular mechanism of terpenoid biosynthesis and the formation of flavour compounds, which will further facilitate its molecular breeding. As the first chromosome-level genome assembled in the family Meliaceae, it will provide unique insights into the evolution of members of the Meliaceae.
Affiliation(s)
- Yun-Tao Ji
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
- Zhihui Xiu
  - BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Youru Wang
  - Hubei Engineering Research Center of Typical Wild Vegetables Breeding and Comprehensive Utilization Technology, Hubei Normal University, Huangshi, China
- Jing-Xia Yang
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
- Juan-Juan Sui
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
- Ping Wang
  - BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Shao-Yun Yue
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
- Ji-Liang Jin
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
- Bing Wei
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
- Juan Wang
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
- Qiu-Yan Zhang
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
- Jun Liu
  - Research Institute of Subtropical Forestry, Chinese Academy of Forestry, Fuyang, China
- Chang-Jin Liu
  - State Key Laboratory of Food Nutrition and Safety, School of Food Science and Technology, Tianjin University of Science and Technology, Tianjin, China
- Jian-Bo Jian
  - BGI Genomics, BGI-Shenzhen, Shenzhen, China
  - Department of Biotechnology and Biomedicine, Technical University of Denmark, Lyngby, Denmark
  - Key Laboratory of Genomics, Ministry of Agriculture, BGI-Shenzhen, Shenzhen, China
- Chang-Qing Qu
  - Engineering Technology Research Center of Anti-aging Chinese Herbal Medicine of Anhui Province, Biology and Food Engineering School, Fuyang Normal University, Fuyang, China
4
He Z, Zhao T, Yin Z, Liu J, Cheng Y, Xu J. The phytochrome-interacting transcription factor CsPIF8 contributes to cold tolerance in citrus by regulating superoxide dismutase expression. Plant Sci 2020; 298:110584. [PMID: 32771144 DOI: 10.1016/j.plantsci.2020.110584] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/19/2020] [Accepted: 06/25/2020] [Indexed: 05/28/2023]
Abstract
As one of the subtropical and tropical fruit trees, Citrus sinensis is sensitive to cold stress. However, most transcription factors (TFs) that regulate cold tolerance in citrus have not yet been reported. A phytochrome-interacting transcription factor (PIF) gene (CsPIF8) in citrus was significantly upregulated under cold stress. Overexpression of CsPIF8 increased cold tolerance in transgenic tomato plants and grapefruit callus, whereas virus-induced gene silencing-mediated suppression of PIF8 increased cold sensitivity in seedlings of Poncirus trifoliata. Superoxide dismutase (SOD) reduces the superoxide anion (O2-) level to enhance cold tolerance in plants. Chromatin immunoprecipitation combined with high-throughput sequencing, yeast one hybrid, electrophoretic mobility shift and dual luciferase assays showed that CsPIF8 directly bound the E-box (CANNTG) of CsSOD promoter and activated the promoter of CsSOD. Furthermore, the expression level of CsSOD and CsSOD activity were significantly increased, whereas the level of O2- was significantly reduced in the transgenic lines. The Poncirus trifoliata seedlings with VIGS-mediated suppression of PIF8 exhibited the opposite effects. These results have shown that CsPIF8 improved cold tolerance in citrus through regulating the expression level of SOD and SOD activity. These findings may provide novel insights into the regulation of PIF8 in the response to cold stress in citrus.
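As a rough illustration of the E-box motif named in this abstract, the degenerate consensus CANNTG (N = any base) can be located in a sequence with a regular expression. This is only a minimal sketch: the function name `find_eboxes` and the example sequence are hypothetical and are not taken from the paper, which identified binding experimentally.

```python
import re

# CANNTG with N = A, C, G or T. The lookahead lets overlapping sites be reported.
# CANNTG is its own reverse complement as a pattern, so one strand is enough.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")

def find_eboxes(seq):
    """Return (0-based start, matched site) for every CANNTG occurrence."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]

# Made-up sequence for illustration only (not a real CsSOD promoter)
promoter = "ttgCACGTGaaCATATGcc"
print(find_eboxes(promoter))
```

A sequence match of this kind only flags candidate sites; it says nothing about whether a factor actually binds there.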
Affiliation(s)
- Zhenyu He
  - Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China
- Tiantian Zhao
  - Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China
- Zhaoping Yin
  - Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China
- Jihong Liu
  - Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China
- Yunjiang Cheng
  - Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China
- Juan Xu
  - Key Laboratory of Horticultural Plant Biology (Ministry of Education), College of Horticulture and Forestry Science, Huazhong Agricultural University, Wuhan 430070, China
5
Specificity landscapes unmask submaximal binding site preferences of transcription factors. Proc Natl Acad Sci U S A 2018; 115:E10586-E10595. [PMID: 30341220 DOI: 10.1073/pnas.1811431115] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 01/08/2023] Open
Abstract
We have developed Differential Specificity and Energy Landscape (DiSEL) analysis to comprehensively compare DNA-protein interactomes (DPIs) obtained by high-throughput experimental platforms and cutting edge computational methods. While high-affinity DNA binding sites are identified by most methods, DiSEL uncovered nuanced sequence preferences displayed by homologous transcription factors. Pairwise analysis of 726 DPIs uncovered homolog-specific differences at moderate- to low-affinity binding sites (submaximal sites). DiSEL analysis of variants of 41 transcription factors revealed that many disease-causing mutations result in allele-specific changes in binding site preferences. We focused on a set of highly homologous factors that have different biological roles but "read" DNA using identical amino acid side chains. Rather than direct readout, our results indicate that DNA noncontacting side chains allosterically contribute to sculpt distinct sequence preferences among closely related members of transcription factor families.
6
Liu Q, Onal P, Datta RR, Rogers JM, Schmidt-Ott U, Bulyk ML, Small S, Thornton JW. Ancient mechanisms for the evolution of the bicoid homeodomain's function in fly development. eLife 2018; 7:e34594. [PMID: 30298815 PMCID: PMC6177261 DOI: 10.7554/elife.34594] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 12/22/2017] [Accepted: 07/28/2018] [Indexed: 12/14/2022] Open
Abstract
The ancient mechanisms that caused developmental gene regulatory networks to diversify among distantly related taxa are not well understood. Here we use ancestral protein reconstruction, biochemical experiments, and developmental assays of transgenic animals carrying reconstructed ancestral genes to investigate how the transcription factor Bicoid (Bcd) evolved its central role in anterior-posterior patterning in flies. We show that most of Bcd's derived functions are attributable to evolutionary changes within its homeodomain (HD) during a phylogenetic interval >140 million years ago. A single substitution from this period (Q50K) accounts almost entirely for the evolution of Bcd's derived DNA specificity in vitro. In transgenic embryos expressing the reconstructed ancestral HD, however, Q50K confers activation of only a few of Bcd's transcriptional targets and yields a very partial rescue of anterior development. Adding a second historical substitution (M54R) confers regulation of additional Bcd targets and further rescues anterior development. These results indicate that two epistatically interacting mutations played a major role in the evolution of Bcd's controlling regulatory role in early development. They also show how ancestral sequence reconstruction can be combined with in vivo characterization of transgenic animals to illuminate the historical mechanisms of developmental evolution.
Affiliation(s)
- Qinwen Liu
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
- Pinar Onal
  - Department of Biology, New York University, New York, United States
- Rhea R Datta
  - Department of Biology, New York University, New York, United States
- Julia M Rogers
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, United States
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
- Urs Schmidt-Ott
  - Department of Organismal Biology and Anatomy, University of Chicago, Chicago, United States
- Martha L Bulyk
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, United States
  - Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
  - Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, United States
- Stephen Small
  - Department of Biology, New York University, New York, United States
- Joseph W Thornton
  - Department of Ecology and Evolution, University of Chicago, Chicago, United States
  - Department of Human Genetics, University of Chicago, Chicago, United States
7
Liu N, Hargreaves VV, Zhu Q, Kurland JV, Hong J, Kim W, Sher F, Macias-Trevino C, Rogers JM, Kurita R, Nakamura Y, Yuan GC, Bauer DE, Xu J, Bulyk ML, Orkin SH. Direct Promoter Repression by BCL11A Controls the Fetal to Adult Hemoglobin Switch. Cell 2018; 173:430-442.e17. [PMID: 29606353 DOI: 10.1016/j.cell.2018.03.016] [Citation(s) in RCA: 297] [Impact Index Per Article: 49.5] [Received: 10/04/2017] [Revised: 01/16/2018] [Accepted: 03/06/2018] [Indexed: 01/06/2023]
Abstract
Fetal hemoglobin (HbF, α2γ2) level is genetically controlled and modifies severity of adult hemoglobin (HbA, α2β2) disorders, sickle cell disease, and β-thalassemia. Common genetic variation affects expression of BCL11A, a regulator of HbF silencing. To uncover how BCL11A supports the developmental switch from γ- to β- globin, we use a functional assay and protein binding microarray to establish a requirement for a zinc-finger cluster in BCL11A in repression and identify a preferred DNA recognition sequence. This motif appears in embryonic and fetal-expressed globin promoters and is duplicated in γ-globin promoters. The more distal of the duplicated motifs is mutated in individuals with hereditary persistence of HbF. Using the CUT&RUN approach to map protein binding sites in erythroid cells, we demonstrate BCL11A occupancy preferentially at the distal motif, which can be disrupted by editing the promoter. Our findings reveal that direct γ-globin gene promoter repression by BCL11A underlies hemoglobin switching.
Affiliation(s)
- Nan Liu
  - Cancer and Blood Disorders Center, Dana Farber Cancer Institute and Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Victoria V Hargreaves
  - Cancer and Blood Disorders Center, Dana Farber Cancer Institute and Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Qian Zhu
  - Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jesse V Kurland
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Jiyoung Hong
  - Cancer and Blood Disorders Center, Dana Farber Cancer Institute and Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Woojin Kim
  - Cancer and Blood Disorders Center, Dana Farber Cancer Institute and Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Falak Sher
  - Cancer and Blood Disorders Center, Dana Farber Cancer Institute and Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Claudio Macias-Trevino
  - Cancer and Blood Disorders Center, Dana Farber Cancer Institute and Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Julia M Rogers
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
- Ryo Kurita
  - Cell Engineering Division, RIKEN Bioresource Center, Tsukuba, Japan
- Yukio Nakamura
  - Cell Engineering Division, RIKEN Bioresource Center, Tsukuba, Japan
- Guo-Cheng Yuan
  - Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Daniel E Bauer
  - Cancer and Blood Disorders Center, Dana Farber Cancer Institute and Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Jian Xu
  - Children's Medical Center Research Institute, Department of Pediatrics, University of Texas at Southwestern Medical Center, Dallas, TX, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
  - Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Stuart H Orkin
  - Cancer and Blood Disorders Center, Dana Farber Cancer Institute and Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Howard Hughes Medical Institute, Boston, MA, USA
8
Ruan S, Stormo GD. Comparison of discriminative motif optimization using matrix and DNA shape-based models. BMC Bioinformatics 2018; 19:86. [PMID: 29510689 PMCID: PMC5840810 DOI: 10.1186/s12859-018-2104-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/09/2017] [Accepted: 03/01/2018] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Transcription factor (TF) binding site specificity is commonly represented by some form of matrix model in which the positions in the binding site are assumed to contribute independently to the site's activity. The independence assumption is known to be an approximation, often a good one but sometimes poor. Alternative approaches have been developed that use k-mers (DNA "words" of length k) to account for the non-independence, and more recently DNA structural parameters have been incorporated into the models. ChIP-seq data are often used to assess the discriminatory power of motifs and to compare different models. However, to measure the improvement due to using more complex models, one must compare to optimized matrix models.

RESULTS: We describe a program, "Discriminative Additive Model Optimization" (DAMO), that uses positive and negative examples, as in ChIP-seq data, and finds the additive position weight matrix (PWM) that maximizes the Area Under the Receiver Operating Characteristic Curve (AUROC). We compare to a recent study where structural parameters, serving as features in a gradient boosting classifier algorithm, are shown to improve the AUROC over JASPAR position frequency matrices (PFMs). In agreement with the previous results, we find that adding structural parameters gives the largest improvement, but most of the gain can be obtained by an optimized PWM and nearly all of the gain can be obtained with a di-nucleotide extension to the PWM.

CONCLUSION: To appropriately compare different models for TF binding sites, optimized models must be used. PWMs and their extensions are good representations of binding specificity for most TFs, and more complex models, including the incorporation of DNA shape features and gradient boosting classifiers, provide only moderate improvements for a few TFs.
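The two quantities this abstract relies on, an additive PWM score and the AUROC over positive and negative sequences, can be sketched as follows. This is not DAMO and performs no optimization; it only shows, under stated assumptions, how a fixed additive matrix scores sequences and how the AUROC is then computed. The matrix values, the sequences, and the function names (`window_score`, `best_score`, `auroc`) are all hypothetical.

```python
# Toy log-odds PWM for a 3-bp site: one dict of base -> weight per position.
# The weights are invented for illustration, not taken from any database.
PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.2, "G": -0.8, "T": -1.0},
    {"A": -0.7, "C": -1.0, "G": 1.1, "T": -1.0},
]

def window_score(window, pwm):
    """Additive model: positions contribute independently, so weights are summed."""
    return sum(col[base] for col, base in zip(pwm, window))

def best_score(seq, pwm):
    """Score a sequence by its best-scoring window (forward strand only)."""
    k = len(pwm)
    return max(window_score(seq[i:i + k], pwm) for i in range(len(seq) - k + 1))

def auroc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

positives = ["TTACGT", "ACGAAA", "GGACGC"]  # hypothetical bound sequences
negatives = ["TTTTTT", "GGGCCC", "ATATAT"]  # hypothetical unbound sequences
pos = [best_score(s, PWM) for s in positives]
neg = [best_score(s, PWM) for s in negatives]
print(auroc(pos, neg))
```

On this toy data every positive sequence contains the top-scoring site ACG and outscores every negative, so the AUROC is 1.0. A discriminative optimizer of the kind the abstract describes would adjust the matrix weights to raise this value on real data.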
Affiliation(s)
- Shuxiang Ruan
  - Department of Genetics and Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, 63110 USA
- Gary D. Stormo
  - Department of Genetics and Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, 63110 USA
9
Barrera LA, Vedenko A, Kurland JV, Rogers JM, Gisselbrecht SS, Rossin EJ, Woodard J, Mariani L, Kock KH, Inukai S, Siggers T, Shokri L, Gordân R, Sahni N, Cotsapas C, Hao T, Yi S, Kellis M, Daly MJ, Vidal M, Hill DE, Bulyk ML. Survey of variation in human transcription factors reveals prevalent DNA binding changes. Science 2016; 351:1450-1454. [PMID: 27013732 PMCID: PMC4825693 DOI: 10.1126/science.aad2257] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Received: 09/21/2015] [Accepted: 02/18/2016] [Indexed: 12/13/2022]
Abstract
Sequencing of exomes and genomes has revealed abundant genetic variation affecting the coding sequences of human transcription factors (TFs), but the consequences of such variation remain largely unexplored. We developed a computational, structure-based approach to evaluate TF variants for their impact on DNA binding activity and used universal protein-binding microarrays to assay sequence-specific DNA binding activity across 41 reference and 117 variant alleles found in individuals of diverse ancestries and families with Mendelian diseases. We found 77 variants in 28 genes that affect DNA binding affinity or specificity and identified thousands of rare alleles likely to alter the DNA binding activity of human sequence-specific TFs. Our results suggest that most individuals have unique repertoires of TF DNA binding activities, which may contribute to phenotypic variation.
Affiliation(s)
- Luis A. Barrera
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Anastasia Vedenko
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Jesse V. Kurland
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Julia M. Rogers
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
- Stephen S. Gisselbrecht
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Elizabeth J. Rossin
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
- Jaie Woodard
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
- Luca Mariani
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Kian Hong Kock
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA 02138, USA
- Sachi Inukai
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Trevor Siggers
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Leila Shokri
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Raluca Gordân
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
- Nidhi Sahni
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Chris Cotsapas
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
- Tong Hao
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Song Yi
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
- Mark J. Daly
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
  - Center for Human Genetics Research and Center for Computational and Integrative Biology, Massachusetts General Hospital, Boston, MA 02114, USA
- Marc Vidal
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- David E. Hill
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA and Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Martha L. Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA 02115, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02139, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA 02138, USA
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
10
Lindemose S, Jensen MK, Van de Velde J, O'Shea C, Heyndrickx KS, Workman CT, Vandepoele K, Skriver K, De Masi F. A DNA-binding-site landscape and regulatory network analysis for NAC transcription factors in Arabidopsis thaliana. Nucleic Acids Res 2014; 42:7681-93. [PMID: 24914054 PMCID: PMC4081100 DOI: 10.1093/nar/gku502] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Indexed: 11/13/2022] Open
Abstract
Target gene identification for transcription factors is a prerequisite for the systems wide understanding of organismal behaviour. NAM-ATAF1/2-CUC2 (NAC) transcription factors are amongst the largest transcription factor families in plants, yet limited data exist from unbiased approaches to resolve the DNA-binding preferences of individual members. Here, we present a TF-target gene identification workflow based on the integration of novel protein binding microarray data with gene expression and multi-species promoter sequence conservation to identify the DNA-binding specificities and the gene regulatory networks of 12 NAC transcription factors. Our data offer specific single-base resolution fingerprints for most TFs studied and indicate that NAC DNA-binding specificities might be predicted from their DNA-binding domain's sequence. The developed methodology, including the application of complementary functional genomics filters, makes it possible to translate, for each TF, protein binding microarray data into a set of high-quality target genes. With this approach, we confirm NAC target genes reported from independent in vivo analyses. We emphasize that candidate target gene sets together with the workflow associated with functional modules offer a strong resource to unravel the regulatory potential of NAC genes and that this workflow could be used to study other families of transcription factors.
Affiliation(s)
- Søren Lindemose
  - Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Michael K Jensen
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2970 Hørsholm, Denmark
- Jan Van de Velde
  - Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Charlotte O'Shea
  - Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Ken S Heyndrickx
  - Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Christopher T Workman
  - Center for Biological Sequence Analysis, Institute for Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Klaas Vandepoele
  - Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium
- Karen Skriver
  - Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Federico De Masi
  - Center for Biological Sequence Analysis, Institute for Systems Biology, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
11
Orenstein Y, Shamir R. A comparative analysis of transcription factor binding models learned from PBM, HT-SELEX and ChIP data. Nucleic Acids Res 2014; 42:e63. [PMID: 24500199 PMCID: PMC4005680 DOI: 10.1093/nar/gku117] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Indexed: 12/29/2022] Open
Abstract
Understanding gene regulation is a key challenge in today's biology. The new technologies of protein-binding microarrays (PBMs) and high-throughput SELEX (HT-SELEX) allow measurement of the binding intensities of one transcription factor (TF) to numerous synthetic double-stranded DNA sequences in a single experiment. Recently, Jolma et al. reported the results of 547 HT-SELEX experiments covering human and mouse TFs. Because 162 of these TFs were also covered by PBM technology, for the first time, a large-scale comparison between implementations of these two in vitro technologies is possible. Here we assessed the similarities and differences between binding models, represented as position weight matrices, inferred from PBM and HT-SELEX, and also measured how well these models predict in vivo binding. Our results show that HT-SELEX- and PBM-derived models agree for most TFs. For some TFs, the HT-SELEX-derived models are longer versions of the PBM-derived models, whereas for other TFs, the HT-SELEX models match the secondary PBM-derived models. Remarkably, PBM-based 8-mer ranking is more accurate than that of HT-SELEX, but models derived from HT-SELEX predict in vivo binding better. In addition, we reveal several biases in HT-SELEX data including nucleotide frequency bias, enrichment of C-rich k-mers and oligos and underrepresentation of palindromes.
Affiliation(s)
- Yaron Orenstein
  - Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 69978, Israel
12
Anvar SY, Khachatryan L, Vermaat M, van Galen M, Pulyakhina I, Ariyurek Y, Kraaijeveld K, den Dunnen JT, de Knijff P, ’t Hoen PAC, Laros JFJ. Determining the quality and complexity of next-generation sequencing data without a reference genome. Genome Biol 2014; 15:555. [PMID: 25514851 PMCID: PMC4298064 DOI: 10.1186/s13059-014-0555-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 10/06/2014] [Accepted: 11/27/2014] [Indexed: 01/22/2023] Open
Abstract
We describe an open-source kPAL package that facilitates an alignment-free assessment of the quality and comparability of sequencing datasets by analyzing k-mer frequencies. We show that kPAL can detect technical artefacts such as high duplication rates, library chimeras, contamination and differences in library preparation protocols. kPAL also successfully captures the complexity and diversity of microbiomes and provides a powerful means to study changes in microbial communities. Together, these features make kPAL an attractive and broadly applicable tool to determine the quality and comparability of sequence libraries even in the absence of a reference sequence. kPAL is freely available at https://github.com/LUMC/kPAL.
Affiliation(s)
- Seyed Yahya Anvar
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
  - Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Lusine Khachatryan
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Martijn Vermaat
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Michiel van Galen
  - Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Irina Pulyakhina
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Yavuz Ariyurek
  - Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
- Ken Kraaijeveld
  - Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
  - Department of Ecological Science, VU University Amsterdam, Amsterdam, The Netherlands
- Johan T den Dunnen
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
  - Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
  - Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter de Knijff
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Peter AC ’t Hoen
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
- Jeroen FJ Laros
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
  - Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands
13
Zhong S, He X, Bar-Joseph Z. Predicting tissue specific transcription factor binding sites. BMC Genomics 2013; 14:796. [PMID: 24238150 PMCID: PMC3898213 DOI: 10.1186/1471-2164-14-796] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 10/30/2013] [Accepted: 11/06/2013] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND: Studies of gene regulation often utilize genome-wide predictions of transcription factor (TF) binding sites. Most existing prediction methods are based on sequence information alone, ignoring biological contexts such as developmental stages and tissue types. Experimental methods to study in vivo binding, including ChIP-chip and ChIP-seq, can only study one transcription factor in a single cell type and under a specific condition in each experiment, and therefore cannot scale to determine the full set of regulatory interactions in mammalian transcriptional regulatory networks. RESULTS: We developed a new computational approach, PIPES, for predicting tissue-specific TF binding. PIPES integrates in vitro protein binding microarrays (PBMs), sequence conservation and tissue-specific epigenetic (DNase I hypersensitivity) information. We demonstrate that PIPES improves over existing methods on distinguishing between in vivo bound and unbound sequences using ChIP-seq data for 11 mouse TFs. In addition, our predictions are in good agreement with current knowledge of tissue-specific TF regulation. CONCLUSIONS: We provide a systematic map of computationally predicted tissue-specific binding targets for 284 mouse TFs across 55 tissue/cell types. Such a comprehensive resource is useful for researchers studying gene regulation.
Affiliation(s)
- Ziv Bar-Joseph
  - Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.