1
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024] Open
Abstract
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
2
Vorontsov IE, Eliseeva IA, Zinkevich A, Nikonov M, Abramov S, Boytsov A, Kamenets V, Kasianova A, Kolmykov S, Yevshin I, Favorov A, Medvedeva YA, Jolma A, Kolpakov F, Makeev V, Kulakovskiy I. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Res 2024; 52:D154-D163. [PMID: 37971293 PMCID: PMC10767914 DOI: 10.1093/nar/gkad1077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 10/17/2023] [Accepted: 10/26/2023] [Indexed: 11/19/2023] Open
Abstract
We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
Affiliation(s)
- Ilya E Vorontsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Irina A Eliseeva
  - Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
- Arsenii Zinkevich
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Mikhail Nikonov
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119991 Moscow, Russia
- Sergey Abramov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Altius Institute for Biomedical Sciences, 98121 Seattle, WA, USA
- Alexandr Boytsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Altius Institute for Biomedical Sciences, 98121 Seattle, WA, USA
- Vasily Kamenets
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
  - Institute of Biochemistry and Genetics of the Ufa Federal Research Centre of the Russian Academy of Sciences, 450054 Ufa, Russia
- Alexandra Kasianova
  - Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
  - Institute for Information Transmission Problems of the Russian Academy of Sciences, 127051 Moscow, Russia
- Semyon Kolmykov
  - Department of Computational Biology, Sirius University of Science and Technology, 354340 Sirius, Krasnodar region, Russia
- Alexander Favorov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Yulia A Medvedeva
  - Research Center of Biotechnology RAS, Russian Academy of Sciences, 119071 Moscow, Russia
- Arttu Jolma
  - Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Fedor Kolpakov
  - Department of Computational Biology, Sirius University of Science and Technology, 354340 Sirius, Krasnodar region, Russia
  - Bioinformatics Laboratory, Federal Research Center for Information and Computational Technologies, 630090 Novosibirsk, Russia
- Vsevolod J Makeev
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Moscow Institute of Physics and Technology, 141700 Dolgoprudny, Russia
  - Institute of Biochemistry and Genetics of the Ufa Federal Research Centre of the Russian Academy of Sciences, 450054 Ufa, Russia
- Ivan V Kulakovskiy
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
  - Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia
  - Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, 420008 Kazan, Russia
3
Marakulina D, Vorontsov IE, Kulakovskiy IV, Lennartsson A, Drabløs F, Medvedeva Y. EpiFactors 2022: expansion and enhancement of a curated database of human epigenetic factors and complexes. Nucleic Acids Res 2022; 51:D564-D570. [PMID: 36350659 PMCID: PMC9825597 DOI: 10.1093/nar/gkac989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 09/30/2022] [Accepted: 10/24/2022] [Indexed: 11/11/2022] Open
Abstract
We present an update of EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets, and products which is openly accessible at http://epifactors.autosome.org. An updated version of the EpiFactors contains information on 902 proteins, including 101 histones and protamines, and, as a main update, a newly curated collection of 124 lncRNAs involved in epigenetic regulation. The amount of publications concerning the role of lncRNA in epigenetics is rapidly growing. Yet, the resource that compiles, integrates, organizes, and presents curated information on lncRNAs in epigenetics is missing. EpiFactors fills this gap and provides data on epigenetic regulators in an accessible and user-friendly form. For 820 of the genes in EpiFactors, we include expression estimates across multiple cell types assessed by CAGE-Seq in the FANTOM5 project. In addition, the updated EpiFactors contains information on 73 protein complexes involved in epigenetic regulation. Our resource is practical for a wide range of users, including biologists, bioinformaticians and molecular/systems biologists.
Affiliation(s)
- Daria Marakulina
  - Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, 141701, Dolgoprudny, Moscow Region, Russia
- Ilya E Vorontsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Ivan V Kulakovskiy
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia; Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russia
- Andreas Lennartsson
  - Department of Biosciences and Nutrition, NEO, Karolinska Institutet, 14157, Huddinge, Sweden
- Finn Drabløs
  - Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, PO Box 8905, NO-7491 Trondheim, Norway
4
Ershova AS, Eliseeva IA, Nikonov OS, Fedorova AD, Vorontsov IE, Papatsenko D, Kulakovskiy IV. Enhanced C/EBP binding to G·T mismatches facilitates fixation of CpG mutations in cancer and adult stem cells. Cell Rep 2021; 36:109365. [PMID: 34260924 DOI: 10.1016/j.celrep.2021.109365] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022] Open
5
Ershova AS, Eliseeva IA, Nikonov OS, Fedorova AD, Vorontsov IE, Papatsenko D, Kulakovskiy IV. Enhanced C/EBP binding to G·T mismatches facilitates fixation of CpG mutations in cancer and adult stem cells. Cell Rep 2021; 35:109221. [PMID: 34107262 DOI: 10.1016/j.celrep.2021.109221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2020] [Revised: 03/21/2021] [Accepted: 05/13/2021] [Indexed: 10/21/2022] Open
Abstract
Somatic mutations in regulatory sites of human stem cells affect cell identity or cause malignant transformation. By mining the human genome for co-occurrence of mutations and transcription factor binding sites, we show that C/EBP binding sites are strongly enriched with [C > T]G mutations in cancer and adult stem cells, which is of special interest because C/EBPs regulate cell fate and differentiation. In vitro protein-DNA binding assay and structural modeling of the CEBPB-DNA complex show that the G·T mismatch in the core CG dinucleotide strongly enhances affinity of the binding site. We conclude that enhanced binding of C/EBPs shields CpG·TpG mismatches from DNA repair, leading to selective accumulation of [C > T]G mutations and consequent deterioration of the binding sites. This mechanism of targeted mutagenesis highlights the effect of a mutational process on certain regulatory sites and reveals the molecular basis of putative regulatory alterations in stem cells.
Affiliation(s)
- Anna S Ershova
  - Belozersky Institute of Physical and Chemical Biology, Lomonosov Moscow State University, Moscow 119992, Russia
- Irina A Eliseeva
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russia
- Oleg S Nikonov
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russia
- Alla D Fedorova
  - School of Biochemistry and Cell Biology, University College Cork, Cork T12 YN60, Ireland
- Ilya E Vorontsov
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
- Dmitry Papatsenko
  - Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, Moscow 143026, Russia
- Ivan V Kulakovskiy
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia; Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
6
Abramov S, Boytsov A, Bykova D, Penzar DD, Yevshin I, Kolmykov SK, Fridman MV, Favorov AV, Vorontsov IE, Baulin E, Kolpakov F, Makeev VJ, Kulakovskiy IV. Landscape of allele-specific transcription factor binding in the human genome. Nat Commun 2021; 12:2751. [PMID: 33980847 PMCID: PMC8115691 DOI: 10.1038/s41467-021-23007-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 10/15/2020] [Accepted: 04/12/2021] [Indexed: 12/28/2022] Open
Abstract
Sequence variants in gene regulatory regions alter gene expression and contribute to phenotypes of individual cells and the whole organism, including disease susceptibility and progression. Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Differential transcription factor binding in heterozygous genomic loci provides a natural source of information on such regulatory variants. We present a novel approach to call the allele-specific transcription factor binding events at single-nucleotide variants in ChIP-Seq data, taking into account the joint contribution of aneuploidy and local copy number variation, that is estimated directly from variant calls. We have conducted a meta-analysis of more than 7 thousand ChIP-Seq experiments and assembled the database of allele-specific binding events listing more than half a million entries at nearly 270 thousand single-nucleotide polymorphisms for several hundred human transcription factors and cell types. These polymorphisms are enriched for associations with phenotypes of medical relevance and often overlap eQTLs, making candidates for causality by linking variants with molecular mechanisms. Specifically, there is a special class of switching sites, where different transcription factors preferably bind alternative alleles, thus revealing allele-specific rewiring of molecular circuitry. Here the authors present a meta-analysis empowered by a new statistical method covering thousands of ChIP-Seq experiments resulting in the identification of more than 500 thousand allele-specific binding (ASB) events in the human genome.
Affiliation(s)
- Sergey Abramov
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Alexandr Boytsov
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Daria Bykova
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Dmitry D Penzar
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia; Moscow Institute of Physics and Technology, Dolgoprudny, Russia; Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Ivan Yevshin
  - Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia; Sirius University of Science and Technology, Sochi, Russia; BIOSOFT.RU LLC, Novosibirsk, Russia
- Semyon K Kolmykov
  - Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia; Sirius University of Science and Technology, Sochi, Russia; BIOSOFT.RU LLC, Novosibirsk, Russia
- Marina V Fridman
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Alexander V Favorov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia; Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ilya E Vorontsov
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Eugene Baulin
  - Moscow Institute of Physics and Technology, Dolgoprudny, Russia; Institute of Mathematical Problems of Biology RAS-The Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia
- Fedor Kolpakov
  - Federal Research Center for Information and Computational Technologies, Novosibirsk, Russia; Sirius University of Science and Technology, Sochi, Russia; BIOSOFT.RU LLC, Novosibirsk, Russia
- Vsevolod J Makeev
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia; Moscow Institute of Physics and Technology, Dolgoprudny, Russia; State Research Institute of Genetics and Selection of Industrial Microorganisms of the National Research Center Kurchatov Institute, Moscow, Russia; Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Ivan V Kulakovskiy
  - Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia; Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
7
Sethi S, Vorontsov IE, Kulakovskiy IV, Greenaway S, Williams J, Makeev VJ, Brown SDM, Simon MM, Mallon AM. A holistic view of mouse enhancer architectures reveals analogous pleiotropic effects and correlation with human disease. BMC Genomics 2020; 21:754. [PMID: 33138777 PMCID: PMC7607678 DOI: 10.1186/s12864-020-07109-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/12/2019] [Accepted: 09/29/2020] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND: Efforts to elucidate the function of enhancers in vivo are underway but their vast numbers alongside differing enhancer architectures make it difficult to determine their impact on gene activity. By systematically annotating multiple mouse tissues with super- and typical-enhancers, we have explored their relationship with gene function and phenotype. RESULTS: Though super-enhancers drive high total- and tissue-specific expression of their associated genes, we find that typical-enhancers also contribute heavily to the tissue-specific expression landscape on account of their large numbers in the genome. Unexpectedly, we demonstrate that both enhancer types are preferentially associated with relevant 'tissue-type' phenotypes and exhibit no difference in phenotype effect size or pleiotropy. Modelling regulatory data alongside molecular data, we built a predictive model to infer gene-phenotype associations and use this model to predict potentially novel disease-associated genes. CONCLUSION: Overall our findings reveal that differing enhancer architectures have a similar impact on mammalian phenotypes whilst harbouring differing cellular and expression effects. Together, our results systematically characterise enhancers with predicted phenotypic traits endorsing the role for both types of enhancers in human disease and disorders.
Affiliation(s)
- Siddharth Sethi
  - Mammalian Genetics Unit, MRC Harwell Institute, Oxfordshire, OX11 0RD, UK
- Ilya E Vorontsov
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina 3, Moscow, 119991, Russia
  - Institute of Protein Research, Russian Academy of Sciences, Institutskaya 4, Pushchino, Moscow Region, 142290, Russia
- Ivan V Kulakovskiy
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina 3, Moscow, 119991, Russia
  - Institute of Protein Research, Russian Academy of Sciences, Institutskaya 4, Pushchino, Moscow Region, 142290, Russia
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilova 32, Moscow, 119991, Russia
- Simon Greenaway
  - Mammalian Genetics Unit, MRC Harwell Institute, Oxfordshire, OX11 0RD, UK
- John Williams
  - Mammalian Genetics Unit, MRC Harwell Institute, Oxfordshire, OX11 0RD, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, B15 2TH, UK
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Vsevolod J Makeev
  - Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina 3, Moscow, 119991, Russia
  - Institute of Protein Research, Russian Academy of Sciences, Institutskaya 4, Pushchino, Moscow Region, 142290, Russia
  - Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, Moscow Region, 141700, Russia
- Steve D M Brown
  - Mammalian Genetics Unit, MRC Harwell Institute, Oxfordshire, OX11 0RD, UK
- Michelle M Simon
  - Mammalian Genetics Unit, MRC Harwell Institute, Oxfordshire, OX11 0RD, UK
- Ann-Marie Mallon
  - Mammalian Genetics Unit, MRC Harwell Institute, Oxfordshire, OX11 0RD, UK
8
Ramilowski JA, Yip CW, Agrawal S, Chang JC, Ciani Y, Kulakovskiy IV, Mendez M, Ooi JLC, Ouyang JF, Parkinson N, Petri A, Roos L, Severin J, Yasuzawa K, Abugessaisa I, Akalin A, Antonov IV, Arner E, Bonetti A, Bono H, Borsari B, Brombacher F, Cameron CJ, Cannistraci CV, Cardenas R, Cardon M, Chang H, Dostie J, Ducoli L, Favorov A, Fort A, Garrido D, Gil N, Gimenez J, Guler R, Handoko L, Harshbarger J, Hasegawa A, Hasegawa Y, Hashimoto K, Hayatsu N, Heutink P, Hirose T, Imada EL, Itoh M, Kaczkowski B, Kanhere A, Kawabata E, Kawaji H, Kawashima T, Kelly ST, Kojima M, Kondo N, Koseki H, Kouno T, Kratz A, Kurowska-Stolarska M, Kwon ATJ, Leek J, Lennartsson A, Lizio M, López-Redondo F, Luginbühl J, Maeda S, Makeev VJ, Marchionni L, Medvedeva YA, Minoda A, Müller F, Muñoz-Aguirre M, Murata M, Nishiyori H, Nitta KR, Noguchi S, Noro Y, Nurtdinov R, Okazaki Y, Orlando V, Paquette D, Parr CJ, Rackham OJ, Rizzu P, Martinez DFS, Sandelin A, Sanjana P, Semple CA, Shibayama Y, Sivaraman DM, Suzuki T, Szumowski SC, Tagami M, Taylor MS, Terao C, Thodberg M, Thongjuea S, Tripathi V, Ulitsky I, Verardo R, Vorontsov IE, Yamamoto C, Young RS, Baillie JK, Forrest AR, Guigó R, Hoffman MM, Hon CC, Kasukawa T, Kauppinen S, Kere J, Lenhard B, Schneider C, Suzuki H, Yagi K, de Hoon MJ, Shin JW, Carninci P. Corrigendum: Functional annotation of human long noncoding RNAs via molecular phenotyping. Genome Res 2020. [DOI: 10.1101/gr.270330.120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
9
Ramilowski JA, Yip CW, Agrawal S, Chang JC, Ciani Y, Kulakovskiy IV, Mendez M, Ooi JLC, Ouyang JF, Parkinson N, Petri A, Roos L, Severin J, Yasuzawa K, Abugessaisa I, Akalin A, Antonov IV, Arner E, Bonetti A, Bono H, Borsari B, Brombacher F, Cameron CJF, Cannistraci CV, Cardenas R, Cardon M, Chang H, Dostie J, Ducoli L, Favorov A, Fort A, Garrido D, Gil N, Gimenez J, Guler R, Handoko L, Harshbarger J, Hasegawa A, Hasegawa Y, Hashimoto K, Hayatsu N, Heutink P, Hirose T, Imada EL, Itoh M, Kaczkowski B, Kanhere A, Kawabata E, Kawaji H, Kawashima T, Kelly ST, Kojima M, Kondo N, Koseki H, Kouno T, Kratz A, Kurowska-Stolarska M, Kwon ATJ, Leek J, Lennartsson A, Lizio M, López-Redondo F, Luginbühl J, Maeda S, Makeev VJ, Marchionni L, Medvedeva YA, Minoda A, Müller F, Muñoz-Aguirre M, Murata M, Nishiyori H, Nitta KR, Noguchi S, Noro Y, Nurtdinov R, Okazaki Y, Orlando V, Paquette D, Parr CJC, Rackham OJL, Rizzu P, Sánchez Martinez DF, Sandelin A, Sanjana P, Semple CAM, Shibayama Y, Sivaraman DM, Suzuki T, Szumowski SC, Tagami M, Taylor MS, Terao C, Thodberg M, Thongjuea S, Tripathi V, Ulitsky I, Verardo R, Vorontsov IE, Yamamoto C, Young RS, Baillie JK, Forrest ARR, Guigó R, Hoffman MM, Hon CC, Kasukawa T, Kauppinen S, Kere J, Lenhard B, Schneider C, Suzuki H, Yagi K, de Hoon MJL, Shin JW, Carninci P. Functional annotation of human long noncoding RNAs via molecular phenotyping. Genome Res 2020; 30:1060-1072. [PMID: 32718982 PMCID: PMC7397864 DOI: 10.1101/gr.254219.119] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 07/12/2019] [Accepted: 06/24/2020] [Indexed: 12/12/2022]
Abstract
Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in the mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.
Affiliation(s)
- Jordan A Ramilowski
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Chi Wai Yip
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Saumya Agrawal
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Jen-Chien Chang
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Yari Ciani
- Laboratorio Nazionale Consorzio Interuniversitario Biotecnologie (CIB), Trieste 34127, Italy
| | - Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia.,Institute of Protein Research, Russian Academy of Sciences, Pushchino 142290, Russia
| | - Mickaël Mendez
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 1A1, Canada
| | | | - John F Ouyang
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore Medical School, Singapore 169857, Singapore
| | - Nick Parkinson
- Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
| | - Andreas Petri
- Center for RNA Medicine, Department of Clinical Medicine, Aalborg University, Copenhagen 9220, Denmark
| | - Leonie Roos
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom.,Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, United Kingdom
| | - Jessica Severin
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Kayoko Yasuzawa
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Imad Abugessaisa
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Altuna Akalin
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin 13125, Germany
| | - Ivan V Antonov
- Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Sciences, Moscow 117312, Russia
| | - Erik Arner
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Alessandro Bonetti
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Hidemasa Bono
- Graduate School of Integrated Sciences for Life, Hiroshima University, Higashi-Hiroshima City 739-0046, Japan
| | - Beatrice Borsari
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain
| | - Frank Brombacher
- International Centre for Genetic Engineering and Biotechnology (ICGEB), University of Cape Town, Cape Town 7925, South Africa.,Institute of Infectious Diseases and Molecular Medicine (IDM), Department of Pathology, Division of Immunology and South African Medical Research Council (SAMRC) Immunology of Infectious Diseases, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
| | - Christopher JF Cameron
- School of Computer Science, McGill University, Montréal, Québec H3G 1Y6, Canada.,Department of Biochemistry, Rosalind and Morris Goodman Cancer Research Center, McGill University, Montréal, Québec H3G 1Y6, Canada.,Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06510, USA
| | - Carlo Vittorio Cannistraci
- Biomedical Cybernetics Group, Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering (CMCB), Center for Systems Biology Dresden (CSBD), Cluster of Excellence Physics of Life (PoL), Department of Physics, Technische Universität Dresden, Dresden 01062, Germany.,Center for Complex Network Intelligence (CCNI) at the Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Bioengineering, Tsinghua University, Beijing 100084, China
| | - Ryan Cardenas
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, United Kingdom
| | - Melissa Cardon
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
| | - Howard Chang
- Center for Personal Dynamic Regulome, Stanford University, Stanford, California 94305, USA
| | - Josée Dostie
- Department of Biochemistry, Rosalind and Morris Goodman Cancer Research Center, McGill University, Montréal, Québec H3G 1Y6, Canada
| | - Luca Ducoli
- Institute of Pharmaceutical Sciences, Swiss Federal Institute of Technology, Zurich 8093, Switzerland
| | - Alexander Favorov
- Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.,Department of Oncology, Johns Hopkins University, Baltimore, Maryland 21287, USA
| | - Alexandre Fort
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Diego Garrido
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain
| | - Noa Gil
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Juliette Gimenez
- Epigenetics and Genome Reprogramming Laboratory, IRCCS Fondazione Santa Lucia, Rome 00179, Italy
| | - Reto Guler
- International Centre for Genetic Engineering and Biotechnology (ICGEB), University of Cape Town, Cape Town 7925, South Africa.,Institute of Infectious Diseases and Molecular Medicine (IDM), Department of Pathology, Division of Immunology and South African Medical Research Council (SAMRC) Immunology of Infectious Diseases, Faculty of Health Sciences, University of Cape Town, Cape Town 7925, South Africa
| | - Lusy Handoko
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Jayson Harshbarger
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Akira Hasegawa
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Yuki Hasegawa
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Kosuke Hashimoto
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Norihito Hayatsu
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
| | - Peter Heutink
- Genome Biology of Neurodegenerative Diseases, German Center for Neurodegenerative Diseases (DZNE), Tübingen 72076, Germany
| | - Tetsuro Hirose
- Graduate School of Frontier Biosciences, Osaka University, Suita 565-0871, Japan
| | - Eddie L Imada
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland 21287, USA
| | - Masayoshi Itoh
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), Saitama 351-0198, Japan
| | - Bogumil Kaczkowski
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Aditi Kanhere
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, United Kingdom
| | - Emily Kawabata
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Hideya Kawaji
- RIKEN Preventive Medicine and Diagnosis Innovation Program (PMI), Saitama 351-0198, Japan
| | - Tsugumi Kawashima
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - S Thomas Kelly
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
| | - Miki Kojima
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Naoto Kondo
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Haruhiko Koseki
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
| | - Tsukasa Kouno
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Anton Kratz
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Mariola Kurowska-Stolarska
- Institute of Infection, Immunity, and Inflammation, University of Glasgow, Glasgow, Scotland G12 8QQ, United Kingdom
| | - Andrew Tae Jun Kwon
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Jeffrey Leek
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland 21287, USA
| | - Andreas Lennartsson
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge 14157, Sweden
| | - Marina Lizio
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Fernando López-Redondo
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Joachim Luginbühl
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Shiori Maeda
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
| | - Vsevolod J Makeev
- Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia.,Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Luigi Marchionni
- Department of Oncology, Johns Hopkins University, Baltimore, Maryland 21287, USA
| | - Yulia A Medvedeva
- Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Sciences, Moscow 117312, Russia.,Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
| | - Aki Minoda
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Ferenc Müller
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, United Kingdom
| | - Manuel Muñoz-Aguirre
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain
| | - Mitsuyoshi Murata
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Hiromi Nishiyori
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Kazuhiro R Nitta
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Shuhei Noguchi
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Yukihiko Noro
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Ramil Nurtdinov
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain
| | - Yasushi Okazaki
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Valerio Orlando
- Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Denis Paquette
- Department of Biochemistry, Rosalind and Morris Goodman Cancer Research Center, McGill University, Montréal, Québec H3G 1Y6, Canada
| | - Callum J C Parr
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
| | - Owen J L Rackham
- Program in Cardiovascular and Metabolic Disorders, Duke-National University of Singapore Medical School, Singapore 169857, Singapore
| | - Patrizia Rizzu
- Genome Biology of Neurodegenerative Diseases, German Center for Neurodegenerative Diseases (DZNE), Tübingen 72076, Germany
| | | | - Albin Sandelin
- Department of Biology and BRIC, University of Copenhagen, Denmark, Copenhagen N DK2200, Denmark
| | - Pillay Sanjana
- Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham B15 2TT, United Kingdom
| | - Colin A M Semple
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom
| | - Youtaro Shibayama
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Divya M Sivaraman
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Takahiro Suzuki
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | | | - Michihira Tagami
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Martin S Taylor
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 2XU, United Kingdom
| | - Chikashi Terao
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
| | - Malte Thodberg
- Department of Biology and BRIC, University of Copenhagen, Denmark, Copenhagen N DK2200, Denmark
| | - Supat Thongjuea
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Vidisha Tripathi
- National Centre for Cell Science, Pune, Maharashtra 411007, India
| | - Igor Ulitsky
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot 76100, Israel
| | - Roberto Verardo
- Laboratorio Nazionale Consorzio Interuniversitario Biotecnologie (CIB), Trieste 34127, Italy
| | - Ilya E Vorontsov
- Department of Computational Systems Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow 119991, Russia
| | - Chinatsu Yamamoto
- RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Robert S Young
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh EH8 9AG, United Kingdom
| | - J Kenneth Baillie
- Roslin Institute, University of Edinburgh, Edinburgh EH25 9RG, United Kingdom
| | - Alistair R R Forrest
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan.,Harry Perkins Institute of Medical Research, QEII Medical Centre and Centre for Medical Research, The University of Western Australia, Nedlands, Perth, Western Australia 6009, Australia
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Catalonia 08002, Spain
| | | | - Chung Chau Hon
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Takeya Kasukawa
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Sakari Kauppinen
- Center for RNA Medicine, Department of Clinical Medicine, Aalborg University, Copenhagen 9220, Denmark
| | - Juha Kere
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge 14157, Sweden.,Stem Cells and Metabolism Research Program, University of Helsinki and Folkhälsan Research Center, 00290 Helsinki, Finland
| | - Boris Lenhard
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London W12 0NN, United Kingdom.,Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, United Kingdom.,Sars International Centre for Marine Molecular Biology, University of Bergen, Bergen N-5008, Norway
| | - Claudio Schneider
- Laboratorio Nazionale Consorzio Interuniversitario Biotecnologie (CIB), Trieste 34127, Italy.,Department of Medicine and Consorzio Interuniversitario Biotecnologie p.zle Kolbe 1 University of Udine, Udine 33100, Italy
| | - Harukazu Suzuki
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Ken Yagi
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Michiel J L de Hoon
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Jay W Shin
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Piero Carninci
- RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.,RIKEN Center for Life Science Technologies, Yokohama, Kanagawa 230-0045, Japan
|
10
|
Penzar DD, Zinkevich AO, Vorontsov IE, Sitnik VV, Favorov AV, Makeev VJ, Kulakovskiy IV. What Do Neighbors Tell About You: The Local Context of Cis-Regulatory Modules Complicates Prediction of Regulatory Variants. Front Genet 2019; 10:1078. [PMID: 31737053 PMCID: PMC6834773 DOI: 10.3389/fgene.2019.01078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/15/2019] [Accepted: 10/09/2019] [Indexed: 02/05/2023] Open
Abstract
Many problems of modern genetics and functional genomics require the assessment of functional effects of sequence variants, including gene expression changes. Machine learning is considered to be a promising approach for solving this task, but its practical applications remain a challenge due to the insufficient volume and diversity of training data. A promising source of valuable data is a saturation mutagenesis massively parallel reporter assay, which quantitatively measures changes in transcription activity caused by sequence variants. Here, we explore the computational predictions of the effects of individual single-nucleotide variants on gene transcription measured in the massively parallel reporter assays, based on the data from the recent "Regulation Saturation" Critical Assessment of Genome Interpretation challenge. We show that the estimated prediction quality strongly depends on the structure of the training and validation data. Particularly, training on the sequence segments located next to the validation data results in the "information leakage" caused by the local context. This information leakage allows reproducing the prediction quality of the best CAGI challenge submissions with a fairly simple machine learning approach, and even obtaining notably better-than-random predictions using irrelevant genomic regions. Validation scenarios preventing such information leakage dramatically reduce the measured prediction quality. The performance at independent regulatory regions entirely excluded from the training set appears to be much lower than needed for practical applications, and even the performance estimation will become reliable only in the future with richer data from multiple reporters. The source code and data are available at https://bitbucket.org/autosomeru_cagi2018/cagi2018_regsat and https://genomeinterpretation.org/content/expression-variants.
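The validation issue described in this abstract can be illustrated with a minimal sketch. It assumes each variant record carries a hypothetical `region` field naming the regulatory element it was measured in; the field name, the function, and the split fraction are illustrative and are not taken from the authors' repository. Whole regions are held out, so no held-out variant shares local context with the training data.

```python
import random


def split_by_region(variants, test_fraction=0.2, seed=0):
    """Hold out whole regions so train and test never share a region.

    `variants` is a list of dicts with a hypothetical "region" key.
    """
    regions = sorted({v["region"] for v in variants})
    rng = random.Random(seed)
    rng.shuffle(regions)
    # Always hold out at least one region.
    n_test = max(1, round(len(regions) * test_fraction))
    test_regions = set(regions[:n_test])
    train = [v for v in variants if v["region"] not in test_regions]
    test = [v for v in variants if v["region"] in test_regions]
    return train, test


if __name__ == "__main__":
    # Toy data: 5 made-up regions with 10 variant positions each.
    toy = [{"region": f"R{r}", "pos": p} for r in range(5) for p in range(10)]
    train, test = split_by_region(toy)
    print(len(train), len(test))
```

A random per-variant split of the same data would place neighboring positions from one region on both sides of the split, which is the leakage scenario the abstract warns about.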
Affiliation(s)
- Dmitry D. Penzar
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
- Department of Medical and Biological Physics, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Russia
| | - Arsenii O. Zinkevich
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
| | - Ilya E. Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Vasily V. Sitnik
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Alexander V. Favorov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD, United States
| | - Vsevolod J. Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Department of Medical and Biological Physics, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Ivan V. Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Institute of Mathematical Problems of Biology RAS - the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Pushchino, Russia
|
11
|
Kulakovskiy IV, Vorontsov IE, Yevshin IS, Sharipov RN, Fedorova AD, Rumynskiy EI, Medvedeva YA, Magana-Mora A, Bajic VB, Papatsenko DA, Kolpakov FA, Makeev VJ. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res 2018; 46:D252-D259. [PMID: 29140464 PMCID: PMC5753240 DOI: 10.1093/nar/gkx1106] [Citation(s) in RCA: 446] [Impact Index Per Article: 89.2] [Received: 09/14/2017] [Accepted: 10/31/2017] [Indexed: 12/15/2022] Open
Abstract
We present a major update of the HOCOMOCO collection that consists of patterns describing DNA binding specificities for human and mouse transcription factors. In this release, we profited from a nearly doubled volume of published in vivo experiments on transcription factor (TF) binding to expand the repertoire of binding models, replace low-quality models previously based on in vitro data only and cover more than a hundred TFs with previously unknown binding specificities. This was achieved by systematic motif discovery from more than five thousand ChIP-Seq experiments uniformly processed within the BioUML framework with several ChIP-Seq peak calling tools and aggregated in the GTRD database. HOCOMOCO v11 contains binding models for 453 mouse and 680 human transcription factors and includes 1302 mononucleotide and 576 dinucleotide position weight matrices, which describe primary binding preferences of each transcription factor and reliable alternative binding specificities. An interactive interface and bulk downloads are available on the web: http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco11. In this release, we complement HOCOMOCO by MoLoTool (Motif Location Toolbox, http://molotool.autosome.ru) that applies HOCOMOCO models for visualization of binding sites in short DNA sequences.
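As a rough illustration of how a mononucleotide position weight matrix is applied to a short DNA sequence, the sketch below sums per-position weights over every window and reports the best-scoring one. The matrix `TOY_PWM` is made up for the example and is not a HOCOMOCO model; the function names are hypothetical, only the forward strand is scanned, and dinucleotide matrices are not covered.

```python
# Toy 3-column matrix that favours the word "TGA" (illustrative weights only).
TOY_PWM = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
]


def pwm_score(pwm, window):
    """Sum the weight of each base at its position in the window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (start, score) of the best-scoring window on the forward strand."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    starts = range(len(sequence) - width + 1)
    # max() keeps the first window among equal scores.
    best = max(starts, key=lambda i: pwm_score(pwm, sequence[i:i + width]))
    return best, pwm_score(pwm, sequence[best:best + width])


if __name__ == "__main__":
    print(best_hit(TOY_PWM, "CCTGACC"))  # (2, 3.0)
```

A practical scan would also score the reverse complement and compare scores against a threshold, both of which are omitted here for brevity.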
Affiliation(s)
- Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia.,Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia.,Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, 143026 Moscow, Russia
| | - Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
| | - Ivan S Yevshin
- BIOSOFT.RU Ltd, 630058, Russkaya 41/1, Novosibirsk, Russia
| | - Ruslan N Sharipov
- BIOSOFT.RU Ltd, 630058, Russkaya 41/1, Novosibirsk, Russia.,Institute of Computational Technologies, Siberian Branch of the Russian Academy of Sciences, 630090, Akad. Rzhanova 6, Novosibirsk, Russia.,Novosibirsk State University, 630090, Pirogova 2, Novosibirsk, Russia
| | - Alla D Fedorova
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, 119234, Leninskiye Gory 1-73, Moscow, Russia
| | - Eugene I Rumynskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia.,Moscow Institute of Physics and Technology (State University), 141700, 9 Institutskiy per, Dolgoprudny, Russia
| | - Yulia A Medvedeva
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia.,Moscow Institute of Physics and Technology (State University), 141700, 9 Institutskiy per, Dolgoprudny, Russia.,Institute of Bioengineering, Research Center of Biotechnology of the Russian Academy of Sciences, 119071, 2 Leninsky Ave. 33, Moscow, Russia
| | - Arturo Magana-Mora
- National Institute of Advanced Industrial Science and Technology (AIST), Com. Bio Big-Data Open Innovation Lab. (CBBD-OIL), AIST Tokyo Waterfront Main Bldg. #323, 2-3-26 Aomi, Tokyo 135-0064, Japan.,King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia
| | - Vladimir B Bajic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia
| | - Dmitry A Papatsenko
- Center for Data-Intensive Biomedicine and Biotechnology, Skolkovo Institute of Science and Technology, 143026 Moscow, Russia
| | - Fedor A Kolpakov
- BIOSOFT.RU Ltd, 630058, Russkaya 41/1, Novosibirsk, Russia.,Institute of Computational Technologies, Siberian Branch of the Russian Academy of Sciences, 630090, Akad. Rzhanova 6, Novosibirsk, Russia
| | - Vsevolod J Makeev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia.,Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia.,Moscow Institute of Physics and Technology (State University), 141700, 9 Institutskiy per, Dolgoprudny, Russia
|
12
|
Vorontsov IE, Fedorova AD, Yevshin IS, Sharipov RN, Kolpakov FA, Makeev VJ, Kulakovskiy IV. Genome-wide map of human and mouse transcription factor binding sites aggregated from ChIP-Seq data. BMC Res Notes 2018; 11:756. [PMID: 30352610 PMCID: PMC6199713 DOI: 10.1186/s13104-018-3856-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 09/21/2018] [Accepted: 10/16/2018] [Indexed: 11/25/2022] Open
Abstract
Objectives: Mammalian genomics studies, especially those focusing on transcriptional regulation, require information on genomic locations of regulatory regions, particularly, transcription factor (TF) binding sites. There are plenty of published ChIP-Seq data on in vivo binding of transcription factors in different cell types and conditions. However, handling of thousands of separate data sets is often impractical and it is desirable to have a single global map of genomic regions potentially bound by a particular TF in any of studied cell types and conditions. Data description: Here we report human and mouse cistromes, the maps of genomic regions that are routinely identified as TF binding sites, organized by TF. We provide cistromes for 349 mouse and 599 human TFs. Given a TF, its cistrome regions are supported by evidence from several ChIP-Seq experiments or several computational tools, and, as an optional filter, contain occurrences of sequence motifs recognized by the TF. Using the cistrome, we provide an annotation of TF binding sites in the vicinity of human and mouse transcription start sites. This information is useful for selecting potential gene targets of transcription factors and detecting co-regulated genes in differential gene expression data.
Affiliation(s)
- Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, GSP-1, Gubkina 3, Moscow, Russia, 119991
| | - Alla D Fedorova
- Vavilov Institute of General Genetics, Russian Academy of Sciences, GSP-1, Gubkina 3, Moscow, Russia, 119991
| | - Ivan S Yevshin
- BIOSOFT.RU Ltd, Russkaya 41/1, Novosibirsk, Russia, 630058
| | - Ruslan N Sharipov
- BIOSOFT.RU Ltd, Russkaya 41/1, Novosibirsk, Russia, 630058.,Institute of Computational Technologies, Siberian Branch of the Russian Academy of Sciences, Akad. Rzhanova 6, Novosibirsk, Russia, 630090.,Novosibirsk State University, Pirogova 2, Novosibirsk, Russia, 630090
| | - Fedor A Kolpakov
- BIOSOFT.RU Ltd, Russkaya 41/1, Novosibirsk, Russia, 630058.,Institute of Computational Technologies, Siberian Branch of the Russian Academy of Sciences, Akad. Rzhanova 6, Novosibirsk, Russia, 630090
| | - Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, GSP-1, Gubkina 3, Moscow, Russia, 119991.,Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, GSP-1, Vavilova 32, Moscow, Russia, 119991.,Moscow Institute of Physics and Technology (State University), 9 Institutskiy per, Dolgoprudny, Russia, 141700
| | - Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, GSP-1, Gubkina 3, Moscow, Russia, 119991.,Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, GSP-1, Vavilova 32, Moscow, Russia, 119991.,Institute of Mathematical Problems of Biology RAS-the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, Vitkevicha 1, Pushchino, Russia, 142290
|
13
|
Afanasyeva MA, Putlyaeva LV, Demin DE, Kulakovskiy IV, Vorontsov IE, Fridman MV, Makeev VJ, Kuprash DV, Schwartz AM. The single nucleotide variant rs12722489 determines differential estrogen receptor binding and enhancer properties of an IL2RA intronic region. PLoS One 2017; 12:e0172681. [PMID: 28234966 PMCID: PMC5325477 DOI: 10.1371/journal.pone.0172681] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/06/2016] [Accepted: 02/08/2017] [Indexed: 12/11/2022] Open
Abstract
We studied the functional effect of the rs12722489 single nucleotide polymorphism, located in the first intron of the human IL2RA gene, on transcriptional regulation. This polymorphism is associated with multiple autoimmune conditions (rheumatoid arthritis, multiple sclerosis, Crohn's disease, and ulcerative colitis). In silico analysis suggested a significant difference in the affinity of the estrogen receptor (ER) binding site between the alternative allelic variants, with stronger predicted affinity for the risk (G) allele. An electrophoretic mobility shift assay showed that purified human ERα bound only the G variant of a 32-bp genomic sequence containing rs12722489. Chromatin immunoprecipitation demonstrated that endogenous human ERα interacted with the rs12722489 genomic region in vivo, and a DNA pull-down assay confirmed differential allelic binding of amplified 189-bp genomic fragments containing rs12722489 with endogenous human ERα. In a luciferase reporter assay, a kilobase-long genomic segment containing the G but not the A allele of rs12722489 demonstrated enhancer properties in the MT-2 cell line, an HTLV-1 transformed human cell line with a regulatory T cell phenotype.
Collapse
Affiliation(s)
- Marina A. Afanasyeva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Lidia V. Putlyaeva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Denis E. Demin
- Moscow Institute of Physics and Technology, Moscow, Russia
| | - Ivan V. Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, Russia
| | - Ilya E. Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Marina V. Fridman
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Vsevolod J. Makeev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology, Moscow, Russia
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Dmitry V. Kuprash
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Moscow Institute of Physics and Technology, Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Anton M. Schwartz
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| |
|
14
|
Schwartz AM, Demin DE, Vorontsov IE, Kasyanov AS, Putlyaeva LV, Tatosyan KA, Kulakovskiy IV, Kuprash DV. Multiple single nucleotide polymorphisms in the first intron of the IL2RA gene affect transcription factor binding and enhancer activity. Gene 2016; 602:50-56. [PMID: 27876533 DOI: 10.1016/j.gene.2016.11.032] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/30/2016] [Revised: 11/01/2016] [Accepted: 11/16/2016] [Indexed: 10/20/2022]
Abstract
The IL2RA gene encodes the alpha subunit of a high-affinity receptor for interleukin-2, which is expressed by several distinct populations of lymphocytes involved in autoimmune processes. A large number of polymorphic alleles of the IL2RA locus are associated with the development of various autoimmune diseases. Using bioinformatics analysis, we dissected the first intron of the IL2RA gene and selected several single nucleotide polymorphisms (SNPs) that may influence the regulation of the IL2RA gene in cell types relevant to autoimmune pathology. We described five enhancers containing the selected SNPs that stimulated activity of the IL2RA promoter in a cell-type-specific manner, and tested the effect of specific SNP alleles on the activity of the respective enhancers (E1 to E5, labeled according to the distance to the promoter). The E4 enhancer with the minor T variant of the rs61839660 SNP demonstrated reduced activity due to disrupted binding of MEF2A/C transcription factors (TFs). Neither the rs706778 nor the rs706779 SNP, both associated with a number of autoimmune diseases, had any effect on the activity of the E2 enhancer. However, rare variants of several SNPs (rs139767239, rs115133228, rs12722502, rs12722635) genetically linked to rs706778 and/or rs706779 significantly influenced the activity of the E1, E3 and E5 enhancers, presumably by disrupting EBF1, GABPA and ELF1 binding sites.
Affiliation(s)
- Anton M Schwartz
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Denis E Demin
- Moscow Institute of Physics and Technology, Department Molecular and Biological Physics, Moscow, Russia
| | - Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Artem S Kasyanov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Lidia V Putlyaeva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Karina A Tatosyan
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Dmitry V Kuprash
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; Moscow Institute of Physics and Technology, Department Molecular and Biological Physics, Moscow, Russia; Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia.
| |
|
15
|
Schwartz AM, Putlyaeva LV, Covich M, Klepikova AV, Akulich KA, Vorontsov IE, Korneev KV, Dmitriev SE, Polanovsky OL, Sidorenko SP, Kulakovskiy IV, Kuprash DV. Early B-cell factor 1 (EBF1) is critical for transcriptional control of SLAMF1 gene in human B cells. Biochim Biophys Acta 2016; 1859:1259-68. [PMID: 27424222 DOI: 10.1016/j.bbagrm.2016.07.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/28/2016] [Revised: 07/01/2016] [Accepted: 07/12/2016] [Indexed: 10/21/2022]
Abstract
Signaling lymphocytic activation molecule family member 1 (SLAMF1)/CD150 is a co-stimulatory receptor expressed on a variety of hematopoietic cells, in particular on mature lymphocytes activated by specific antigen, costimulation and cytokines. Changes in CD150 expression level have been reported in association with autoimmunity and with B-cell chronic lymphocytic leukemia. We characterized the core promoter of the SLAMF1 gene in human B-cell lines and explored binding sites for a number of transcription factors involved in B cell differentiation and activation. Mutations of the SP1, STAT6, IRF4, NF-kB, ELF1, TCF3, and SPI1/PU.1 sites significantly decreased promoter activity, with a magnitude that depended on the cell line tested. The most profound effect on promoter strength was observed upon mutation of the binding site for Early B-cell factor 1 (EBF1). This mutation produced a 10-20-fold drop in promoter activity and pinpointed EBF1 as the master regulator of the human SLAMF1 gene in B cells. We also identified three potent transcriptional enhancers in the human SLAMF1 locus, each containing functional EBF1 binding sites. Thus, EBF1 interacts with specific binding sites located both in the promoter and in the enhancer regions of the SLAMF1 gene and is critical for its expression in human B cells.
Affiliation(s)
- Anton M Schwartz
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Lidia V Putlyaeva
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Milica Covich
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Anna V Klepikova
- Institute for Information Transmission Problems of the Russian Academy of Sciences, Moscow, Russia; Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Kseniya A Akulich
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia; School of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
| | - Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Kirill V Korneev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Sergey E Dmitriev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia; Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Oleg L Polanovsky
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
| | - Svetlana P Sidorenko
- R.E. Kavetsky Institute of Experimental Pathology, Oncology and Radiobiology of National Academy of Sciences of Ukraine, Kyiv, Ukraine
| | - Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
| | - Dmitry V Kuprash
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia; Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia.
| |
|
16
|
Kulakovskiy IV, Vorontsov IE, Yevshin IS, Soboleva AV, Kasianov AS, Ashoor H, Ba-Alawi W, Bajic VB, Medvedeva YA, Kolpakov FA, Makeev VJ. HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models. Nucleic Acids Res 2016; 44:D116-25. [PMID: 26586801 PMCID: PMC4702883 DOI: 10.1093/nar/gkv1249] [Citation(s) in RCA: 145] [Impact Index Per Article: 18.1] [Received: 09/14/2015] [Revised: 10/29/2015] [Accepted: 10/30/2015] [Indexed: 02/06/2023] Open
Abstract
Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM models, covering 86 (52) human (mouse) TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with validation of the resulting models on in vivo data. To facilitate practical application, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
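To make the idea of a PWM "accompanied by a score threshold" concrete, the following is a minimal Python sketch of scanning a sequence for motif hits. It is not the HOCOMOCO command-line tooling: the matrix values, the threshold, and the names `PWM`, `score_window` and `scan` are invented for illustration, and only the forward strand with uppercase A/C/G/T is handled.

```python
from typing import Dict, List, Tuple

# Hypothetical 4-position log-odds PWM. The numbers are invented for this
# illustration and are NOT taken from any HOCOMOCO model.
PWM: List[Dict[str, float]] = [
    {"A": 1.2, "C": -0.8, "G": -1.0, "T": -0.5},
    {"A": -1.1, "C": 1.0, "G": -0.7, "T": -0.9},
    {"A": -0.6, "C": -1.2, "G": 1.3, "T": -0.4},
    {"A": -0.9, "C": -0.3, "G": -1.0, "T": 1.1},
]


def score_window(pwm: List[Dict[str, float]], window: str) -> float:
    """Sum the per-position weights of the bases in one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm: List[Dict[str, float]], sequence: str,
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (start, word, score) for every forward-strand window
    whose score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_window(pwm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Assumed example sequence and threshold, chosen only for demonstration.
hits = scan(PWM, "TTACGTAACGA", threshold=3.0)
for start, word, score in hits:
    print(f"hit at {start}: {word} (score {score:.2f})")
```

A practical scanner would also score the reverse complement and would take the threshold from a target P-value rather than a hand-picked number.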
Affiliation(s)
- Ivan V Kulakovskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
| | - Ilya E Vorontsov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
| | - Ivan S Yevshin
- Design Technological Institute of Digital Techniques, Siberian Branch of the Russian Academy of Sciences, 630090, Academician Rzhanov 6, Novosibirsk, Russia; Institute of Systems Biology Ltd, 630112, office 901, Krasina 54, Novosibirsk, Russia
| | - Anastasiia V Soboleva
- Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia
| | - Artem S Kasianov
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia
| | - Haitham Ashoor
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia
| | - Wail Ba-Alawi
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia
| | - Vladimir B Bajic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia
| | - Yulia A Medvedeva
- Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia; Center for Bioengineering, Russian Academy of Sciences, 117312, 60-letiya Oktyabrya 7/2, Moscow, Russia
| | - Fedor A Kolpakov
- Design Technological Institute of Digital Techniques, Siberian Branch of the Russian Academy of Sciences, 630090, Academician Rzhanov 6, Novosibirsk, Russia; Institute of Systems Biology Ltd, 630112, office 901, Krasina 54, Novosibirsk, Russia
| | - Vsevolod J Makeev
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia; Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia
| |
|
17
|
Medvedeva YA, Lennartsson A, Ehsani R, Kulakovskiy IV, Vorontsov IE, Panahandeh P, Khimulya G, Kasukawa T, Drabløs F. EpiFactors: a comprehensive database of human epigenetic factors and complexes. Database (Oxford) 2015; 2015:bav067. [PMID: 26153137 PMCID: PMC4494013 DOI: 10.1093/database/bav067] [Citation(s) in RCA: 164] [Impact Index Per Article: 18.2] [Received: 03/24/2015] [Accepted: 06/15/2015] [Indexed: 12/22/2022]
Abstract
Epigenetics refers to stable and long-term alterations of cellular traits that are not caused by changes in the DNA sequence per se. Rather, covalent modifications of DNA and histones affect gene expression and genome stability via proteins that recognize and act upon such modifications. Many enzymes that catalyse epigenetic modifications or are critical for enzymatic complexes have been discovered, and this is encouraging investigators to study the role of these proteins in diverse normal and pathological processes. Rapidly growing knowledge in the area has resulted in the need for a resource that compiles, organizes and presents curated information to researchers in an easily accessible and user-friendly form. Here we present EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets and products. EpiFactors contains information on 815 proteins, including 95 histones and protamines. For 789 of these genes, we include expression values across several samples, in particular a collection of 458 human primary cell samples (for approximately 200 cell types, in many cases from three individual donors), covering most mammalian cell steady states, 255 different cancer cell lines (representing approximately 150 cancer subtypes) and 134 human postmortem tissues. Expression values were obtained by the FANTOM5 consortium using the Cap Analysis of Gene Expression technique. EpiFactors also contains information on 69 protein complexes that are involved in epigenetic regulation. The resource is practical for a wide range of users, including biologists, pharmacologists and clinicians. Database URL: http://epifactors.autosome.ru
Affiliation(s)
- Yulia A Medvedeva
- Institute of Personal and Predictive Medicine of Cancer, 08916 Badalona, Spain; Department of Computational Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Andreas Lennartsson
- Department of Biosciences and Nutrition, Karolinska Institutet, 14183 Huddinge, Sweden
| | - Rezvan Ehsani
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, NO-7489 Trondheim, Norway
| | - Ivan V Kulakovskiy
- Department of Computational Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia; Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Ilya E Vorontsov
- Department of Computational Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Pouda Panahandeh
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, NO-7489 Trondheim, Norway
| | - Grigory Khimulya
- Department of Computational Biology, Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
| | - Takeya Kasukawa
- Division of Genomic Technologies (DGT), RIKEN Center for Life Science Technologies, 1-7-22 Suehiro-Cho, Tsurumi-Ku, Yokohama 230-0045, Kanagawa, Japan
| | - Finn Drabløs
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, NO-7489 Trondheim, Norway
| |
|
18
|
Vorontsov IE, Kulakovskiy IV, Makeev VJ. Jaccard index based similarity measure to compare transcription factor binding site models. Algorithms Mol Biol 2013; 8:23. [PMID: 24074225 PMCID: PMC3851813 DOI: 10.1186/1748-7188-8-23] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 05/25/2012] [Accepted: 09/18/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The positional weight matrix (PWM) remains the most popular model for quantification of transcription factor (TF) binding. A PWM supplied with a score threshold defines a set of putative transcription factor binding sites (TFBS), thus providing a TFBS model. TF-binding DNA fragments obtained by different experimental methods usually give similar but not identical PWMs. This is also common for different TFs from the same structural family. Thus it is often necessary to measure the similarity between PWMs. The popular tools compare PWMs directly using matrix elements. Yet, for log-odds PWMs, negative elements do not contribute to the scores of highly scoring TFBS and thus may differ without affecting the sets of the best recognized binding sites. Moreover, the two TFBS sets recognized by a given pair of PWMs can be more or less different depending on the score thresholds. RESULTS: We propose a practical approach for comparing two TFBS models, each consisting of a PWM and the respective scoring threshold. The proposed measure is a variant of the Jaccard index between two TFBS sets. The measure defines a metric space for TFBS models of all finite lengths. The algorithm can compare TFBS models constructed using substantially different approaches, like PWMs with raw positional counts and log-odds. We present an efficient software implementation: MACRO-APE (MAtrix CompaRisOn by Approximate P-value Estimation). CONCLUSIONS: MACRO-APE can be effectively used to compute the Jaccard index based similarity for two TFBS models. A two-pass scanning algorithm is presented to scan a given collection of PWMs for PWMs similar to a given query. AVAILABILITY AND IMPLEMENTATION: MACRO-APE is implemented in Ruby 1.9; software including source code and a manual is freely available at http://autosome.ru/macroape/ and in supplementary materials.
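The core idea, a Jaccard index between the two sets of words that two (PWM, threshold) models recognize, can be illustrated with a brute-force Python sketch. This is not the MACRO-APE algorithm: the abstract describes approximate P-value estimation, whereas the sketch below simply enumerates all 4^w words, assumes both matrices have the same width, and ignores shifts and strand orientation. The toy matrices and the names `recognized_words` and `jaccard` are invented for illustration.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Pwm = List[Dict[str, int]]
Model = Tuple[Pwm, int]  # a TFBS model: (PWM, score threshold)


def recognized_words(pwm: Pwm, threshold: int) -> Set[str]:
    """Enumerate every word of the PWM's width scoring at or above
    the threshold (exhaustive, so only feasible for short motifs)."""
    width = len(pwm)
    return {
        "".join(word)
        for word in product("ACGT", repeat=width)
        if sum(column[base] for column, base in zip(pwm, word)) >= threshold
    }


def jaccard(model_a: Model, model_b: Model) -> float:
    """Jaccard index |A & B| / |A | B| of the two recognized word sets.
    Assumes both PWMs have the same width."""
    set_a = recognized_words(*model_a)
    set_b = recognized_words(*model_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Two hypothetical width-2 matrices with integer weights (invented values).
PWM_A: Pwm = [{"A": 1, "C": 0, "G": 0, "T": 0},
              {"A": 0, "C": 1, "G": 0, "T": 0}]
PWM_B: Pwm = [{"A": 1, "C": 0, "G": 0, "T": 0},
              {"A": 0, "C": 0, "G": 1, "T": 0}]

# The same pair of matrices looks similar or dissimilar depending on
# the thresholds, which is the point made in the abstract.
loose = jaccard((PWM_A, 1), (PWM_B, 1))
strict = jaccard((PWM_A, 2), (PWM_B, 2))
print(loose, strict)
```

With the loose threshold the two models share the four words that start with A out of ten recognized in total; with the strict threshold each recognizes a single, different word, so the overlap vanishes.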
|
19
|
Kulakovskiy IV, Medvedeva YA, Schaefer U, Kasianov AS, Vorontsov IE, Bajic VB, Makeev VJ. HOCOMOCO: a comprehensive collection of human transcription factor binding sites models. Nucleic Acids Res 2012; 41:D195-202. [PMID: 23175603 PMCID: PMC3531053 DOI: 10.1093/nar/gks1089] [Citation(s) in RCA: 155] [Impact Index Per Article: 12.9] [Indexed: 12/26/2022] Open
Abstract
Transcription factor (TF) binding site (TFBS) models are crucial for computational reconstruction of transcription regulatory networks. In existing repositories, a TF often has several models (also called binding profiles or motifs), obtained from different experimental data. Having a single TFBS model for a TF is more pragmatic for practical applications. We show that integration of TFBS data from various types of experiments into a single model typically improves model quality, probably owing to partial correction of source-specific technique bias. We present the Homo sapiens comprehensive model collection (HOCOMOCO, http://autosome.ru/HOCOMOCO/, http://cbrc.kaust.edu.sa/hocomoco/) containing carefully hand-curated TFBS models constructed by integration of binding sequences obtained by both low- and high-throughput methods. To construct position weight matrices to represent these TFBS models, we used ChIPMunk software in four computational modes, including a newly developed periodic positional prior mode associated with DNA helix pitch. We selected only one TFBS model per TF, unless there was clear experimental evidence for two rather distinct TFBS models. We assigned a quality rating to each model. HOCOMOCO contains 426 systematically curated TFBS models for 401 human TFs, where 172 models are based on more than one data source.
Affiliation(s)
- Ivan V Kulakovskiy
- Laboratory of Bioinformatics and Systems Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Street 32, Moscow 119991, GSP-1, Russia.
| |
|