1
Sarpan N, Taranenko E, Ooi SE, Low ETL, Espinoza A, Tatarinova TV, Ong-Abdullah M. DNA methylation changes in clonally propagated oil palm. Plant Cell Reports 2020; 39:1219-1233. [PMID: 32591850 DOI: 10.1007/s00299-020-02561-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/27/2020] [Accepted: 06/17/2020] [Indexed: 06/11/2023]
Abstract
Several hypomethylated sites within the Karma region of EgDEF1 and hotspot regions in chromosomes 1, 2, 3, and 5 may be associated with mantling. One of the main challenges faced by the oil palm industry is fruit abnormalities, such as the "mantled" phenotype that can lead to reduced yields. This clonal abnormality is an epigenetic phenomenon and has been linked to the hypomethylation of a transposable element within the EgDEF1 gene. To understand the epigenome changes in clones, methylomes of clonal oil palms were compared to methylomes of seedling-derived oil palms. Whole-genome bisulfite sequencing data from seedlings, normal, and mantled clones were analyzed to determine and compare the context-specific DNA methylomes. In seedlings, coding and regulatory regions are generally hypomethylated while introns and repeats are extensively methylated. Genes with a low number of guanines and cytosines in the third position of codons (GC3-poor genes) were increasingly methylated towards their 3' region, while GC3-rich genes remained demethylated, similar to patterns in other eukaryotic species. Predicted promoter regions were generally hypomethylated in seedlings. In clones, CG, CHG, and CHH methylation levels generally decreased in functionally important regions, such as promoters, 5' UTRs, and coding regions. Although random regions were found to be hypomethylated in clonal genomes, hypomethylation of certain hotspot regions may be associated with the clonal mantling phenotype. Our findings, therefore, suggest that other hypomethylated CHG sites within the Karma of EgDEF1, together with hypomethylated hotspot regions in chromosomes 1, 2, 3, and 5, are associated with mantling.
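The context-specific comparison described in this abstract rests on sorting each cytosine into one of three sequence contexts: CG, CHG, or CHH, where H stands for A, C, or T. A minimal Python sketch of that classification follows. It is an illustration only, not the pipeline used in the study; the function names `cytosine_context` and `context_counts` are hypothetical, and only the forward strand is scanned.

```python
from collections import Counter
from typing import Optional


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the forward-strand cytosine at `pos` as CG, CHG, or CHH.

    Returns None if the base is not a C, or if the context cannot be
    determined (sequence end reached, or an ambiguous N in the context).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    nxt = seq[pos + 1 : pos + 3]  # up to two downstream bases
    if len(nxt) >= 1 and nxt[0] == "G":
        return "CG"
    if len(nxt) < 2 or "N" in nxt:
        return None
    # First downstream base is H (A, C, or T); the second decides CHG vs CHH.
    return "CHG" if nxt[1] == "G" else "CHH"


def context_counts(seq: str) -> Counter:
    """Count forward-strand cytosines per context (hypothetical helper)."""
    contexts = (cytosine_context(seq, i) for i in range(len(seq)))
    return Counter(c for c in contexts if c is not None)
```

In a real bisulfite workflow the per-context methylation level would be computed from read counts at each classified cytosine, and the reverse strand would be handled by scanning the reverse complement; both steps are omitted here.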
Affiliation(s)
- Norashikin Sarpan
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000, Kajang, Selangor, Malaysia
- Elizaveta Taranenko
  - Department of Biology, University of La Verne, La Verne, CA, USA
  - Department of Fundamental Biology and Biotechnology, Siberian Federal University, 660074, Krasnoyarsk, Russia
- Siew-Eng Ooi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000, Kajang, Selangor, Malaysia
- Eng-Ti Leslie Low
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000, Kajang, Selangor, Malaysia
- Tatiana V Tatarinova
  - Department of Biology, University of La Verne, La Verne, CA, USA
  - Department of Fundamental Biology and Biotechnology, Siberian Federal University, 660074, Krasnoyarsk, Russia
  - Vavilov Institute for General Genetics, Moscow, Russia
  - A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Meilina Ong-Abdullah
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, 6 Persiaran Institusi, Bandar Baru Bangi, 43000, Kajang, Selangor, Malaysia
2
Makarenko MS, Usatov AV, Tatarinova TV, Azarin KV, Logacheva MD, Gavrilova VA, Horn R. Characterization of the mitochondrial genome of the MAX1 type of cytoplasmic male-sterile sunflower. BMC Plant Biology 2019; 19:51. [PMID: 30813888 PMCID: PMC6394147 DOI: 10.1186/s12870-019-1637-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 05/08/2023]
Abstract
BACKGROUND More than 70 cytoplasmic male sterility (CMS) types have been identified in Helianthus, but mitochondrial organization has been studied for fewer than half of them. Moreover, complete mitochondrion sequences have only been published for two CMS sources - PET1 and PET2. It has been demonstrated that other sunflower CMS sources, like MAX1, differ significantly from the PET1 and PET2 types. However, possible molecular causes for the CMS induction by MAX1 have not yet been proposed. In the present study, we have investigated structural changes in the mitochondrial genome of the HA89 (MAX1) CMS sunflower line in comparison to the fertile mitochondrial genome. RESULTS Eight major reorganization events have been determined in HA89 (MAX1) mtDNA: one 110 kb inverted region, four deletions of 439 bp, 978 bp, 3183 bp and 14,296 bp, respectively, and three insertions of 1999 bp, 5272 bp and 6583 bp. The rearrangements have led to functional changes in the mitochondrial genome of HA89 (MAX1), resulting in the complete elimination of orf777 and the appearance of new ORFs - orf306, orf480, orf645 and orf1287. Aligning the mtDNA of the CMS sources PET1 and PET2 with MAX1, we found some common reorganization features in their mitochondrial genome sequences. CONCLUSION The new open reading frame orf1287, representing a chimeric atp6 gene, may play a key role in MAX1 CMS phenotype formation in sunflower, while the contribution of other mitochondrial reorganizations to CMS development appears negligible.
Affiliation(s)
- Tatiana V. Tatarinova
  - University of La Verne, La Verne, CA USA
  - Institute for Information Transmission Problems, Moscow, Russia
  - Institute for General Genetics, Moscow, Russia
  - Siberian Federal University, Krasnoyarsk, Russia
- Maria D. Logacheva
  - Institute for Information Transmission Problems, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Moscow, Russia
- Vera A. Gavrilova
  - The N.I. Vavilov All Russian Institute of Plant Genetic Resources, Saint Petersburg, Russia
- Renate Horn
  - University of Rostock, Institute of Biological Sciences, Plant Genetics, Rostock, Germany
3
Vishnevsky OV, Bocharnikov AV, Kolchanov NA. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets. J Bioinform Comput Biol 2017; 16:1740012. [PMID: 29281953 DOI: 10.1142/s0219720017400121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging, so researchers resort to various stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the query set grows. As a result, only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA datasets. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with the existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs that correspond to known transcription factor binding sites.
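To make the notion of a degenerate motif in the 15-letter IUPAC code concrete, the following Python sketch scans a sequence for every position at which a fixed-length IUPAC motif matches. It is a plain CPU illustration under stated assumptions, not the GPU algorithm of Argo_CUDA; the names `IUPAC`, `matches_at`, and `find_motif` are hypothetical.

```python
# The 15 IUPAC nucleotide symbols and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_at(seq: str, motif: str, start: int) -> bool:
    """True if every motif symbol admits the sequence base at that offset."""
    return all(seq[start + i] in IUPAC[m] for i, m in enumerate(motif))


def find_motif(seq: str, motif: str) -> list:
    """Return all 0-based start positions where `motif` matches `seq`."""
    seq, motif = seq.upper(), motif.upper()
    last_start = len(seq) - len(motif)
    return [i for i in range(last_start + 1) if matches_at(seq, motif, i)]
```

An exhaustive search in this spirit would enumerate all candidate motifs of a given length over the 15 symbols and score each by how many input sequences it matches, which is the part that benefits from massive parallelism.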
Affiliation(s)
- Oleg V Vishnevsky
  - Institute of Cytology and Genetics SB RAS, Lavrentieva Ave., 10, Novosibirsk 630090, Russia
  - Novosibirsk State University, Pirogova, 10, Novosibirsk 630090, Russia
- Nikolay A Kolchanov
  - Institute of Cytology and Genetics SB RAS, Lavrentieva Ave., 10, Novosibirsk 630090, Russia
  - Novosibirsk State University, Pirogova, 10, Novosibirsk 630090, Russia
4
Triska M, Solovyev V, Baranova A, Kel A, Tatarinova TV. Nucleotide patterns aiding in prediction of eukaryotic promoters. PLoS One 2017; 12:e0187243. [PMID: 29141011 PMCID: PMC5687710 DOI: 10.1371/journal.pone.0187243] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/02/2017] [Accepted: 09/05/2017] [Indexed: 01/09/2023] Open
Abstract
Computational analysis of promoters is hindered by the complexity of their architecture. In less studied genomes with complex organization, false positive promoter predictions are common. Accurate identification of transcription start sites and core promoter regions remains an unsolved problem. In this paper, we present a comprehensive analysis of genomic features associated with promoters and show that models driven by probabilistic integrative algorithms allow accurate classification of DNA sequences into “promoters” and “non-promoters” even in the absence of full-length cDNA sequences. These models may be built upon the maps of the distributions of sequence polymorphisms, RNA sequencing reads on genomic DNA, methylated nucleotides, transcription factor binding sites, as well as relative frequencies of nucleotides and their combinations. Positional clustering of binding sites shows that the cells of Oryza sativa utilize three distinct classes of transcription factors: those that bind preferentially to the [-500,0] region (188 “promoter-specific” transcription factors), those that bind preferentially to the [0,500] region (282 “5′ UTR-specific” TFs), and 207 “promiscuous” transcription factors with little or no location preference with respect to the TSS. For the most informative motifs, their positional preferences are conserved between dicots and monocots.
Affiliation(s)
- Martin Triska
  - Children’s Hospital Los Angeles, University of Southern California, Los Angeles, CA, United States of America
  - Faculty of Advanced Technology, University of South Wales, Pontypridd, Wales, United Kingdom
- Ancha Baranova
  - School of Systems Biology, George Mason University, Fairfax, VA, United States of America
  - Research Centre for Medical Genetics, Moscow, Russia
- Alexander Kel
  - geneXplain GmbH, Wolfenbuettel, Germany
  - Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, Russia
- Tatiana V. Tatarinova
  - School of Systems Biology, George Mason University, Fairfax, VA, United States of America
  - Department of Biology, Division of Natural Sciences, University of La Verne, La Verne, CA, United States of America
  - Bioinformatics Center, AA Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
  - Vavilov’s Institute for General Genetics, Moscow, Russia
5
Chan KL, Tatarinova TV, Rosli R, Amiruddin N, Azizi N, Halim MAA, Sanusi NSNM, Jayanthi N, Ponomarenko P, Triska M, Solovyev V, Firdaus-Raih M, Sambanthamurthi R, Murphy D, Low ETL. Evidence-based gene models for structural and functional annotations of the oil palm genome. Biol Direct 2017; 12:21. [PMID: 28886750 PMCID: PMC5591544 DOI: 10.1186/s13062-017-0191-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/27/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022] Open
Abstract
Background Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Results Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. 
Conclusions We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops. Reviewers This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov. Electronic supplementary material The online version of this article (doi:10.1186/s13062-017-0191-4) contains supplementary material, which is available to authorized users.
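The abstract defines GC3 as the fraction of cytosine and guanine in the third position of a codon and uses GC3 ≥ 0.75286 as the cut-off for GC3-rich genes. A minimal Python sketch of that calculation is given below. The function names `gc3` and `is_gc3_rich` are hypothetical, and the handling of details the abstract does not specify (the stop codon is counted, trailing bases of an incomplete codon are ignored) is an assumption.

```python
GC3_RICH_THRESHOLD = 0.75286  # cut-off quoted in the abstract


def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame CDS.

    Assumption: the sequence starts in frame; any trailing bases that do
    not complete a codon are ignored, and the stop codon is included.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop an incomplete last codon
    third_positions = cds[2:usable:3]     # bases at codon position 3
    if not third_positions:
        return 0.0
    gc = sum(base in "GC" for base in third_positions)
    return gc / len(third_positions)


def is_gc3_rich(cds: str) -> bool:
    """Apply the GC3-rich threshold from the abstract."""
    return gc3(cds) >= GC3_RICH_THRESHOLD
```

For example, the three codons of `ATGGCCTAA` end in G, C, and A, so two of three third positions are G or C and the sequence falls below the GC3-rich threshold.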
Affiliation(s)
- Kuang-Lim Chan
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Tatiana V Tatarinova
  - Department of Biology, University of La Verne, La Verne, California, 91750, USA
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Rozana Rosli
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
  - Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
- Nadzirah Amiruddin
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Norazah Azizi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Mohd Amin Ab Halim
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Nik Shazana Nik Mohd Sanusi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Nagappan Jayanthi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Petr Ponomarenko
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
- Martin Triska
  - Children's Hospital Los Angeles, University of Southern California, Los Angeles, CA, 90089, USA
- Victor Solovyev
  - Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY, 10549, USA
- Mohd Firdaus-Raih
  - Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Ravigadevi Sambanthamurthi
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
- Denis Murphy
  - Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
- Eng-Ti Leslie Low
  - Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
6
Evolution of Brain Active Gene Promoters in Human Lineage Towards the Increased Plasticity of Gene Regulation. Mol Neurobiol 2017; 55:1871-1904. [PMID: 28233272 DOI: 10.1007/s12035-017-0427-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/10/2016] [Accepted: 01/26/2017] [Indexed: 01/31/2023]
Abstract
Adaptability to a variety of environmental conditions is a prominent feature of Homo sapiens. We hypothesize that this feature can be explained by evolutionary changes in gene promoters active in the brain prefrontal cortex leading to a more flexible gene regulation network. The genotype-dependent range of gene expression can be broader in humans than in other higher primates. Thus, we searched for specific signatures of evolutionary changes in promoter architectures of multiple hominid genes, including the genes active in human cortical neurons, that may indicate an increase in the variability of gene expression rather than just changes in the level of expression, such as downregulation or upregulation of the genes. We performed a whole-genome search for genetic alterations that may have affected the "flexibility" of gene regulation during hominid evolution, such as (i) CpG dinucleotide content, (ii) predicted nucleosome-DNA dissociation constant, and (iii) predicted affinities for TATA-binding protein (TBP) in gene promoters. We tested all putative promoter regions across the human genome and especially gene promoters in an active chromatin state in neurons of the prefrontal cortex, the brain region critical for abstract thinking and social and behavioral adaptation. Our data imply that the origin of modern humans was associated with increased flexibility of promoter-driven gene regulation in the brain. In contrast, after splitting from the ancestral lineages of H. sapiens, the evolution of ape species has been characterized by reduced flexibility of gene promoter function, which underlies reduced variability of gene expression.
7
Zolotarenko A, Chekalin E, Mehta R, Baranova A, Tatarinova TV, Bruskin S. Identification of Transcriptional Regulators of Psoriasis from RNA-Seq Experiments. Methods Mol Biol 2017; 1613:355-370. [PMID: 28849568 DOI: 10.1007/978-1-4939-7027-8_14] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/07/2023]
Abstract
Psoriasis is a common inflammatory skin disease with complex etiology and chronic progression. To provide novel insights into the molecular mechanisms of regulation of the disease we performed RNA sequencing (RNA-Seq) analysis of 14 pairs of skin samples collected from psoriatic patients. Subsequent pathway analysis and an extraction of transcriptional regulators governing psoriasis-associated pathways was executed using a combination of MetaCore Interactome enrichment tool and cisExpress algorithm, and followed by comparison to a set of previously described psoriasis response elements. A comparative approach has allowed us to identify 42 core transcriptional regulators of the disease associated with inflammation (NFkB, IRF9, JUN, FOS, SRF), activity of T-cells in the psoriatic lesions (STAT6, FOXP3, NFATC2, GATA3, TCF7, RUNX1, etc.), hyperproliferation and migration of keratinocytes (JUN, FOS, NFIB, TFAP2A, TFAP2C), and lipid metabolism (TFAP2, RARA, VDR). After merging the ChIP-seq and RNA-seq data, we conclude that the atypical expression of FOXA1 transcriptional factor is an important player in psoriasis, as it inhibits maturation of naive T cells into this Treg subpopulation (CD4+FOXA1+CD47+CD69+PD-L1(hi)FOXP3-), therefore contributing to the development of psoriatic skin lesions.
Affiliation(s)
- Alena Zolotarenko
  - Laboratory of Functional Genomics, Vavilov Institute of General Genetics RAS, Gubkina Street 3, 119991, Moscow, Russia
- Evgeny Chekalin
  - Laboratory of Functional Genomics, Vavilov Institute of General Genetics RAS, Gubkina Street 3, 119991, Moscow, Russia
- Rohini Mehta
  - The Center of the Study of Chronic Metabolic and Rare Diseases, School of Systems Biology, George Mason University, Fairfax, VA, USA
- Ancha Baranova
  - The Center of the Study of Chronic Metabolic and Rare Diseases, School of Systems Biology, George Mason University, Fairfax, VA, USA
  - Research Centre for Medical Genetics RAMS, Moscow, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow, Russia
  - Atlas Biomed Group, Moscow, Russia
- Tatiana V Tatarinova
  - Atlas Biomed Group, Moscow, Russia
  - Center for Personalized Medicine, Children's Hospital Los Angeles and Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
  - A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
- Sergey Bruskin
  - Laboratory of Functional Genomics, Vavilov Institute of General Genetics RAS, Gubkina Street 3, 119991, Moscow, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow, Russia
8
Triska M, Ivliev A, Nikolsky Y, Tatarinova TV. Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer. Methods Mol Biol 2017; 1613:291-310. [PMID: 28849565 DOI: 10.1007/978-1-4939-7027-8_11] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/22/2023]
Abstract
Analysis of gene co-expression networks is a powerful "data-driven" tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of "data-driven" co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson's correlation to generate a comprehensive set of over 3000 co-expression clusters in 82 normalized microarray datasets from nine cancers of different origin. Next, we designed a genome-wide statistical approach to the detection of specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, realized as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole genome and transcriptome projects. cisExpress uses a task farming algorithm to exploit all available computational cores within a shared memory node. We discovered that although co-expression modules are populated with different sets of genes, they share distinct stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely in accordance with cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top scored motifs are typically shared between several tissues; they define sets of target genes responsible for certain functionality of carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available and accessible for further studies.
Affiliation(s)
- Martin Triska
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
- Yuri Nikolsky
  - Prosapia Genetics, Solana Beach, CA, USA
  - School of Systems Biology, George Mason University, Fairfax, VA, USA
- Tatiana V Tatarinova
  - Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
  - Center for Personalized Medicine, Children's Hospital Los Angeles, 4640 Hollywood Blvd, Los Angeles, CA, 90027, USA
  - A.A. Kharkevich Institute for Information Transmission Problems RAS, Moscow, Russia
9
Integrated computational approach to the analysis of RNA-seq data reveals new transcriptional regulators of psoriasis. Exp Mol Med 2016; 48:e268. [PMID: 27811935 PMCID: PMC5133374 DOI: 10.1038/emm.2016.97] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/10/2016] [Revised: 05/06/2016] [Accepted: 05/24/2016] [Indexed: 02/07/2023] Open
Abstract
Psoriasis is a common inflammatory skin disease with complex etiology and chronic progression. To provide novel insights into the regulatory molecular mechanisms of the disease, we performed RNA sequencing analysis of 14 pairs of skin samples collected from patients with psoriasis. Subsequent pathway analysis and extraction of the transcriptional regulators governing psoriasis-associated pathways was executed using a combination of the MetaCore Interactome enrichment tool and the cisExpress algorithm, followed by comparison to a set of previously described psoriasis response elements. A comparative approach allowed us to identify 42 core transcriptional regulators of the disease associated with inflammation (NFκB, IRF9, JUN, FOS, SRF), the activity of T cells in psoriatic lesions (STAT6, FOXP3, NFATC2, GATA3, TCF7, RUNX1), the hyperproliferation and migration of keratinocytes (JUN, FOS, NFIB, TFAP2A, TFAP2C) and lipid metabolism (TFAP2, RARA, VDR). In addition to the core regulators, we identified 38 transcription factors previously not associated with the disease that can clarify the pathogenesis of psoriasis. To illustrate these findings, we analyzed the regulatory role of one of the identified transcription factors (TFs), FOXA1. Using ChIP-seq and RNA-seq data, we concluded that the atypical expression of the FOXA1 TF is an important player in the disease as it inhibits the maturation of naive T cells into the (CD4+FOXA1+CD47+CD69+PD-L1(hi)FOXP3-) regulatory T cell subpopulation, therefore contributing to the development of psoriatic skin lesions.
10
Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016; 6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022] Open
Abstract
We analyzed functionality and relative distribution of genetic variants across the complete Oryza sativa genome, using the 40 million single nucleotide polymorphisms (SNPs) dataset from the 3,000 Rice Genomes Project (http://snp-seek.irri.org), the largest and highest density SNP collection for any higher plant. We have shown that the DNA-binding transcription factors (TFs) are the most conserved group of genes, whereas kinases and membrane-localized transporters are the most variable ones. TFs may be conserved because they belong to some of the most connected regulatory hubs that modulate transcription of vast downstream gene networks, whereas signaling kinases and transporters need to adapt rapidly to changing environmental conditions. In general, the observed profound patterns of nucleotide variability reveal functionally important genomic regions. As expected, nucleotide diversity is much higher in intergenic regions than within gene bodies (regions spanning gene models), and protein-coding sequences are more conserved than untranslated gene regions. We have observed a sharp decline in nucleotide diversity that begins at about 250 nucleotides upstream of the transcription start and reaches minimal diversity exactly at the transcription start. We found the transcription termination sites to have remarkably symmetrical patterns of SNP density, implying presence of functional sites near transcription termination. Also, nucleotide diversity was significantly lower near 3′ UTRs, the area rich with regulatory regions.
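The abstract compares nucleotide diversity across genomic regions, for example in the roughly 250 nucleotides upstream of the transcription start. The abstract does not state which estimator was used, so the Python sketch below is only one common way to express the idea: per-site diversity at a biallelic SNP is the probability that two randomly chosen sampled chromosomes differ, and a window value is the sum over SNPs divided by window length. The names `site_pi` and `window_pi`, the input layout, and the coordinate convention (positions relative to the transcription start) are all assumptions.

```python
def site_pi(alt_count: int, n: int) -> float:
    """Pairwise diversity at one biallelic site among n sampled chromosomes.

    Equals the fraction of chromosome pairs that carry different alleles.
    """
    if n < 2:
        return 0.0
    p = alt_count / n
    return 2 * p * (1 - p) * n / (n - 1)


def window_pi(snps: dict, start: int, end: int, n: int) -> float:
    """Average per-base diversity in the half-open window [start, end).

    `snps` maps a position (relative to the transcription start, in this
    sketch) to the alternate-allele count at that position; positions
    without an entry are treated as invariant.
    """
    total = sum(
        site_pi(alt, n) for pos, alt in snps.items() if start <= pos < end
    )
    return total / (end - start)
```

With four sampled chromosomes and two alternate alleles at a site, four of the six possible pairs differ, so `site_pi(2, 4)` evaluates to 2/3.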
Affiliation(s)
- Tatiana V Tatarinova
  - Center for Personalized Medicine and Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
  - Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
- Yuri Nikolsky
  - Vavilov Institute of General Genetics, Moscow, Russia
  - F1 Genomics, San Diego, CA, USA
  - School of Systems Biology, George Mason University, VA, USA
- Dmitry Chebotarov
  - International Rice Research Institute, Los Baños, Laguna 4031, Philippines
- Kenneth L McNally
  - International Rice Research Institute, Los Baños, Laguna 4031, Philippines
11
iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput Biol 2014; 10:e1003731. [PMID: 25058159 PMCID: PMC4109854 DOI: 10.1371/journal.pcbi.1003731] [Citation(s) in RCA: 606] [Impact Index Per Article: 60.6] [Received: 02/13/2014] [Accepted: 05/27/2014] [Indexed: 01/17/2023] Open
Abstract
Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and identify an extensive up-regulated network controlled directly by p53. Similarly, we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.
Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e. a set of co-expressed genes. iRegulon relies on the analysis of the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10,000 TF motifs and 1,000 ChIP-seq data sets or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks or “regulons”.
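The ranking-and-recovery idea mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the scoring details, so everything below is an assumption for illustration only: each motif is assumed to come with a genome-wide gene ranking, enrichment is measured as the area under the recovery curve over the top of that ranking, and the areas are standardised across motifs. The function names, the `top_n` cut-off, and the toy gene identifiers are hypothetical and are not taken from iRegulon.

```python
from statistics import mean, pstdev


def recovery_auc(ranking, gene_set, top_n):
    """Normalised area under the recovery curve for one motif.

    ranking  : list of gene ids, best-scoring gene first (assumed input)
    gene_set : set of co-expressed genes to recover
    top_n    : how far down the ranking to look (hypothetical parameter)
    """
    if not gene_set or top_n <= 0:
        return 0.0
    hits = 0
    area = 0
    for gene in ranking[:top_n]:
        if gene in gene_set:
            hits += 1      # one more gene of the set recovered
        area += hits       # add the current curve height
    return area / (top_n * len(gene_set))


def enrichment_scores(rankings, gene_set, top_n):
    """Standardise each motif's AUC against all motifs (a z-score)."""
    aucs = {motif: recovery_auc(r, gene_set, top_n) for motif, r in rankings.items()}
    values = list(aucs.values())
    mu, sd = mean(values), pstdev(values)
    return {m: (a - mu) / sd if sd > 0 else 0.0 for m, a in aucs.items()}


# Toy data: three hypothetical motifs ranking eight hypothetical genes.
toy_rankings = {
    "motif_A": ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"],
    "motif_B": ["g4", "g1", "g5", "g6", "g2", "g7", "g8", "g3"],
    "motif_C": ["g8", "g7", "g6", "g5", "g4", "g3", "g2", "g1"],
}
toy_gene_set = {"g1", "g2", "g3"}
toy_scores = enrichment_scores(toy_rankings, toy_gene_set, top_n=4)
best_motif = max(toy_scores, key=toy_scores.get)
print(best_motif)
```

In this toy case the motif whose ranking places the gene set at the top receives the highest score. Genes of the set that appear before the cut-off would be the natural candidates for that motif's direct targets.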
|
12
|
Bolívar JC, Machens F, Brill Y, Romanov A, Bülow L, Hehl R. 'In silico expression analysis', a novel PathoPlant web tool to identify abiotic and biotic stress conditions associated with specific cis-regulatory sequences. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2014; 2014:bau030. [PMID: 24727366 PMCID: PMC3983564 DOI: 10.1093/database/bau030] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022]
Abstract
Putative cis-regulatory sequences can be readily identified by running pattern recognition programs on the promoters of specific gene sets. The abundance of predicted cis-sequences, however, makes it a major challenge to associate them with a possible function in gene expression regulation. To identify a possible function of the predicted cis-sequences, a novel web tool designated ‘in silico expression analysis’ was developed that correlates submitted cis-sequences with gene expression data from Arabidopsis thaliana. The web tool identifies the A. thaliana genes harbouring the sequence in a defined promoter region and compares the expression of these genes with microarray data. The result is a hierarchy of abiotic and biotic stress conditions to which these genes are most likely responsive. To test the performance of the web tool, known cis-regulatory sequences were submitted to the ‘in silico expression analysis’, which correctly identified the associated stress conditions. For a recently identified novel elicitor-responsive sequence, a WT-box (CGACTTTT), the ‘in silico expression analysis’ predicts that genes harbouring this sequence in their promoter are most likely induced by Botrytis cinerea. Consistent with this prediction, the strongest induction of a reporter gene harbouring this sequence in its promoter is observed with B. cinerea in transgenic A. thaliana. Database URL: http://www.pathoplant.de/expression_analysis.php.
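The two steps described in this abstract (find the genes whose promoter carries a submitted sequence, then order conditions by how strongly those genes respond) can be sketched as follows. This is only an illustration under stated assumptions, not the PathoPlant implementation: both strands are searched, conditions are ordered by the mean expression change of the matching genes, and the promoter sequences, gene names, and expression values are made-up toy data.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def genes_with_motif(promoters, motif):
    """Gene ids whose promoter contains the motif on either strand."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    return [
        gene_id
        for gene_id, seq in promoters.items()
        if motif in seq.upper() or rc in seq.upper()
    ]


def rank_conditions(expression, genes):
    """Order conditions by mean expression change of the given genes.

    expression: {condition: {gene_id: log2 fold change}} (assumed layout)
    """
    scores = {}
    for condition, values in expression.items():
        observed = [values[g] for g in genes if g in values]
        if observed:
            scores[condition] = sum(observed) / len(observed)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy promoters (hypothetical sequences); the WT-box is CGACTTTT.
toy_promoters = {
    "geneA": "TTTCGACTTTTGG",   # forward-strand match
    "geneB": "AAAAAAGTCGTT",    # reverse-strand match (AAAAGTCG)
    "geneC": "ACGTACGTACGT",    # no match
}
# Toy expression values, not real measurements.
toy_expression = {
    "pathogen": {"geneA": 2.0, "geneB": 1.0, "geneC": 0.1},
    "cold": {"geneA": 0.2, "geneB": -0.2, "geneC": 1.5},
}

wt_box_genes = genes_with_motif(toy_promoters, "CGACTTTT")
condition_ranking = rank_conditions(toy_expression, wt_box_genes)
print(wt_box_genes, condition_ranking)
```

Averaging is the simplest possible summary. A real analysis would more likely compare the matching genes against all other genes with a statistical test.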
Affiliation(s)
- Julio C Bolívar
- Institut für Genetik, Technische Universität Braunschweig, Spielmannstr 7, 38106 Braunschweig, Germany
|
13
|
Abstract
In this paper we present NPEST, a novel tool for the analysis of expressed sequence tag (EST) distributions and transcription start site (TSS) prediction. This method estimates an unknown probability distribution of ESTs using a maximum likelihood (ML) approach, which is then used to predict positions of TSSs. Accurate identification of TSSs is an important genomics task, since the position of regulatory elements with respect to the TSS can have large effects on gene regulation, and the performance of promoter motif-finding methods depends on correct identification of TSSs. Our probabilistic approach expands recognition capabilities to multiple TSSs per locus, which may be useful for enhancing the understanding of alternative splicing mechanisms. This paper presents an analysis of simulated data as well as a statistical analysis of promoter regions of the model dicot plant Arabidopsis thaliana. Using our statistical tool we analyzed 16,520 loci and developed a database of TSSs, which is now publicly available at www.glacombio.net/NPEST.
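The general idea of estimating a distribution of EST positions and reading TSS candidates from it can be sketched in a few lines. This is a deliberately simplified stand-in and not the NPEST algorithm, whose details the abstract does not give. It assumes that each EST is reduced to the genomic position of its 5' end, uses the relative frequency of each position (the maximum likelihood estimate for a discrete distribution), and reports local maxima as candidates. The `min_prob` and `window` parameters and the toy positions are hypothetical.

```python
from collections import Counter


def ml_position_distribution(est_starts):
    """ML estimate of a discrete distribution: relative frequencies."""
    counts = Counter(est_starts)
    total = len(est_starts)
    return {pos: n / total for pos, n in sorted(counts.items())}


def candidate_tss(dist, min_prob=0.2, window=5):
    """Positions that are local maxima of the estimated distribution.

    A position is reported if its probability is at least min_prob and
    no position within +/- window has a higher probability.
    """
    peaks = []
    for pos, prob in dist.items():
        if prob < min_prob:
            continue
        nearby = [q for other, q in dist.items() if abs(other - pos) <= window]
        if prob >= max(nearby):
            peaks.append(pos)
    return peaks


# Toy 5' end positions for one hypothetical locus with two start clusters.
toy_est_starts = [100, 100, 100, 101, 102, 150, 150, 150, 150, 151]
toy_dist = ml_position_distribution(toy_est_starts)
toy_peaks = candidate_tss(toy_dist)
print(toy_peaks)
```

Reporting more than one peak per locus mirrors the abstract's point that a locus may have multiple TSSs.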
|