Reference Citation Analysis

For: Troukhan M, Tatarinova T, Bouck J, Flavell RB, Alexandrov NN. Genome-wide discovery of cis-elements in promoter sequences using gene expression. OMICS 2010;13:139-51. [PMID: 19231992 DOI: 10.1089/omi.2008.0034] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
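The Impact Index Per Article values listed on this page appear to equal the RCA citation count divided by the number of calendar years from publication through 2023 inclusive, rounded half-up to one decimal place (for example, 30 / 14 gives 2.1 for the 2010 entry above, and 1 / 4 = 0.25 gives 0.3 for the 2020 oil palm entry below). This is an inference from the listed values, not a documented definition; the reference year 2023 and the function name are assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP


def impact_index_per_article(citations: int, pub_year: int, ref_year: int = 2023) -> Decimal:
    """Hypothetical reconstruction: citations per calendar year, counting the
    publication year and the (assumed) reference year 2023 inclusively."""
    years = ref_year - pub_year + 1
    if years < 1:
        raise ValueError("publication year is after the reference year")
    # Decimal arithmetic with ROUND_HALF_UP reproduces values such as 0.25 -> 0.3.
    return (Decimal(citations) / Decimal(years)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
```

Half-up rounding is the design choice that matters here: Python's built-in `round()` rounds ties to even on binary floats, so `round(0.25, 1)` returns 0.2 rather than the 0.3 shown on this page.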

Cited by Other Article(s)

Savinkova LK, Sharypova EB, Kolchanov NA. On the Role of TATA Boxes and TATA-Binding Protein in Arabidopsis thaliana. Plants (Basel) 2023;12:1000. [PMID: 36903861 PMCID: PMC10005294 DOI: 10.3390/plants12051000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 01/13/2023] [Accepted: 02/20/2023] [Indexed: 06/18/2023]

Abstract

For transcription initiation by RNA polymerase II (Pol II), all eukaryotes require assembly of the basal transcription machinery on the core promoter, a region spanning approximately -50 to +50 bp around the transcription start site. Although Pol II is a complex multi-subunit enzyme conserved among all eukaryotes, it cannot initiate transcription without the participation of many other proteins. Transcription initiation on TATA-containing promoters requires assembly of the preinitiation complex, a process triggered by the interaction of TATA-binding protein (TBP, a component of the general transcription factor TFIID (transcription factor II D)) with a TATA box. The interaction of TBP with various TATA boxes in plants, in particular Arabidopsis thaliana, has hardly been investigated, apart from a few early studies that addressed the role of the TATA box, and of substitutions in it, in plant transcription systems. This is despite the fact that the interaction of TBP with TATA boxes and their variants can be used to regulate transcription. In this review, we examine the roles of some general transcription factors in the assembly of the basal transcription complex, as well as the functions of TATA boxes in the model plant A. thaliana. We review examples showing not only the involvement of TATA boxes in initiating the assembly of the transcription machinery but also their indirect participation in plant adaptation to environmental conditions, including responses to light and other phenomena. Examples of the influence of A. thaliana TBP1 and TBP2 expression levels on plant morphological traits are also examined. We summarize available functional data on these two early players that trigger the assembly of transcription machinery. This information will deepen the understanding of the mechanisms underlying transcription by Pol II in plants and will help put the interaction of TBP with TATA boxes to practical use.
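The core-promoter window and TATA-box recognition described above can be illustrated with a simple motif scan. This sketch assumes the commonly cited TATA consensus TATAWAWR (W = A/T, R = A/G), a 0-based TSS index treated as position 0, and a window of -50 to +50 bp; the function names are hypothetical, and a string match is only a proxy for TBP binding, not a model of it.

```python
import re
from typing import List

# IUPAC codes needed for the assumed consensus TATAWAWR.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}


def consensus_to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_tata_boxes(seq: str, tss: int, upstream: int = 50,
                    downstream: int = 50, consensus: str = "TATAWAWR") -> List[int]:
    """Start positions (relative to the TSS) of consensus matches lying
    entirely inside the window [tss - upstream, tss + downstream]."""
    seq = seq.upper()
    start = max(0, tss - upstream)
    end = min(len(seq), tss + downstream + 1)  # exclusive bound
    # A zero-width lookahead reports overlapping matches; endpos=end makes the
    # regex treat the string as ending at the window edge.
    pattern = re.compile("(?=" + consensus_to_regex(consensus) + ")")
    return [m.start() - tss for m in pattern.finditer(seq, start, end)]
```

For a toy sequence with `TATAAAAG` starting 40 bp upstream of the TSS, `find_tata_boxes` reports that match at relative position -40, while identical motifs outside the window are ignored.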

Deviatiiarov RM, Gams A, Kulakovskiy IV, Buyan A, Meshcheryakov G, Syunyaev R, Singh R, Shah P, Tatarinova TV, Gusev O, Efimov IR. An atlas of transcribed human cardiac promoters and enhancers reveals an important role of regulatory elements in heart failure. Nat Cardiovasc Res 2023;2:58-75. [PMID: 39196209 DOI: 10.1038/s44161-022-00182-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/23/2021] [Accepted: 11/02/2022] [Indexed: 08/29/2024]

Affiliation(s)

Ruslan M Deviatiiarov Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia
Anna Gams Department of Biomedical Engineering, The George Washington University, Washington, DC, USA
Ivan V Kulakovskiy Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia; Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
Andrey Buyan Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia; Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
Georgy Meshcheryakov Institute of Protein Research, Russian Academy of Sciences, Pushchino, Russia
Roman Syunyaev Department of Biomedical Engineering, The George Washington University, Washington, DC, USA; I.M. Sechenov First Moscow State Medical University, Moscow, Russia
Ramesh Singh Inova Heart and Vascular Institute, Falls Church, VA, USA
Palak Shah Department of Biomedical Engineering, The George Washington University, Washington, DC, USA; Inova Heart and Vascular Institute, Falls Church, VA, USA
Tatiana V Tatarinova Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia; Department of Biology, University of La Verne, La Verne, CA, USA
Oleg Gusev Laboratory of Regulatory Genomics, Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russia; Graduate School of Medicine, Juntendo University, Tokyo, Japan; RIKEN Center for Integrative Medical Sciences, RIKEN, Yokohama, Japan; Endocrinology Research Center, Moscow, Russia
Igor R Efimov Department of Biomedical Engineering, The George Washington University, Washington, DC, USA; Department of Biomedical Engineering, Northwestern University, Chicago, IL, USA; Department of Medicine, Northwestern University, Chicago, IL, USA

Genome-Wide Prediction of Transcription Start Sites in Conifers. Int J Mol Sci 2022;23:ijms23031735. [PMID: 35163661 PMCID: PMC8836283 DOI: 10.3390/ijms23031735] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/30/2021] [Revised: 01/30/2022] [Accepted: 02/01/2022] [Indexed: 02/04/2023]

Abstract

The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. Several computational approaches have therefore been developed as fast and reliable means of predicting the locations of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially those of gymnosperms, have received little attention and remain poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS prediction in other genomes, especially draft assemblies, for which reliable TSS predictions are not usually available. We also explored some features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with those of model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of TSS prediction in the four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides the initial basis for future experimental validation and for the study of regulatory regions to understand gene regulation in gymnosperms.
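The nucleotide-composition comparison mentioned above can be illustrated with two widely used summary statistics for promoter sequences: GC content and the CpG observed/expected ratio. The abstract does not state which statistics the authors computed, so both are assumptions chosen for illustration, and the function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def cpg_observed_expected(seq: str) -> float:
    """CpG observed/expected ratio: count(CG) * length / (count(C) * count(G))."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # str.count is non-overlapping, which is exact here because "CG" cannot overlap itself.
    return seq.count("CG") * len(seq) / (c * g)
```

Applied to each predicted promoter (for example, a fixed window upstream of every predicted TSS), these values can be summarized per species and compared across genomes.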

Zhang M, Jia C, Li F, Li C, Zhu Y, Akutsu T, Webb GI, Zou Q, Coin LJM, Song J. Critical assessment of computational tools for prokaryotic and eukaryotic promoter prediction. Brief Bioinform 2022;23:6502561. [PMID: 35021193 PMCID: PMC8921625 DOI: 10.1093/bib/bbab551] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/09/2021] [Revised: 11/12/2021] [Accepted: 11/30/2021] [Indexed: 01/13/2023]

Affiliation(s)

Meng Zhang
Cangzhi Jia School of Science, Dalian Maritime University, Dalian 116026, China (corresponding author)
Fuyi Li
Chen Li
Yan Zhu
Tatsuya Akutsu
Geoffrey I Webb Department of Data Science and Artificial Intelligence, Monash University, Melbourne, VIC 3800, Australia; Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
Quan Zou Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China (corresponding author)
Lachlan J M Coin Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia (corresponding author)
Jiangning Song Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia (corresponding author)

To JPC, Davis IW, Marengo MS, Shariff A, Baublite C, Decker K, Galvão RM, Gao Z, Haragutchi O, Jung JW, Li H, O'Brien B, Sant A, Elich TD. Expression Elements Derived From Plant Sequences Provide Effective Gene Expression Regulation and New Opportunities for Plant Biotechnology Traits. Front Plant Sci 2021;12:712179. [PMID: 34745155 PMCID: PMC8569612 DOI: 10.3389/fpls.2021.712179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2021] [Accepted: 09/15/2021] [Indexed: 06/13/2023]

Affiliation(s)

Jennifer P. C. To Bayer Crop Science, Chesterfield, MO, United States; GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States
Ian W. Davis Bayer Crop Science, Chesterfield, MO, United States; GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States
Matthew S. Marengo Bayer Crop Science, Chesterfield, MO, United States; GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States
Aabid Shariff GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States; Pairwise Plants, Durham, NC, United States
Catherine Baublite Bayer Crop Science, Chesterfield, MO, United States
Keith Decker Bayer Crop Science, Chesterfield, MO, United States
Rafaelo M. Galvão Bayer Crop Science, Chesterfield, MO, United States; GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States
Zhihuan Gao Bayer Crop Science, Chesterfield, MO, United States; GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States
Olivia Haragutchi Bayer Crop Science, Chesterfield, MO, United States; GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States
Jee W. Jung Bayer Crop Science, Chesterfield, MO, United States; GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States; Duke University, Office for Translation and Commercialization, Durham, NC, United States
Hong Li Bayer Crop Science, Chesterfield, MO, United States
Brent O'Brien Bayer Crop Science, Chesterfield, MO, United States; GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States
Anagha Sant Bayer Crop Science, Chesterfield, MO, United States
Tedd D. Elich GrassRoots Biotechnology, Durham, NC, United States; Monsanto Company, Research Triangle Park, Durham, NC, United States; LifeEDIT Therapeutics, Durham, NC, United States

Flavell RB. Perspective: 50 years of plant chromosome biology. Plant Physiol 2021;185:731-753. [PMID: 33604616 PMCID: PMC8133586 DOI: 10.1093/plphys/kiaa108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2020] [Accepted: 12/04/2020] [Indexed: 06/12/2023]

Pachganov S, Murtazalieva K, Zarubin A, Taran T, Chartier D, Tatarinova TV. Prediction of Rice Transcription Start Sites Using TransPrise: A Novel Machine Learning Approach. Methods Mol Biol 2021;2238:261-274. [PMID: 33471337 DOI: 10.1007/978-1-0716-1068-8_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]

Sarpan N, Taranenko E, Ooi SE, Low ETL, Espinoza A, Tatarinova TV, Ong-Abdullah M. DNA methylation changes in clonally propagated oil palm. Plant Cell Rep 2020;39:1219-1233. [PMID: 32591850 DOI: 10.1007/s00299-020-02561-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/27/2020] [Accepted: 06/17/2020] [Indexed: 06/11/2023]

Pachganov S, Murtazalieva K, Zarubin A, Sokolov D, Chartier DR, Tatarinova TV. TransPrise: a novel machine learning approach for eukaryotic promoter prediction. PeerJ 2019;7:e7990. [PMID: 31695967 PMCID: PMC6827441 DOI: 10.7717/peerj.7990] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/09/2019] [Accepted: 10/04/2019] [Indexed: 02/01/2023]

Abstract

As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSSs). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic TSSs. Our pipeline consists of two parts: a binary classifier runs first, and if a sequence is classified as TSS-containing, a regression step follows that identifies the precise location of the TSS. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of the TransPrise classification and regression models with the TSSPlant approach on the well-annotated genome of Oryza sativa. On a computer equipped with a graphics processing unit, TransPrise runs in 250 minutes on a 374 Mb genome. The Matthews correlation coefficient for TransPrise is 0.79, more than twice the 0.31 obtained by the TSSPlant classification models, which represents a high level of prediction accuracy. Additionally, the mean absolute error of the regression model is 29.19 nt, allowing accurate prediction of TSS location. TransPrise was also tested on Homo sapiens, where the mean absolute error of the regression model was 47.986 nt. We provide the full basis for the comparison and encourage users to freely access our computational tools to facilitate and streamline their own analyses. A ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at (http://compubioverne.group/). The source code is ready to use and customizable to predict TSSs in any eukaryotic organism.
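The two-stage design and the two reported metrics can be made concrete with a small sketch. The classifier and regressor below are placeholder callables, not the TransPrise deep-learning models, and all function names are hypothetical. The Matthews correlation coefficient uses the standard confusion-matrix formula, and the mean absolute error is the average distance in nucleotides between true and predicted TSS positions.

```python
import math
from typing import Callable, Optional, Sequence


def two_stage_predict(window: str,
                      classify: Callable[[str], bool],
                      regress: Callable[[str], float]) -> Optional[float]:
    """Stage 1 decides whether the window contains a TSS; stage 2 runs only
    for positive windows and returns the predicted TSS offset in the window."""
    if not classify(window):
        return None
    return regress(window)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; by convention 0.0 if any marginal is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mean_absolute_error(true_pos: Sequence[float], pred_pos: Sequence[float]) -> float:
    """Mean absolute distance (in nt) between true and predicted TSS positions."""
    if len(true_pos) != len(pred_pos) or not true_pos:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(true_pos, pred_pos)) / len(true_pos)
```

Gating the regressor behind the classifier means localization error is only measured on windows already judged TSS-containing, so the MCC and the MAE describe different stages of the pipeline.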

Tonnessen BW, Bossa-Castro AM, Mauleon R, Alexandrov N, Leach JE. Shared cis-regulatory architecture identified across defense response genes is associated with broad-spectrum quantitative resistance in rice. Sci Rep 2019;9:1536. [PMID: 30733489 PMCID: PMC6367480 DOI: 10.1038/s41598-018-38195-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/05/2018] [Accepted: 12/18/2018] [Indexed: 12/30/2022]

Vishnevsky OV, Bocharnikov AV, Kolchanov NA. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets. J Bioinform Comput Biol 2017;16:1740012. [PMID: 29281953 DOI: 10.1142/s0219720017400121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]

Triska M, Solovyev V, Baranova A, Kel A, Tatarinova TV. Nucleotide patterns aiding in prediction of eukaryotic promoters. PLoS One 2017;12:e0187243. [PMID: 29141011 PMCID: PMC5687710 DOI: 10.1371/journal.pone.0187243] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 07/02/2017] [Accepted: 09/05/2017] [Indexed: 01/09/2023]

Chan KL, Tatarinova TV, Rosli R, Amiruddin N, Azizi N, Halim MAA, Sanusi NSNM, Jayanthi N, Ponomarenko P, Triska M, Solovyev V, Firdaus-Raih M, Sambanthamurthi R, Murphy D, Low ETL. Evidence-based gene models for structural and functional annotations of the oil palm genome. Biol Direct 2017;12:21. [PMID: 28886750 PMCID: PMC5591544 DOI: 10.1186/s13062-017-0191-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 04/27/2017] [Accepted: 08/07/2017] [Indexed: 11/13/2022]

Abstract

Background

Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for oil palm as they determine oil yield and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, the identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools.

Results

Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC₃ (fraction of cytosine and guanine in the third position of a codon) with over half the GC₃-rich genes (GC₃ ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.

Conclusions

We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC₃-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops.

Reviewers

This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.

Electronic supplementary material

The online version of this article (doi:10.1186/s13062-017-0191-4) contains supplementary material, which is available to authorized users.
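GC₃ as defined in the Results above (the fraction of cytosine and guanine at the third position of codons) is straightforward to compute from a coding sequence. This sketch assumes an in-frame CDS, counts only complete codons, and treats a gene with a single exon as intronless; the 0.75286 threshold is the one quoted in the abstract, while the function names and the (sequence, exon count) input format are hypothetical.

```python
from typing import Iterable, Tuple

GC3_RICH_THRESHOLD = 0.75286  # GC3-rich cut-off quoted in the abstract


def gc3(cds: str) -> float:
    """Fraction of complete codons whose third base is G or C."""
    cds = cds.upper()
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not thirds:
        return 0.0
    return sum(base in "GC" for base in thirds) / len(thirds)


def intronless_fraction_of_gc3_rich(genes: Iterable[Tuple[str, int]]) -> float:
    """genes: (CDS sequence, exon count) pairs; one exon means intronless."""
    rich = [exons for cds, exons in genes if gc3(cds) >= GC3_RICH_THRESHOLD]
    if not rich:
        return 0.0
    return sum(exons == 1 for exons in rich) / len(rich)
```

Comparing this fraction with the intronless fraction over all genes gives the kind of contrast reported above (over half of GC₃-rich genes versus one-seventh of all genes).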

Affiliation(s)

Kuang-Lim Chan Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia; Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Tatiana V Tatarinova Department of Biology, University of La Verne, La Verne, California, 91750, USA; Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
Rozana Rosli Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia; Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
Nadzirah Amiruddin Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Norazah Azizi Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Mohd Amin Ab Halim Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Nik Shazana Nik Mohd Sanusi Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Nagappan Jayanthi Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Petr Ponomarenko Spatial Sciences Institute, University of Southern California, Los Angeles, CA, 90089, USA
Martin Triska Children's Hospital Los Angeles, University of Southern California, Los Angeles, CA, 90089, USA
Victor Solovyev Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY, 10549, USA
Mohd Firdaus-Raih Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
Ravigadevi Sambanthamurthi Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia
Denis Murphy Genomics and Computational Biology Research Group, University of South Wales, Pontypridd, CF371DL, UK
Eng-Ti Leslie Low Advanced Biotechnology and Breeding Centre, Malaysian Palm Oil Board, No. 6, Persiaran Institusi, Bandar Baru Bangi, 43000 Kajang, Selangor, Malaysia

Evolution of Brain Active Gene Promoters in Human Lineage Towards the Increased Plasticity of Gene Regulation. Mol Neurobiol 2017;55:1871-1904. [PMID: 28233272 DOI: 10.1007/s12035-017-0427-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/10/2016] [Accepted: 01/26/2017] [Indexed: 01/31/2023]

Chan KL, Rosli R, Tatarinova TV, Hogan M, Firdaus-Raih M, Low ETL. Seqping: gene prediction pipeline for plant genomes using self-training gene models and transcriptomic data. BMC Bioinformatics 2017;18:1426. [PMID: 28466793 PMCID: PMC5333190 DOI: 10.1186/s12859-016-1426-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Gene prediction is one of the most important steps in the genome annotation process. A large number of software tools and pipelines, built on various computational techniques, are available for gene prediction. However, these systems have yet to accurately predict all, or even most, of the protein-coding regions. Furthermore, none of the currently available gene finders has a universal Hidden Markov Model (HMM) that can automatically perform gene prediction equally well for all organisms.

RESULTS

We present Seqping, an automated gene prediction pipeline that uses self-training HMMs and transcriptomic data. The pipeline processes the genome and transcriptome sequences of the target species using the GlimmerHMM, SNAP, and AUGUSTUS pipelines, followed by the MAKER2 program, which combines the predictions from the three tools with the transcriptomic evidence. Seqping generates species-specific HMMs that are able to offer unbiased gene predictions. The pipeline was evaluated using the Oryza sativa and Arabidopsis thaliana genomes. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that the pipeline identified at least 95% of BUSCO's plantae dataset. Our evaluation shows that Seqping generated better gene predictions than three HMM-based programs (MAKER2, GlimmerHMM and AUGUSTUS) using their respective available HMMs. Seqping had the highest accuracy in rice (0.5648 for CDS, 0.4468 for exon, and 0.6695 for nucleotide structure) and A. thaliana (0.5808 for CDS, 0.5955 for exon, and 0.8839 for nucleotide structure).

CONCLUSIONS

Seqping provides researchers with a seamless pipeline to train species-specific HMMs and predict genes in newly sequenced or less-studied genomes. We conclude that Seqping predictions are more accurate than those of the other three approaches run with their default or available HMMs.
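HMM-based gene finders such as those discussed above label a sequence by decoding the most probable path of hidden states, typically with the Viterbi algorithm. The toy two-state model below (AT-rich "non-coding" versus GC-rich "coding") uses made-up probabilities purely to show that decoding step; it is not the model used by Seqping, GlimmerHMM, SNAP or AUGUSTUS. In Seqping, species-specific parameters are trained for the target genome, whereas here they are fixed by hand, and all names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical two-state model: "N" (non-coding, AT-rich) and "C" (coding, GC-rich).
STATES = ["N", "C"]
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "C": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}


def viterbi(seq: str) -> List[str]:
    """Most probable state path, computed in log space to avoid underflow."""
    seq = seq.upper()
    if not seq:
        return []
    score: Dict[str, float] = {
        s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES
    }
    back: List[Dict[str, str]] = []  # best predecessor of each state, per position
    for base in seq[1:]:
        pointers: Dict[str, str] = {}
        new_score: Dict[str, float] = {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][base]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

On a toy sequence that switches from an AT-rich run to a GC-rich run, the decoded path switches from "N" to "C" at the boundary, because the emission gain outweighs the cost of a single state transition.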

Zolotarenko A, Chekalin E, Mehta R, Baranova A, Tatarinova TV, Bruskin S. Identification of Transcriptional Regulators of Psoriasis from RNA-Seq Experiments. Methods Mol Biol 2017;1613:355-370. [PMID: 28849568 DOI: 10.1007/978-1-4939-7027-8_14] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 06/07/2023]

Triska M, Ivliev A, Nikolsky Y, Tatarinova TV. Analysis of cis-Regulatory Elements in Gene Co-expression Networks in Cancer. Methods Mol Biol 2017;1613:291-310. [PMID: 28849565 DOI: 10.1007/978-1-4939-7027-8_11] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/22/2023]

Abstract

Analysis of gene co-expression networks is a powerful "data-driven" tool, invaluable for understanding cancer biology and mechanisms of tumor development. Yet, despite the completion of thousands of studies on cancer gene expression, there have been few attempts to normalize and integrate co-expression data from scattered sources in a concise "meta-analysis" framework. Here we describe an integrated approach to cancer expression meta-analysis, which combines generation of "data-driven" co-expression networks with detailed statistical detection of promoter sequence motifs within the co-expression clusters. First, we applied the Weighted Gene Co-Expression Network Analysis (WGCNA) workflow and Pearson's correlation to generate a comprehensive set of over 3000 co-expression clusters from 82 normalized microarray datasets covering nine cancers of different origin. Next, we designed a genome-wide statistical approach to detect specific DNA sequence motifs based on similarities between the promoters of similarly expressed genes. The approach, implemented as the cisExpress software module, was specifically designed for analysis of very large data sets such as those generated by publicly accessible whole-genome and transcriptome projects. cisExpress uses a task-farming algorithm to exploit all available computational cores within a shared-memory node. We discovered that although co-expression modules are populated with different sets of genes, they share distinct, stable patterns of co-regulation based on promoter sequence analysis. The number of motifs per co-expression cluster varies widely with the cancer tissue of origin, with the largest number in colon (68 motifs) and the lowest in ovary (18 motifs). The top-scoring motifs are typically shared between several tissues; they define sets of target genes responsible for certain functions in carcinogenesis. Both the co-expression modules and a database of precalculated motifs are publicly available for further studies.
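The two analysis steps described above, grouping genes by Pearson correlation of their expression profiles and testing the promoters of co-expressed genes for shared motifs, can be sketched in a few functions. The abstract does not specify the statistic used by cisExpress, so the motif test here is a swapped-in one-sided hypergeometric test on motif presence; the correlation threshold, function names and input layout are hypothetical, and this is neither the WGCNA workflow nor cisExpress itself.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(expr: Dict[str, Sequence[float]],
                      threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Gene pairs whose expression profiles correlate at or above the threshold."""
    return [(g1, g2) for g1, g2 in combinations(sorted(expr), 2)
            if pearson(expr[g1], expr[g2]) >= threshold]


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / math.comb(N, n)


def motif_enrichment_p(motif: str, cluster: Sequence[str],
                       promoters: Dict[str, str]) -> float:
    """One-sided p-value that `motif` occurs in more cluster promoters than expected."""
    N = len(promoters)
    K = sum(motif in seq for seq in promoters.values())
    k = sum(motif in promoters[g] for g in cluster)
    return hypergeom_sf(k, N, K, len(cluster))
```

With thousands of clusters and many candidate motifs, the resulting p-values would also need correction for multiple testing before motifs are reported.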

Integrated computational approach to the analysis of RNA-seq data reveals new transcriptional regulators of psoriasis. Exp Mol Med 2016;48:e268. [PMID: 27811935 PMCID: PMC5133374 DOI: 10.1038/emm.2016.97] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/10/2016] [Revised: 05/06/2016] [Accepted: 05/24/2016] [Indexed: 02/07/2023]

Tatarinova TV, Chekalin E, Nikolsky Y, Bruskin S, Chebotarov D, McNally KL, Alexandrov N. Nucleotide diversity analysis highlights functionally important genomic regions. Sci Rep 2016;6:35730. [PMID: 27774999 PMCID: PMC5075931 DOI: 10.1038/srep35730] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 05/12/2016] [Accepted: 09/30/2016] [Indexed: 12/15/2022]

Morozova I, Flegontov P, Mikheyev AS, Bruskin S, Asgharian H, Ponomarenko P, Klyuchnikov V, ArunKumar G, Prokhortchouk E, Gankin Y, Rogaev E, Nikolsky Y, Baranova A, Elhaik E, Tatarinova TV. Toward high-resolution population genomics using archaeological samples. DNA Res 2016;23:295-310. [PMID: 27436340 PMCID: PMC4991838 DOI: 10.1093/dnares/dsw029] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/22/2015] [Accepted: 05/22/2016] [Indexed: 12/30/2022]

Affiliation(s)

Irina Morozova Institute of Evolutionary Medicine, University of Zurich, Zurich, Switzerland
Pavel Flegontov Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic; Bioinformatics Center, A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation
Alexander S Mikheyev Ecology and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
Sergey Bruskin Vavilov Institute of General Genetics RAS, Moscow, Russia
Hosseinali Asgharian Department of Computational and Molecular Biology, University of Southern California, Los Angeles, CA, USA
Petr Ponomarenko Center for Personalized Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA; Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA
Vladimir Klyuchnikov Donskaya Archeologia, Rostov, Russia
GaneshPrasad ArunKumar School of Chemical and Biotechnology, SASTRA University, Tanjore, India
Egor Prokhortchouk Research Center of Biotechnology RAS, Moscow, Russia; Department of Biology, Lomonosov Moscow State University, Russia
Yuriy Gankin EPAM Systems, Newtown, PA, USA
Evgeny Rogaev Vavilov Institute of General Genetics RAS, Moscow, Russia; University of Massachusetts Medical School, Worcester, MA, USA
Yuri Nikolsky Vavilov Institute of General Genetics RAS, Moscow, Russia; F1 Genomics, San Diego, CA, USA; School of Systems Biology, George Mason University, VA, USA
Ancha Baranova School of Systems Biology, George Mason University, VA, USA; Research Centre for Medical Genetics, Moscow, Russia; Atlas Biomed Group, Moscow, Russia
Eran Elhaik Department of Animal & Plant Sciences, University of Sheffield, Sheffield, South Yorkshire, UK
Tatiana V Tatarinova Bioinformatics Center, A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russian Federation; Center for Personalized Medicine, Children's Hospital Los Angeles, Los Angeles, CA, USA; Spatial Sciences Institute, University of Southern California, Los Angeles, CA, USA


Li WL, Buckley J, Sanchez-Lara PA, Maglinte DT, Viduetsky L, Tatarinova TV, Aparicio JG, Kim JW, Au M, Ostrow D, Lee TC, O'Gorman M, Judkins A, Cobrinik D, Triche TJ. A Rapid and Sensitive Next-Generation Sequencing Method to Detect RB1 Mutations Improves Care for Retinoblastoma Patients and Their Families. J Mol Diagn 2016;18:480-93. [PMID: 27155049 DOI: 10.1016/j.jmoldx.2016.02.006] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/17/2015] [Revised: 01/14/2016] [Accepted: 02/01/2016] [Indexed: 01/26/2023]

Affiliation(s)

Wenhui L Li Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California; Department of Pathology, USC Roski Eye Institute, University of Southern California, Los Angeles, California
Jonathan Buckley Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California; Department of Pathology, USC Roski Eye Institute, University of Southern California, Los Angeles, California
Pedro A Sanchez-Lara Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California; Department of Pathology, USC Roski Eye Institute, University of Southern California, Los Angeles, California; Department of Pediatrics, USC Roski Eye Institute, University of Southern California, Los Angeles, California
Dennis T Maglinte Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California
Lucy Viduetsky Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California
Tatiana V Tatarinova Department of Pediatrics, USC Roski Eye Institute, University of Southern California, Los Angeles, California; Spatial Sciences Institute, Dornsife College of Letters, Arts and Sciences, University of Southern California, Los Angeles, California
Jennifer G Aparicio Vision Center, Children's Hospital Los Angeles, Los Angeles, California
Jonathan W Kim Vision Center, Children's Hospital Los Angeles, Los Angeles, California; Department of Ophthalmology, USC Roski Eye Institute, University of Southern California, Los Angeles, California
Margaret Au Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California
Dejerianne Ostrow Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California
Thomas C Lee Vision Center, Children's Hospital Los Angeles, Los Angeles, California; Department of Ophthalmology, USC Roski Eye Institute, University of Southern California, Los Angeles, California
Maurice O'Gorman Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California; Department of Pathology, USC Roski Eye Institute, University of Southern California, Los Angeles, California
Alexander Judkins Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California; Department of Pathology, USC Roski Eye Institute, University of Southern California, Los Angeles, California
David Cobrinik Vision Center, Children's Hospital Los Angeles, Los Angeles, California; Department of Ophthalmology, USC Roski Eye Institute, University of Southern California, Los Angeles, California; Division of Ophthalmology and Department of Surgery, and Saban Research Institute, Children's Hospital Los Angeles, Los Angeles, California; Department of Biochemistry & Molecular Biology, USC Roski Eye Institute, University of Southern California, Los Angeles, California; Norris Comprehensive Cancer Center, USC Keck School of Medicine, University of Southern California, Los Angeles, California
Timothy J Triche Department of Pathology and Laboratory Medicine, Children's Hospital Los Angeles, Los Angeles, California; Department of Pathology, USC Roski Eye Institute, University of Southern California, Los Angeles, California


Jiang N, Wang L, Chen J, Wang L, Leach L, Luo Z. Conserved and divergent patterns of DNA methylation in higher vertebrates. Genome Biol Evol 2014;6:2998-3014. [PMID: 25355807 PMCID: PMC4255770 DOI: 10.1093/gbe/evu238] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Accepted: 10/20/2014] [Indexed: 02/07/2023]

iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput Biol 2014;10:e1003731. [PMID: 25058159 PMCID: PMC4109854 DOI: 10.1371/journal.pcbi.1003731] [Citation(s) in RCA: 613] [Impact Index Per Article: 61.3] [Received: 02/13/2014] [Accepted: 05/27/2014] [Indexed: 01/17/2023]

Abstract

Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and by linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we apply iRegulon to more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, profile them by RNA-seq and ChIP-seq, and identify an extensive up-regulated network controlled directly by p53. Similarly, we map a repressive network that shows no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.

Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e., a set of co-expressed genes. iRegulon analyzes the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10,000 TF motifs and 1,000 ChIP-seq data sets, or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks, or “regulons”.
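To make the "ranking-and-recovery" idea above more concrete, here is a simplified sketch, not iRegulon's implementation. It assumes each motif comes with a precomputed genome-wide ranking of genes (best-scoring promoter first), scores each motif by the area under the recovery curve of the input gene set within the top `fraction` of the ranking, and converts those areas to normalized enrichment scores by z-scoring across all motifs. The function names, the step-wise AUC normalization, and the 3% default cutoff are hypothetical choices made for illustration.

```python
import statistics
from typing import Dict, List, Set


def recovery_auc(ranking: List[str], gene_set: Set[str], max_rank: int) -> float:
    """Area under the step-wise recovery curve over the top `max_rank` genes.

    The curve value at rank r is the number of gene-set members seen so far;
    the area is divided by max_rank * |gene_set| so it lies in [0, 1].
    """
    recovered = 0
    area = 0
    for gene in ranking[:max_rank]:
        if gene in gene_set:
            recovered += 1
        area += recovered
    return area / (max_rank * len(gene_set))


def normalized_enrichment_scores(
    rankings: Dict[str, List[str]], gene_set: Set[str], fraction: float = 0.03
) -> Dict[str, float]:
    """Z-score each motif's recovery AUC against the AUCs of all motifs.

    `rankings` maps a (hypothetical) motif ID to a ranking of all genes;
    every ranking is assumed to cover the same genes. `fraction` is an
    illustrative cutoff for the portion of the ranking that is scored.
    """
    n_genes = len(next(iter(rankings.values())))
    max_rank = max(1, int(fraction * n_genes))
    aucs = {motif: recovery_auc(r, gene_set, max_rank) for motif, r in rankings.items()}
    values = list(aucs.values())
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {motif: (a - mu) / sd if sd > 0 else 0.0 for motif, a in aucs.items()}


# Toy example: only motifA ranks the co-expressed genes g0, g1, g2 near the top.
genes = [f"g{i}" for i in range(100)]
rankings = {
    "motifA": genes,
    "motifB": list(reversed(genes)),
    "motifC": genes[50:] + genes[:50],
}
nes = normalized_enrichment_scores(rankings, {"g0", "g1", "g2"}, fraction=0.1)
print(nes)
```

Z-scoring against all motifs puts scores from very large motif collections on a common scale; a real tool would additionally map enriched motifs to candidate TFs and select the leading-edge target genes, which this sketch leaves out.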


NPEST: a nonparametric method and a database for transcription start site prediction. Quant Biol 2014;1:261-271. [PMID: 25197613 DOI: 10.1007/s40484-013-0022-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 10/25/2022]

Triska M, Grocutt D, Southern J, Murphy DJ, Tatarinova T. cisExpress: motif detection in DNA sequences. Bioinformatics 2013;29:2203-5. [PMID: 23793750 DOI: 10.1093/bioinformatics/btt366] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]

POWRS: position-sensitive motif discovery. PLoS One 2012;7:e40373. [PMID: 22792292 PMCID: PMC3390389 DOI: 10.1371/journal.pone.0040373] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/13/2012] [Accepted: 06/07/2012] [Indexed: 12/04/2022]

Xie T, Zhang C, Zhang B, Molony C, Oudes A, Roberts C, Dai H, Schadt E, Lamb J. A survey of cancer cell lines reveals highly structured and hierarchical relationships within and between DNA and mRNA that may be the result of selection. OMICS 2010;14:91-7. [PMID: 20141331 DOI: 10.1089/omi.2009.0114] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]

Tatarinova TV, Alexandrov NN, Bouck JB, Feldmann KA. GC3 biology in corn, rice, sorghum and other grasses. BMC Genomics 2010;11:308. [PMID: 20470436 PMCID: PMC2895627 DOI: 10.1186/1471-2164-11-308] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Received: 09/16/2009] [Accepted: 05/16/2010] [Indexed: 11/10/2022]