1
Yang YL, Cushman SA, Wang SC, Wang F, Li Q, Liu HL, Li Y. Genome-wide investigation of the WRKY transcription factor gene family in weeping forsythia: expression profile and cold and drought stress responses. Genetica 2023; 151:153-165. [PMID: 36853516 PMCID: PMC9973247 DOI: 10.1007/s10709-023-00184-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/12/2023] [Accepted: 02/21/2023] [Indexed: 03/01/2023]
Abstract
Weeping forsythia is a widespread shrub in China with important ornamental, medicinal and ecological values. It is widely distributed in China's warm temperate zone. In plants, WRKY transcription factors play important regulatory roles in seed germination, flower development, fruit ripening and coloring, and biotic and abiotic stress response. To date, WRKY transcription factors have not been systematically studied in weeping forsythia. In this study, we identified 79 WRKY genes in weeping forsythia and classified them according to the naming rules used for Arabidopsis thaliana. Phylogenetic tree analysis showed that, except for the IIe subfamily, whose clustering was inconsistent with A. thaliana clustering, the clustering of the other subfamilies was consistent. Cis-element analysis showed that WRKY genes related to pathogen resistance in weeping forsythia might be related to methyl jasmonate and salicylic acid-mediated signaling pathways. Combining cis-element and expression pattern analyses of WRKY genes showed that more than half of the WRKY genes were involved in light-dependent development and morphogenesis in different tissues. The gene expression results showed that 13 WRKY genes were involved in drought response, most of which might be related to the abscisic acid signaling pathway, and a few of which might be regulated by MYB transcription factors. The gene expression results under cold stress showed that 17 WRKY genes were involved in low temperature response, and 9 of them had low temperature responsiveness cis-elements. Our study of the WRKY family in weeping forsythia provides useful resources for molecular breeding and important clues for their functional verification.
Affiliation(s)
- Ya-Lin Yang
- Innovation Platform of Molecular Biology, College of Landscape and Art, Henan Agricultural University, Zhengzhou, China
- Samuel A Cushman
- School of Forestry, Northern Arizona University, Flagstaff, AZ, USA
- Shu-Chen Wang
- Innovation Platform of Molecular Biology, College of Landscape and Art, Henan Agricultural University, Zhengzhou, China
- Fan Wang
- Innovation Platform of Molecular Biology, College of Landscape and Art, Henan Agricultural University, Zhengzhou, China
- Qian Li
- Innovation Platform of Molecular Biology, College of Landscape and Art, Henan Agricultural University, Zhengzhou, China
- Hong-Li Liu
- Innovation Platform of Molecular Biology, College of Landscape and Art, Henan Agricultural University, Zhengzhou, China
- Yong Li
- College of Life Science and Technology, Inner Mongolia Normal University, Huhehaote, China
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing, China
2
Gao Y, He X, Lv H, Liu H, Li Y, Hu Y, Liu Y, Huang Y, Zhang J. Epi-Brassinolide Regulates ZmC4 NADP-ME Expression through the Transcription Factors ZmbHLH157 and ZmNF-YC2. Int J Mol Sci 2023; 24(5):4614. [PMID: 36902048 PMCID: PMC10002761 DOI: 10.3390/ijms24054614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Revised: 02/18/2023] [Accepted: 02/20/2023] [Indexed: 03/05/2023] Open
Abstract
Maize is a main food and feed crop with great production potential and high economic benefits. Improving its photosynthetic efficiency is crucial for increasing yield. Maize photosynthesis occurs mainly through the C4 pathway, and NADP-ME (NADP-malic enzyme) is a key enzyme in the photosynthetic carbon assimilation pathway of C4 plants. ZmC4-NADP-ME catalyzes the release of CO2 from oxaloacetate into the Calvin cycle in the maize bundle sheath. Brassinosteroid (BR) can improve photosynthesis; however, its molecular mechanism of action remains unclear. In this study, transcriptome sequencing of maize seedlings treated with epi-brassinolide (EBL) showed that differentially expressed genes (DEGs) were significantly enriched in photosynthetic antenna proteins, porphyrin and chlorophyll metabolism, and photosynthesis pathways. The DEGs of C4-NADP-ME and pyruvate phosphate dikinase in the C4 pathway were significantly enriched in EBL treatment. Co-expression analysis showed that the transcription levels of the ZmNF-YC2 and ZmbHLH157 transcription factors were increased under EBL treatment and moderately positively correlated with ZmC4-NADP-ME. Transient overexpression in protoplasts revealed that ZmNF-YC2 and ZmbHLH157 activate C4-NADP-ME promoters. Further experiments showed ZmNF-YC2 and ZmbHLH157 transcription factor binding sites on the -1616 bp and -1118 bp ZmC4 NADP-ME promoter. ZmNF-YC2 and ZmbHLH157 were screened as candidate transcription factors mediating brassinosteroid hormone regulation of the ZmC4 NADP-ME gene. The results provide a theoretical basis for improving maize yield using BR hormones.
Affiliation(s)
- Yuanfen Gao
- College of Life Science, Sichuan Agricultural University, Ya’an 625000, China
- Xuewu He
- College of Life Science, Sichuan Agricultural University, Ya’an 625000, China
- Huayang Lv
- College of Life Science, Sichuan Agricultural University, Ya’an 625000, China
- Hanmei Liu
- College of Life Science, Sichuan Agricultural University, Ya’an 625000, China
- Yangping Li
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Yufeng Hu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Yinghong Liu
- Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
- Yubi Huang
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Sichuan Agricultural University, Chengdu 611130, China
- Correspondence: Y.H.; J.Z.
- Junjie Zhang
- College of Life Science, Sichuan Agricultural University, Ya’an 625000, China
- Correspondence: Y.H.; J.Z.
3
Liu Y, Yuan G, Hassan MM, Abraham PE, Mitchell JC, Jacobson D, Tuskan GA, Khakhar A, Medford J, Zhao C, Liu CJ, Eckert CA, Doktycz MJ, Tschaplinski TJ, Yang X. Biological and Molecular Components for Genetically Engineering Biosensors in Plants. BioDesign Research 2022; 2022:9863496. [PMID: 37850147 PMCID: PMC10521658 DOI: 10.34133/2022/9863496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Accepted: 10/08/2022] [Indexed: 10/19/2023] Open
Abstract
Plants adapt to their changing environments by sensing and responding to physical, biological, and chemical stimuli. Due to their sessile lifestyles, plants experience a vast array of external stimuli and selectively perceive and respond to specific signals. By repurposing the logic circuitry and biological and molecular components used by plants in nature, genetically encoded plant-based biosensors (GEPBs) have been developed by directing signal recognition mechanisms into carefully assembled outcomes that are easily detected. GEPBs allow for in vivo monitoring of biological processes in plants to facilitate basic studies of plant growth and development. GEPBs are also useful for environmental monitoring, plant abiotic and biotic stress management, and accelerating design-build-test-learn cycles of plant bioengineering. With the advent of synthetic biology, biological and molecular components derived from alternate natural organisms (e.g., microbes) and/or de novo parts have been used to build GEPBs. In this review, we summarize the framework for engineering different types of GEPBs. We then highlight representative validated biological components for building plant-based biosensors, along with various applications of plant-based biosensors in basic and applied plant science research. Finally, we discuss challenges and strategies for the identification and design of biological components for plant-based biosensors.
Affiliation(s)
- Yang Liu
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Guoliang Yuan
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Md Mahmudul Hassan
- Department of Genetics and Plant Breeding, Patuakhali Science and Technology University, Dumki, Patuakhali, 8602, Bangladesh
- Paul E. Abraham
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Julie C. Mitchell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Daniel Jacobson
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Gerald A. Tuskan
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Arjun Khakhar
- Department of Biology, Colorado State University, Fort Collins, Colorado 80523, USA
- June Medford
- Department of Biology, Colorado State University, Fort Collins, Colorado 80523, USA
- Cheng Zhao
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Chang-Jun Liu
- Biology Department, Brookhaven National Laboratory, Upton, New York 11973, USA
- Carrie A. Eckert
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Mitchel J. Doktycz
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Timothy J. Tschaplinski
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- Xiaohan Yang
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
- The Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
4
Zhao XL, Yang YL, Xia HX, Li Y. Genome-wide analysis of the carotenoid cleavage dioxygenases gene family in Forsythia suspensa: Expression profile and cold and drought stress responses. Frontiers in Plant Science 2022; 13:998911. [PMID: 36204048 PMCID: PMC9531035 DOI: 10.3389/fpls.2022.998911] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/20/2022] [Accepted: 08/29/2022] [Indexed: 06/12/2023]
Abstract
Forsythia suspensa is a famous ornamental and medicinal plant in Oleaceae. The CCD family is involved in the synthesis of pigments, volatiles, strigolactones, and abscisic acid (ABA) in plants. In this study, the CCD family in F. suspensa was analyzed at the genome level. A total of 16 members of the CCD family were identified, which included 11 members of the carotenoid cleavage dioxygenases (CCD) subfamily and 5 members of the 9-cis epoxycarotenoid dioxygenases (NCED) subfamily. The expression analysis of different tissues demonstrated that three FsCCD1 genes might be involved in the synthesis of pigments and volatiles in flowers and fruits. Three CCD4 genes were effectively expressed in flowers, while only FsCCD4-3 was effectively expressed in fruits. Comparison of CCD4 between Osmanthus fragrans and F. suspensa showed that the structure of FsCCD4-1 is comparable to that of the OfCCD4-1 protein, indicating that the protein might be functional, especially in catalyzing the synthesis of β-ionone. However, further comparison of the upstream promoter regions showed that the two genes differ substantially in the composition of cis-elements, which might be responsible for differences in β-ionone content. On the other hand, four NCED genes were significantly up-regulated under cold stress while two were up-regulated under drought stress. The data showed that these genes might be involved in the synthesis of ABA. Taken together, our data improve understanding of the CCD family and provide key candidate genes associated with cold and drought stresses in F. suspensa.
Affiliation(s)
- Xiao-Liang Zhao
- School of Basic Medicine, Xinxiang Medical University, Xinxiang, China
- Ya-Lin Yang
- Innovation Platform of Molecular Biology, College of Landscape and Art, Henan Agricultural University, Zhengzhou, China
- He-Xiao Xia
- Innovation Platform of Molecular Biology, College of Landscape and Art, Henan Agricultural University, Zhengzhou, China
- Yong Li
- Innovation Platform of Molecular Biology, College of Landscape and Art, Henan Agricultural University, Zhengzhou, China
- State Key Laboratory of Tree Genetics and Breeding, Chinese Academy of Forestry, Beijing, China
5
Calderón L, Schindler K, Malin SG, Schebesta A, Sun Q, Schwickert T, Alberti C, Fischer M, Jaritz M, Tagoh H, Ebert A, Minnich M, Liston A, Cochella L, Busslinger M. Pax5 regulates B cell immunity by promoting PI3K signaling via PTEN down-regulation. Sci Immunol 2021; 6(61):eabg5003. [PMID: 34301800 DOI: 10.1126/sciimmunol.abg5003] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 01/09/2021] [Accepted: 06/22/2021] [Indexed: 12/26/2022]
Abstract
The transcription factor Pax5 controls B cell development, but its role in mature B cells is largely enigmatic. Here, we demonstrated that the loss of Pax5 by conditional mutagenesis in peripheral B lymphocytes led to the strong reduction of B-1a, marginal zone (MZ), and germinal center (GC) B cells as well as plasma cells. Follicular (FO) B cells tolerated the loss of Pax5 but had a shortened half-life. The Pax5-deficient FO B cells failed to proliferate upon B cell receptor or Toll-like receptor stimulation due to impaired PI3K-AKT signaling, which was caused by increased expression of PTEN, a negative regulator of the PI3K pathway. Pax5 restrained PTEN protein expression at the posttranscriptional level, likely involving Pten-targeting microRNAs. Additional PTEN loss in Pten,Pax5 double-mutant mice rescued FO B cell numbers and the development of MZ B cells but did not restore GC B cell formation. Hence, the posttranscriptional down-regulation of PTEN expression is an important function of Pax5 that facilitates the differentiation and survival of mature B cells, thereby promoting humoral immunity.
Affiliation(s)
- Lesly Calderón
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Karina Schindler
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Stephen G Malin
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Laboratory of Immunobiology, Department of Medicine Solna, Karolinska Institute, Stockholm, Sweden
- Alexandra Schebesta
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Qiong Sun
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Tanja Schwickert
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Chiara Alberti
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Maria Fischer
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Markus Jaritz
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Hiromi Tagoh
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Anja Ebert
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Martina Minnich
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Adrian Liston
- Laboratory of Lymphocyte Signalling and Development, The Babraham Institute, Cambridge CB22 3AT, UK
- Luisa Cochella
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
- Meinrad Busslinger
- Research Institute of Molecular Pathology (IMP), Vienna Biocenter (VBC), Campus-Vienna-Biocenter 1, A-1030 Vienna, Austria
6
Yang Y, Lee JH, Poindexter MR, Shao Y, Liu W, Lenaghan SC, Ahkami AH, Blumwald E, Stewart CN. Rational design and testing of abiotic stress-inducible synthetic promoters from poplar cis-regulatory elements. Plant Biotechnology Journal 2021; 19:1354-1369. [PMID: 33471413 PMCID: PMC8313130 DOI: 10.1111/pbi.13550] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 07/14/2020] [Revised: 12/31/2020] [Accepted: 01/09/2021] [Indexed: 05/27/2023]
Abstract
Abiotic stress resistance traits may be especially crucial for sustainable production of bioenergy tree crops. Here, we show the performance of a set of rationally designed osmotic-related and salt stress-inducible synthetic promoters for use in hybrid poplar. De novo motif-detecting algorithms yielded 30 water-deficit (SD) and 34 salt stress (SS) candidate DNA motifs from relevant poplar transcriptomes. We selected three conserved water-deficit stress motifs (SD18, SD13 and SD9) found in 16 co-expressed gene promoters, and we discovered a well-conserved motif for salt response (SS16). We characterized several native poplar stress-inducible promoters to enable comparisons with our synthetic promoters. Fifteen synthetic promoters were designed using various SD and SS subdomains, in which heptameric repeats of five-to-eight subdomain bases were fused to a common core promoter downstream, which, in turn, drove a green fluorescent protein (GFP) gene for reporter assays. These 15 synthetic promoters were screened by transient expression assays in poplar leaf mesophyll protoplasts and agroinfiltrated Nicotiana benthamiana leaves under osmotic stress conditions. Twelve synthetic promoters were induced in transient expression assays with a GFP readout. Of these, five promoters (SD18-1, SD9-2, SS16-1, SS16-2 and SS16-3) endowed higher inducibility under osmotic stress conditions than native promoters. These five synthetic promoters were stably transformed into Arabidopsis thaliana to study inducibility in whole plants. Herein, SD18-1 and SD9-2 were induced by water-deficit stress, whereas SS16-1, SS16-2 and SS16-3 were induced by salt stress. The synthetic biology design pipeline resulted in five synthetic promoters that outperformed endogenous promoters in transgenic plants.
Affiliation(s)
- Yongil Yang
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, USA
- Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA
- Jun Hyung Lee
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, USA
- Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Magen R. Poindexter
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, USA
- Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA
- Yuanhua Shao
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, USA
- Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA
- Wusheng Liu
- Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA
- Department of Horticultural Science, North Carolina State University, Raleigh, NC, USA
- Scott C. Lenaghan
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, USA
- Department of Food Science, University of Tennessee, Knoxville, TN, USA
- Amir H. Ahkami
- Environmental Molecular Sciences Laboratory (EMSL), Pacific Northwest National Laboratory (PNNL), Richland, WA, USA
- Charles Neal Stewart
- Center for Agricultural Synthetic Biology, University of Tennessee Institute of Agriculture, Knoxville, TN, USA
- Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA
7
Transcription factor expression defines subclasses of developing projection neurons highly similar to single-cell RNA-seq subtypes. Proc Natl Acad Sci U S A 2020; 117:25074-25084. [PMID: 32948690 DOI: 10.1073/pnas.2008013117] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 01/09/2023] Open
Abstract
We are only just beginning to catalog the vast diversity of cell types in the cerebral cortex. Such categorization is a first step toward understanding how diversification relates to function. All cortical projection neurons arise from a uniform pool of progenitor cells that lines the ventricles of the forebrain. It is still unclear how these progenitor cells generate the more than 50 unique types of mature cortical projection neurons defined by their distinct gene-expression profiles. Moreover, exactly how and when neurons diversify their function during development is unknown. Here we relate gene expression and chromatin accessibility of two subclasses of projection neurons with divergent morphological and functional features as they develop in the mouse brain between embryonic day 13 and postnatal day 5 in order to identify transcriptional networks that diversify neuron cell fate. We compare these gene-expression profiles with published profiles of single cells isolated from similar populations and establish that layer-defined cell classes encompass cell subtypes and developmental trajectories identified using single-cell sequencing. Given the depth of our sequencing, we identify groups of transcription factors with particularly dense subclass-specific regulation and subclass-enriched transcription factor binding motifs. We also describe transcription factor-adjacent long noncoding RNAs that define each subclass and validate the function of Myt1l in balancing the ratio of the two subclasses in vitro. Our multidimensional approach supports an evolving model of progressive restriction of cell fate competence through inherited transcriptional identities.
8
Xia X. Beyond Trees: Regulons and Regulatory Motif Characterization. Genes (Basel) 2020; 11(9):995. [PMID: 32854400 PMCID: PMC7564462 DOI: 10.3390/genes11090995] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/24/2020] [Revised: 08/13/2020] [Accepted: 08/24/2020] [Indexed: 12/14/2022] Open
Abstract
Trees and their seeds regulate their germination, growth, and reproduction in response to environmental stimuli. These stimuli, through signal transduction, trigger transcription factors that alter the expression of various genes leading to the unfolding of the genetic program. A regulon is conceptually defined as a set of target genes regulated by a transcription factor by physically binding to regulatory motifs to accomplish a specific biological function, such as the CO-FT regulon for flowering timing and fall growth cessation in trees. Only with a clear characterization of regulatory motifs, can candidate target genes be experimentally validated, but motif characterization represents the weakest feature of regulon research, especially in tree genetics. I review here relevant experimental and bioinformatics approaches in characterizing transcription factors and their binding sites, outline problems in tree regulon research, and demonstrate how transcription factor databases can be effectively used to aid the characterization of tree regulons.
Affiliation(s)
- Xuhua Xia
- Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Ottawa Institute of Systems Biology, Ottawa, ON K1H 8M5, Canada
9
Grote A, Li Y, Liu C, Voronin D, Geber A, Lustigman S, Unnasch TR, Welch L, Ghedin E. Prediction pipeline for discovery of regulatory motifs associated with Brugia malayi molting. PLoS Negl Trop Dis 2020; 14:e0008275. [PMID: 32574217 PMCID: PMC7337397 DOI: 10.1371/journal.pntd.0008275] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/14/2019] [Revised: 07/06/2020] [Accepted: 04/07/2020] [Indexed: 11/19/2022] Open
Abstract
Filarial nematodes can cause debilitating diseases in humans. They have complicated life cycles involving an insect vector and mammalian hosts, and they go through a number of developmental molts. While whole genome sequences of parasitic worms are now available, very little is known about transcription factor (TF) binding sites and their cognate transcription factors that play a role in regulating development. To address this gap, we developed a novel motif prediction pipeline, Emotif Alpha, that integrates ten different motif discovery algorithms, multiple statistical tests, and a comparative analysis of conserved elements between the filarial worms Brugia malayi and Onchocerca volvulus, and the free-living nematode Caenorhabditis elegans. We identified stage-specific TF binding motifs in B. malayi, with a particular focus on those potentially involved in the L3-L4 molt, a stage important for the establishment of infection in the mammalian host. Using an in vitro molting system, we tested and validated three of these motifs demonstrating the accuracy of the motif prediction pipeline. Diseases caused by parasitic worms such as the filariae are among the leading causes of morbidity in the developing world. Very little is known about how development is regulated in these vector-transmitted parasites. We have developed a computational method to identify motifs that correspond to transcription factor binding sites in the genome of the parasitic worm, Brugia malayi, one of the causative agents of lymphatic filariasis. Using this approach, we were able to predict stage-specific transcription factor binding sites involved in a stage of the molting process important for the establishment of the infection. We validated the role of these motifs using an in vitro molting system.
Affiliation(s)
- Alexandra Grote
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Yichao Li
- School of Computer Science and Electrical Engineering, Ohio University, Athens, Ohio, United States of America
- Canhui Liu
- Center for Global Infectious Disease Research, University of South Florida, Tampa, Florida, United States of America
- Denis Voronin
- Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, New York, United States of America
- Adam Geber
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Sara Lustigman
- Laboratory of Molecular Parasitology, Lindsley F. Kimball Research Institute, New York Blood Center, New York, New York, United States of America
- Thomas R. Unnasch
- Center for Global Infectious Disease Research, University of South Florida, Tampa, Florida, United States of America
- Lonnie Welch
- School of Computer Science and Electrical Engineering, Ohio University, Athens, Ohio, United States of America
- Correspondence: LW; EG
- Elodie Ghedin
- Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, United States of America
- Department of Epidemiology, School of Global Public Health, New York University, New York, New York, United States of America
- Correspondence: LW; EG
10
Zeng S, Lyu Z, Narisetti SRK, Xu D, Joshi T. Knowledge Base Commons (KBCommons) v1.1: a universal framework for multi-omics data integration and biological discoveries. BMC Genomics 2019; 20:947. [PMID: 31856718 PMCID: PMC6923931 DOI: 10.1186/s12864-019-6287-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND: Knowledge Base Commons (KBCommons) v1.1 is a universal and all-inclusive web-based framework providing generic functionalities for storing, sharing, analyzing, exploring, integrating and visualizing multiple organisms' genomics and integrative omics data. KBCommons is designed and developed to integrate diverse multi-level omics data and to support biological discoveries for all species via a common platform.
METHODS: KBCommons has four modules including data storage, data processing, data accessing, and web interface for data management and retrieval. It provides a comprehensive framework for new plant-specific, animal-specific, virus-specific, bacteria-specific or human disease-specific knowledge base (KB) creation, for adding new genome versions and additional multi-omics data to existing KBs, and for exploring existing datasets within current KBs.
RESULTS: KBCommons has an array of tools for data visualization and data analytics such as multiple gene/metabolite search, gene family/Pfam/Panther function annotation search, miRNA/metabolite/trait/SNP search, differential gene expression analysis, and bulk data download capacity. It contains a highly reliable data privilege management system to make users' data publicly available easily and to share private or pre-publication data with members in their collaborative groups safely and securely. It allows users to conduct data analysis using our in-house developed workflow functionalities that are linked to XSEDE high performance computing resources. Using KBCommons' intuitive web interface, users can easily retrieve genomic data, multi-omics data and analysis results from workflow according to their requirements and interests.
CONCLUSIONS: KBCommons addresses the needs of many diverse research communities to have a comprehensive multi-level OMICS web resource for data retrieval, sharing, analysis and visualization. KBCommons can be publicly accessed through a dedicated link for all organisms at http://kbcommons.org/.
Affiliation(s)
- Shuai Zeng
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO USA
- Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO USA
| | - Zhen Lyu
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO USA
- MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO USA
| | - Siva Ratna Kumari Narisetti
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO USA
| | - Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri-Columbia, Columbia, MO USA
- Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO USA
- MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO USA
| | - Trupti Joshi
- Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, MO USA
- MU Institute for Data Science and Informatics, University of Missouri-Columbia, Columbia, MO USA
- Department of Health Management, Informatics University of Missouri-Columbia, Columbia, MO USA
| |
|
11
|
Hashim FA, Houssein EH, Hussain K, Mabrouk MS, Al-Atabany W. A modified Henry gas solubility optimization for solving motif discovery problem. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04611-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 10/25/2022]
|
12
|
Myers KS, Riley NM, MacGilvray ME, Sato TK, McGee M, Heilberger J, Coon JJ, Gasch AP. Rewired cellular signaling coordinates sugar and hypoxic responses for anaerobic xylose fermentation in yeast. PLoS Genet 2019; 15:e1008037. [PMID: 30856163 PMCID: PMC6428351 DOI: 10.1371/journal.pgen.1008037] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 10/10/2018] [Revised: 03/21/2019] [Accepted: 02/20/2019] [Indexed: 01/08/2023] Open
Abstract
Microbes can be metabolically engineered to produce biofuels and biochemicals, but rerouting metabolic flux toward products is a major hurdle without a systems-level understanding of how cellular flux is controlled. To understand flux rerouting, we investigated a panel of Saccharomyces cerevisiae strains with progressive improvements in anaerobic fermentation of xylose, a sugar abundant in sustainable plant biomass used for biofuel production. We combined comparative transcriptomics, proteomics, and phosphoproteomics with network analysis to understand the physiology of improved anaerobic xylose fermentation. Our results show that upstream regulatory changes produce a suite of physiological effects that collectively impact the phenotype. Evolved strains show an unusual co-activation of Protein Kinase A (PKA) and Snf1, thus combining responses seen during feast on glucose and famine on non-preferred sugars. Surprisingly, these regulatory changes were required to mount the hypoxic response when cells were grown on xylose, revealing a previously unknown connection between sugar source and anaerobic response. Network analysis identified several downstream transcription factors that play a significant, but on their own minor, role in anaerobic xylose fermentation, consistent with the combinatorial effects of small-impact changes. We also discovered that different routes of PKA activation produce distinct phenotypes: deletion of the RAS/PKA inhibitor IRA2 promotes xylose growth and metabolism, whereas deletion of PKA inhibitor BCY1 decouples growth from metabolism to enable robust fermentation without division. Comparing phosphoproteomic changes across ira2Δ and bcy1Δ strains implicated regulatory changes linked to xylose-dependent growth versus metabolism. 
Together, our results present a picture of the metabolic logic behind anaerobic xylose flux and suggest that widespread cellular remodeling, rather than individual metabolic changes, is an important goal for metabolic engineering.
Affiliation(s)
- Kevin S. Myers
- Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI, United States of America
| | - Nicholas M. Riley
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, United States of America
| | - Matthew E. MacGilvray
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, United States of America
| | - Trey K. Sato
- Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI, United States of America
| | - Mick McGee
- Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI, United States of America
| | - Justin Heilberger
- Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI, United States of America
| | - Joshua J. Coon
- Department of Chemistry, University of Wisconsin-Madison, Madison, WI, United States of America
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, United States of America
- Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI, United States of America
- Morgridge Institute for Research, Madison, WI, United States of America
| | - Audrey P. Gasch
- Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI, United States of America
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, WI, United States of America
- Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI, United States of America
| |
|
13
|
Hashim FA, Mabrouk MS, Atabany WA. Comparative Analysis of DNA Motif Discovery Algorithms: A Systemic Review. CURRENT CANCER THERAPY REVIEWS 2019. [DOI: 10.2174/1573394714666180417161728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
Background:
Bioinformatics is an interdisciplinary field that combines biology and information technology to study how to handle biological data. The DNA motif discovery problem is the main challenge of genome biology, and its importance grows with advances in sequencing technologies that produce large amounts of data. A DNA motif is a repeated portion of DNA sequence of major biological interest, with important structural and functional features. Motif discovery plays a vital role in antibody-biomarker identification, which is useful for the diagnosis of disease, and in identifying Transcription Factor Binding Sites (TFBSs), which help in learning the mechanisms that regulate gene expression. Recently, scientists discovered that the TFs have a mutation rate five times higher than the flanking sequences, so motif discovery also has a crucial role in cancer discovery.
Methods:
Over the past decades, many attempts have used different algorithms to design fast and accurate motif discovery tools. These algorithms are generally classified into consensus or probabilistic approaches.
Results:
Many DNA motif discovery algorithms are time-consuming and easily trapped in a local optimum.
Conclusion:
Nature-inspired algorithms and many combinatorial algorithms have recently been proposed to overcome the problems of consensus and probabilistic approaches. This paper presents a general classification of motif discovery algorithms with new sub-categories. It also presents a summary comparison between them.
Affiliation(s)
- Fatma A. Hashim
- Department of Biomedical Engineering, Helwan University, Helwan, Egypt
| | - Mai S. Mabrouk
- Department of Biomedical Engineering, Misr University for Science and Technology (MUST), Cairo, Egypt
| | | |
|
14
|
Lee NK, Li X, Wang D. A comprehensive survey on genetic algorithms for DNA motif prediction. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
|
15
|
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors, and stop codons by tRNAs, which also contributes to codon usage in rare cases. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index for codon adaptation that accounts for background mutation bias (Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. They are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
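As a rough illustration of the gene-specific index this abstract contrasts against, the sketch below computes the classic codon adaptation index (CAI) as the geometric mean of relative adaptiveness values taken from a reference gene set. The function names and the handling of unseen codons (they are simply skipped) are assumptions made for this example; it does not implement the Index of Translation Elongation described in the chapter.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes):
    """w(codon) = count of codon / count of the most used synonymous codon."""
    counts = Counter(
        c for gene in reference_genes for c in codons(gene) if c in CODON_TABLE
    )
    w = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons are not scored
        family_max = max(counts[c] for c, a in CODON_TABLE.items() if a == aa)
        if family_max > 0:
            w[codon] = counts[codon] / family_max
    return w


def cai(gene, w):
    """Geometric mean of w over the gene's codons.

    Assumption for this sketch: codons with w == 0 or absent from the
    reference are skipped, as are the single-codon families Met and Trp.
    """
    logs = [
        math.log(w[c])
        for c in codons(gene)
        if w.get(c, 0) > 0 and CODON_TABLE[c] not in "MW"
    ]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

With a toy reference set such as `["CTGCTGCTGCTA", "AAAAAAAAG"]`, a gene built only from the most used codons scores 1.0, and genes using rarer synonymous codons score lower.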
|
16
|
Al-Ouran R, Schmidt R, Naik A, Jones J, Drews F, Juedes D, Elnitski L, Welch L. Discovering Gene Regulatory Elements Using Coverage-Based Heuristics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1290-1300. [PMID: 26540692 DOI: 10.1109/tcbb.2015.2496261] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
Data mining algorithms and sequencing methods (such as RNA-seq and ChIP-seq) are being combined to discover genomic regulatory motifs that relate to a variety of phenotypes. However, motif discovery algorithms often produce very long lists of putative transcription factor binding sites, hindering the discovery of phenotype-related regulatory elements by making it difficult to select a manageable set of candidate motifs for experimental validation. To address this issue, the authors introduce the motif selection problem and provide coverage-based search heuristics for its solution. Analysis of 203 ChIP-seq experiments from the ENCyclopedia of DNA Elements project shows that our algorithms produce motifs that have high sensitivity and specificity and reveals new insights about the regulatory code of the human genome. The greedy algorithm performs the best, selecting a median of two motifs per ChIP-seq transcription factor group while achieving a median sensitivity of 77 percent.
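The motif selection idea can be pictured as a set-cover style problem. The sketch below is a generic greedy heuristic, not the authors' published algorithm: it repeatedly picks the motif that hits the most not-yet-covered sequences. The `hits` mapping, the function name, and the `target_coverage` parameter are all hypothetical names introduced for this example.

```python
def greedy_motif_selection(hits, n_sequences, target_coverage=0.9):
    """Greedily pick motifs until the covered fraction reaches the target.

    hits: dict mapping a motif name to the set of sequence indices it matches
          (hypothetical input format for this sketch).
    n_sequences: total number of sequences, must be > 0.
    Returns (selected motif names in pick order, achieved coverage fraction).
    """
    covered, selected = set(), []
    while len(covered) / n_sequences < target_coverage:
        # Motif adding the most new sequences; ties go to the first in dict order.
        best = max(hits, key=lambda m: len(hits[m] - covered))
        gain = hits[best] - covered
        if not gain:
            break  # no remaining motif covers anything new
        selected.append(best)
        covered |= gain
    return selected, len(covered) / n_sequences
```

For example, with `hits = {"m1": {0, 1, 2, 3}, "m2": {3, 4}, "m3": {4, 5}, "m4": {0}}` and six sequences, the heuristic picks `m1` and then `m3`, which shows how a long candidate list can shrink to a small set for validation.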
|
17
|
Saad C, Noé L, Richard H, Leclerc J, Buisine MP, Touzet H, Figeac M. DiNAMO: highly sensitive DNA motif discovery in high-throughput sequencing data. BMC Bioinformatics 2018; 19:223. [PMID: 29890948 PMCID: PMC5996464 DOI: 10.1186/s12859-018-2215-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/31/2017] [Accepted: 05/21/2018] [Indexed: 12/30/2022] Open
Abstract
Background Discovering over-represented approximate motifs in DNA sequences is an essential part of bioinformatics. This topic has been studied extensively because of the increasing number of potential applications. However, it remains a difficult challenge, especially with the huge quantity of data generated by high throughput sequencing technologies. To overcome this problem, existing tools use greedy algorithms and probabilistic approaches to find motifs in reasonable time. Nevertheless these approaches lack sensitivity and have difficulties coping with rare and subtle motifs. Results We developed DiNAMO (for DNA MOtif), new software based on an exhaustive and efficient algorithm for IUPAC motif discovery. We evaluated DiNAMO on synthetic and real datasets with two different applications, namely ChIP-seq peaks and Systematic Sequencing Error analysis. DiNAMO proves to compare favorably with other existing methods and is robust to noise. Conclusions We show that DiNAMO software can serve as a tool to search for degenerate motifs in an exact manner using IUPAC models. DiNAMO can be used in scanning mode with sliding windows or in fixed position mode, which makes it suitable for numerous potential applications. Availability https://github.com/bonsai-team/DiNAMO.
Affiliation(s)
- Chadi Saad
- Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, France; Univ. Lille, Inserm, Lille University Hospital, UMR-S 1172 - JPARC - Centre de Recherche Jean-Pierre AUBERT, Lille, F-59000, France
| | - Laurent Noé
- Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, France
| | - Hugues Richard
- Sorbonne Université, UMR7238, Laboratory Computational and Quantitative Biology, LCQB, Paris, F-75005, France
| | - Julie Leclerc
- Univ. Lille, Inserm, Lille University Hospital, UMR-S 1172 - JPARC - Centre de Recherche Jean-Pierre AUBERT, Lille, F-59000, France
| | - Marie-Pierre Buisine
- Univ. Lille, Inserm, Lille University Hospital, UMR-S 1172 - JPARC - Centre de Recherche Jean-Pierre AUBERT, Lille, F-59000, France
| | - Hélène Touzet
- Univ. Lille, CNRS, Inria, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Lille, France
| | - Martin Figeac
- Univ. Lille. Plateau de génomique fonctionnelle et structurale, Lille, F-59000, France
| |
|
18
|
Lee NK, Azizan FL, Wong YS, Omar N. DeepFinder: An integration of feature-based and deep learning approach for DNA motif discovery. BIOTECHNOL BIOTEC EQ 2018. [DOI: 10.1080/13102818.2018.1438209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/18/2022] Open
Affiliation(s)
- Nung Kion Lee
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
| | - Farah Liyana Azizan
- Centre For Pre-University Studies, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
| | - Yu Shiong Wong
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
| | - Norshafarina Omar
- Department of Cognitive Sciences, Faculty of Cognitive Sciences and Human Development, Universiti Malaysia Sarawak, Kota Samarahan, Sarawak, Malaysia
| |
|
19
|
Abedini D, Rashidi Monfared S. Co-regulation analysis of co-expressed modules under cold and pathogen stress conditions in tomato. Mol Biol Rep 2018; 45:335-345. [PMID: 29551007 DOI: 10.1007/s11033-018-4166-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/16/2016] [Accepted: 03/13/2018] [Indexed: 11/28/2022]
Abstract
A primary mechanism for controlling the development of multicellular organisms is transcriptional regulation, which is carried out by transcription factors (TFs) that recognize and bind to their binding sites in the promoter region. The distance from the translation start site, order, orientation, and spacing between cis elements are key factors in the concentration of active nuclear TFs and transcriptional regulation of target genes. In this study, overrepresented motifs in cold and pathogenesis responsive genes were scanned via the Gibbs sampling method, which detects overrepresented motifs by means of a stochastic optimization strategy that searches for all possible sets of short DNA segments. The identified motifs were then checked against the TRANSFAC, PLACE and Soft Berry databases to identify putative TFs that interact with the motifs. Several cis/trans regulatory elements were found using these databases. Moreover, cross-talk between cold and pathogenesis responsive genes was confirmed. Statistical analysis was used to determine the distribution of identified motifs in the promoter region. In addition, co-regulation analysis showed that genes in the pathogenesis responsive module are divided into two main groups. The promoter region was also divided into six subareas in order to map the distribution of motifs across promoter subareas. The results showed that the majority of motifs are concentrated in the 700 nucleotides upstream of the translational start site (ATG). In contrast, this result does not hold for the other group: there was no difference between total and compartmentalized regions in cold responsive genes.
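The Gibbs sampling strategy mentioned in this abstract can be sketched in a few lines. The version below is a simplified textbook site sampler, not the tool used in the study: it assumes one motif occurrence per sequence, uppercase A/C/G/T input, a fixed motif width `k`, and no background model. All names and default parameters are illustrative assumptions.

```python
import random


def gibbs_motif_sampler(seqs, k, iters=500, pseudo=1.0, seed=0):
    """Return one sampled motif start position per sequence.

    Simplifying assumptions for this sketch: exactly one site per sequence,
    sequences contain only A/C/G/T, and no background correction is applied.
    """
    rng = random.Random(seed)
    # Start from random motif positions.
    pos = [rng.randrange(len(s) - k + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))  # sequence to resample
        # Build a column-wise count profile from all other sequences.
        counts = [{b: pseudo for b in "ACGT"} for _ in range(k)]
        for j, s in enumerate(seqs):
            if j != i:
                for c in range(k):
                    counts[c][s[pos[j] + c]] += 1
        total = len(seqs) - 1 + 4 * pseudo
        # Score every window of the held-out sequence under the profile.
        weights = []
        for start in range(len(seqs[i]) - k + 1):
            p = 1.0
            for c in range(k):
                p *= counts[c][seqs[i][start + c]] / total
            weights.append(p)
        # Sample the new position in proportion to its profile probability.
        pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos
```

Because the procedure is stochastic, practical use typically involves several restarts with different seeds and keeping the alignment with the best score; the fixed `seed` here only makes a single run reproducible.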
Affiliation(s)
- Davar Abedini
- Department of Biotechnology, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
| | - Sajad Rashidi Monfared
- Department of Biotechnology, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran.
| |
|
20
|
Joshi RR. Diversity and motif conservation in protein 3D structural landscape: exploration by a new multivariate simulation method. J Mol Model 2018; 24:76. [PMID: 29500695 DOI: 10.1007/s00894-018-3614-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/30/2017] [Accepted: 01/31/2018] [Indexed: 11/29/2022]
Abstract
In this paper, diversity and conservation in the 'landscape' of random variation of protein tertiary structures are explored for quantitative feature-vector models of major types of functionally important 3D structural motifs. For this, I have deployed a recently developed nonparametric regression (NPR)-based multidimensional copula method of simulation. Apart from improved accuracy of multidimensional random sample generation, the simulation provides additional insight into diversity in the protein structural landscape in terms of random variation in the feature-vector. It shows the relative importance of several features, with biological implications, in conservation of motifs. Mapping of this landscape in distance-preserving 2D eigenspace also shows consistency in demarcation of different motif classes and preservation of their characteristic patterns in this 2D space.
Affiliation(s)
- Rajani R Joshi
- Department of Mathematics, Indian Institute of Technology Bombay, Mumbai, India.
| |
|
21
|
Heller D, Krestel R, Ohler U, Vingron M, Marsico A. ssHMM: extracting intuitive sequence-structure motifs from high-throughput RNA-binding protein data. Nucleic Acids Res 2017; 45:11004-11018. [PMID: 28977546 PMCID: PMC5737366 DOI: 10.1093/nar/gkx756] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 10/05/2016] [Accepted: 08/17/2017] [Indexed: 11/14/2022] Open
Abstract
RNA-binding proteins (RBPs) play an important role in RNA post-transcriptional regulation and recognize target RNAs via sequence-structure motifs. The extent to which RNA structure influences protein binding in the presence or absence of a sequence motif is still poorly understood. Existing RNA motif finders either take the structure of the RNA only partially into account, or employ models which are not directly interpretable as sequence-structure motifs. We developed ssHMM, an RNA motif finder based on a hidden Markov model (HMM) and Gibbs sampling which fully captures the relationship between RNA sequence and secondary structure preference of a given RBP. Compared to previous methods which output separate logos for sequence and structure, it directly produces a combined sequence-structure motif when trained on a large set of sequences. ssHMM's model is visualized intuitively as a graph and facilitates biological interpretation. ssHMM can be used to find novel bona fide sequence-structure motifs of uncharacterized RBPs, such as the one presented here for the YY1 protein. ssHMM reaches a high motif recovery rate on synthetic data, it recovers known RBP motifs from CLIP-Seq data, and scales linearly on the input size, being considerably faster than MEMERIS and RNAcontext on large datasets while being on par with GraphProt. It is freely available on Github and as a Docker image.
Affiliation(s)
- David Heller
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73 14195 Berlin, Germany; Hasso Plattner Institute, Prof.-Dr.-Helmert-Str. 2-3 14482 Potsdam, Germany
| | - Ralf Krestel
- Hasso Plattner Institute, Prof.-Dr.-Helmert-Str. 2-3 14482 Potsdam, Germany
| | - Uwe Ohler
- Max Delbruck Center, Robert-Roessle-Str. 10 13029 Berlin, Germany
| | - Martin Vingron
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73 14195 Berlin, Germany
| | - Annalisa Marsico
- Max Planck Institute for Molecular Genetics, Ihnestr. 63-73 14195 Berlin, Germany; Freie Universitaet Berlin, Arnimallee 14 14195 Berlin, Germany
| |
|
22
|
López Y, Vandenbon A, Nose A, Nakai K. Modeling the cis-regulatory modules of genes expressed in developmental stages of Drosophila melanogaster. PeerJ 2017; 5:e3389. [PMID: 28584716 PMCID: PMC5452948 DOI: 10.7717/peerj.3389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/20/2016] [Accepted: 05/08/2017] [Indexed: 12/30/2022] Open
Abstract
Because transcription is the first step in the regulation of gene expression, understanding how transcription factors bind to their DNA binding motifs has become absolutely necessary. It has been shown that the promoters of genes with similar expression profiles share common structural patterns. This paper presents an extensive study of the regulatory regions of genes expressed in 24 developmental stages of Drosophila melanogaster. It proposes the use of a combination of structural features, such as positioning of individual motifs relative to the transcription start site, orientation, pairwise distance between motifs, and presence of motifs anywhere in the promoter for predicting gene expression from structural features of promoter sequences. RNA-sequencing data was utilized to create and validate the 24 models. When genes with high-scoring promoters were compared to those identified by RNA-seq samples, 19 (79.2%) statistically significant models, a number that exceeds previous studies, were obtained. Each model yielded a set of highly informative features, which were used to search for genes with similar biological functions.
Affiliation(s)
- Yosvany López
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan; Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Alexis Vandenbon
- Immunology Frontier Research Center, Osaka University, Osaka, Japan
| | - Akinao Nose
- Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
| | - Kenta Nakai
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| |
|
23
|
Yang J, Chen X, McDermaid A, Ma Q. DMINDA 2.0: integrated and systematic views of regulatory DNA motif identification and analyses. Bioinformatics 2017; 33:2586-2588. [DOI: 10.1093/bioinformatics/btx223] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 02/22/2017] [Accepted: 04/12/2017] [Indexed: 11/14/2022] Open
Affiliation(s)
- Jinyu Yang
- Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA
- Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
| | - Xin Chen
- Center for Applied Mathematics, Tianjin University, Tianjin, China
| | - Adam McDermaid
- Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA
- Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
| | - Qin Ma
- Bioinformatics and Mathematical Biosciences Lab, Department of Agronomy, Horticulture and Plant Science, South Dakota State University, Brookings, SD, USA
- Department of Mathematics and Statistics, South Dakota State University, Brookings, SD, USA
- BioSNTR, Brookings, SD, USA
- Population Health group, Sanford Research, Sioux Falls, SD, USA
| |
|
24
|
Baichoo S, Ouzounis CA. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment. Biosystems 2017; 156-157:72-85. [PMID: 28392341 DOI: 10.1016/j.biosystems.2017.03.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/07/2017] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 12/12/2022]
Abstract
A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality.
Affiliation(s)
- Shakuntala Baichoo
- Department of Computer Science & Engineering, University of Mauritius, Réduit 80837, Mauritius.
| | - Christos A Ouzounis
- Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica 57001, Greece.
| |
|
25
|
Ramsey SA. An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters. Bioinform Biol Insights 2016; 9:59-69. [PMID: 27812284 PMCID: PMC5081247 DOI: 10.4137/bbi.s29330] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/11/2016] [Revised: 09/11/2016] [Accepted: 09/18/2016] [Indexed: 12/24/2022] Open
Abstract
A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of representative binding sites for a transcription factor and two sets of noncoding, 5′ regulatory sequences for gene sets that are to be compared. An empirical prior on the frequency λ (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBS β (conditioned on the observed sequence) is sampled using Metropolis–Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on β improves accuracy for estimating the number of TFBS within a set of promoter sequences.
Affiliation(s)
- Stephen A Ramsey
- Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA; School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA
| |
|
26
|
Gunasekara C, Subramanian A, Avvari JVRK, Li B, Chen S, Wei H. ExactSearch: a web-based plant motif search tool. PLANT METHODS 2016; 12:26. [PMID: 27134638 PMCID: PMC4850730 DOI: 10.1186/s13007-016-0126-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/23/2015] [Accepted: 04/19/2016] [Indexed: 05/28/2023]
Abstract
BACKGROUND Plant biologists frequently need to examine whether a sequence motif bound by a specific transcription or translation factor is present in the proximal promoters or 3' untranslated regions (3' UTR) of a set of plant genes of interest. To achieve such a task, plant biologists have to not only identify an appropriate algorithm for motif searching, but also manipulate a large volume of sequence data, making the task burdensome to carry out. RESULT In this study, we developed a web portal that enables plant molecular biologists to search for DNA motifs, especially degenerate ones, in custom sequences or the flanking regions of all genes in the 50 plant species whose genomes have been sequenced. A web tool like this is needed to meet a variety of needs of plant biologists for identifying potential gene regulatory relationships. We implemented a suffix tree algorithm to accelerate the search for a group of motifs in a multitude of target genes. The motifs to be searched can contain degenerate bases in addition to adenine (A), cytosine (C), guanine (G), and thymine (T). The target sequences to be searched can be custom sequences or the selected proximal gene sequences from any one of the 50 sequenced plant species. The web portal also supports searching for motifs that are represented by a position probability matrix in the above-mentioned species. Currently, the algorithm can accomplish an exhaustive search of 100 motifs in 35,000 target sequences of 2 kb in length in 4.2 min. However, the runtime may change in the future depending on the space availability, number of running jobs, network traffic, data loading, and output packing and delivery through electronic mailing. CONCLUSION A web portal was developed to facilitate searching for motifs present in custom sequences or in the proximal promoters or 3' UTRs of 50 plant species with sequenced genomes. 
This web tool is accessible by using this URL: http://sys.bio.mtu.edu/motif/index.php.
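To make the idea of searching for degenerate motifs concrete, here is a minimal sketch that expands IUPAC nucleotide codes into a regular expression and reports all match positions. It deliberately swaps in Python's `re` module for the suffix tree used by the tool, scans the forward strand only, and uses a hypothetical function name; it is an illustration of the matching rule, not of ExactSearch itself.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def find_motif(motif, sequence):
    """Return 0-based start positions of a degenerate motif in a sequence.

    Forward strand only; a zero-width lookahead is used so that
    overlapping occurrences are all reported.
    """
    pattern = "".join(IUPAC[base] for base in motif.upper())
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]
```

For instance, `find_motif("TTGACY", "AATTGACCGGTTGACTAA")` returns `[2, 10]`, matching both TTGACC and TTGACT. A regex scan like this is convenient for a handful of motifs; an index structure such as a suffix tree pays off when many motifs are searched against the same large sequence set.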
Affiliation(s)
- Chathura Gunasekara
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931 USA
| | - Avinash Subramanian
- Department of Computer Science, Michigan Technological University, Houghton, MI USA
| | | | - Bin Li
- Department of Computer Science, Michigan Technological University, Houghton, MI USA
| | - Su Chen
- State Key Laboratory of Forest Genetics and Tree Breeding, Northeast Forestry University, Harbin, Heilongjiang 150040 People’s Republic of China
| | - Hairong Wei
- School of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI 49931 USA
- Department of Computer Science, Michigan Technological University, Houghton, MI USA
- Life Science and Technology Institute, Michigan Technological University Houghton, Michigan, MI 49931 USA
| |
|
27
|
FabR regulates Salmonella biofilm formation via its direct target FabB. BMC Genomics 2016; 17:253. [PMID: 27004424 PMCID: PMC4804515 DOI: 10.1186/s12864-016-2387-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/21/2015] [Accepted: 01/08/2016] [Indexed: 12/02/2022] Open
Abstract
Background Biofilm formation is an important survival strategy of Salmonella in all environments. By mutant screening, we showed a knock-out mutant of fabR, encoding a repressor of unsaturated fatty acid biosynthesis (UFA), to have impaired biofilm formation. In order to unravel how this regulator impinges on Salmonella biofilm formation, we aimed at elucidating the S. Typhimurium FabR regulon. Hereto, we applied a combinatorial high-throughput approach, combining ChIP-chip with transcriptomics. Results All the previously identified E. coli FabR transcriptional target genes (fabA, fabB and yqfA) were shown to be direct S. Typhimurium FabR targets as well. As we found a fabB overexpressing strain to partly mimic the biofilm defect of the fabR mutant, the effect of FabR on biofilms can be attributed at least partly to FabB, which plays a key role in UFA biosynthesis. Additionally, ChIP-chip identified a number of novel direct FabR targets (the intergenic regions between hpaR/hpaG and ddg/ydfZ) and yet putative direct targets (i.a. genes involved in tRNA metabolism, ribosome synthesis and translation). Next to UFA biosynthesis, a number of these direct targets and other indirect targets identified by transcriptomics (e.g. ribosomal genes, ompA, ompC, ompX, osmB, osmC, sseI), could possibly contribute to the effect of FabR on biofilm formation. Conclusion Overall, our results point at the importance of FabR and UFA biosynthesis in Salmonella biofilm formation and their role as potential targets for biofilm inhibitory strategies. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2387-x) contains supplementary material, which is available to authorized users.
|
28
|
Surujon D, Ratner DI. Use of a Probabilistic Motif Search to Identify Histidine Phosphotransfer Domain-Containing Proteins. PLoS One 2016; 11:e0146577. [PMID: 26751210 PMCID: PMC4709007 DOI: 10.1371/journal.pone.0146577] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/04/2015] [Accepted: 12/18/2015] [Indexed: 11/18/2022] Open
Abstract
The wealth of newly obtained proteomic information affords researchers the possibility of searching for proteins of a given structure or function. Here we describe a general method for the detection of a protein domain of interest in any species for which a complete proteome exists. In particular, we apply this approach to identify histidine phosphotransfer (HPt) domain-containing proteins across a range of eukaryotic species. From the sequences of known HPt domains, we created an amino acid occurrence matrix which we then used to define a conserved, probabilistic motif. Examination of various organisms either known to contain (plant and fungal species) or believed to lack (mammals) HPt domains established criteria by which new HPt candidates were identified and ranked. Search results using a probabilistic motif matrix compare favorably with data to be found in several commonly used protein structure/function databases: our method identified all known HPt proteins in the Arabidopsis thaliana proteome, confirmed the absence of such motifs in mice and humans, and suggests new candidate HPts in several organisms. Moreover, probabilistic motif searching can be applied more generally, in a manner both readily customized and computationally compact, to other protein domains; this utility is demonstrated by our identification of histones in a range of eukaryotic organisms.
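The matrix-based search described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the example sequences are invented toy strings rather than real HPt domains, the pseudocount and the uniform background of 0.05 are arbitrary choices, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_matrix(aligned, pseudocount=1.0):
    """Per-column residue probabilities from equal-length example sequences."""
    matrix = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        total = len(aligned) + pseudocount * len(AMINO_ACIDS)
        matrix.append({aa: (counts.get(aa, 0) + pseudocount) / total
                       for aa in AMINO_ACIDS})
    return matrix

def score_window(matrix, window, background=0.05):
    """Log-odds score of one window; unknown residues contribute zero."""
    return sum(math.log2(col.get(aa, background) / background)
               for col, aa in zip(matrix, window))

def best_hit(matrix, protein):
    """Return (score, start) of the best-scoring window, or None if too short."""
    width = len(matrix)
    scored = [(score_window(matrix, protein[i:i + width]), i)
              for i in range(len(protein) - width + 1)]
    return max(scored) if scored else None

# Toy example sequences (assumed for illustration only).
examples = ["HKLKG", "HQLKG", "HKFKG"]
matrix = build_matrix(examples)
print(best_hit(matrix, "MSTAHKLKGSAVQ"))
```

Ranking candidates then amounts to sorting proteins by their best window score and choosing a cutoff from organisms known to contain or lack the domain, as the abstract describes.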
Affiliation(s)
- Defne Surujon
- Program in Biochemistry and Biophysics, Amherst College, Amherst, Massachusetts, United States of America
| | - David I. Ratner
- Program in Biochemistry and Biophysics, Amherst College, Amherst, Massachusetts, United States of America
- Department of Biology, Amherst College, Amherst, Massachusetts, United States of America
| |
|
29
|
Yang WF, Yu ZG, Anh V. Whole genome/proteome based phylogeny reconstruction for prokaryotes using higher order Markov model and chaos game representation. Mol Phylogenet Evol 2015; 96:102-111. [PMID: 26724405 DOI: 10.1016/j.ympev.2015.12.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/02/2015] [Revised: 12/17/2015] [Accepted: 12/18/2015] [Indexed: 01/18/2023]
Abstract
Traditional methods for sequence comparison and phylogeny reconstruction rely on pairwise and multiple sequence alignments. However, alignment cannot be directly applied to whole genome/proteome comparison and phylogenomic studies because of its high computational complexity. Hence, alignment-free methods have become popular in recent years. Here we propose a fast alignment-free method for whole genome/proteome comparison and phylogeny reconstruction using a higher order Markov model and chaos game representation. In the present method, we use the transition matrices of higher order Markov models to characterize amino acid or DNA sequences for their comparison. The order of the Markov model is uniquely identified by maximizing the average Shannon entropy of conditional probability distributions. Using one-dimensional chaos game representation and a linked list, this method can reduce the large memory and time consumption caused by the large-scale conditional probability distributions. To illustrate the effectiveness of our method, we employ it for fast phylogeny reconstruction based on genome/proteome sequences of two species data sets used in previously published papers. Our results demonstrate that the present method is useful and efficient. AVAILABILITY AND IMPLEMENTATION The source codes for our algorithm to get the distance matrix and genome/proteome sequences can be downloaded from ftp://121.199.20.25/. The software Phylip and EvolView we used to construct phylogenetic trees can be obtained from their websites.
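As a rough illustration of characterizing a sequence by the transition matrix of a higher order Markov model, the sketch below estimates sparse transition probabilities and compares two sequences. It rests on assumptions: the abstract does not state the distance measure, so a Manhattan distance is used as a stand-in, and the entropy-based choice of order and the chaos game representation are not reproduced. The function names are hypothetical.

```python
from collections import defaultdict

def transition_probs(seq, order):
    """Estimate P(next symbol | preceding `order` symbols) as a sparse dict."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        counts[context][seq[i + order]] += 1
    probs = {}
    for context, following in counts.items():
        total = sum(following.values())
        for symbol, n in following.items():
            probs[(context, symbol)] = n / total
    return probs

def distance(p, q):
    """Manhattan distance between two sparse transition matrices (an assumed stand-in)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical usage on two short DNA strings with a second-order model.
p = transition_probs("ACGTACGT", 2)
q = transition_probs("ACGTACGA", 2)
print(distance(p, q))
```

Storing only the observed (context, symbol) pairs keeps memory proportional to the sequence rather than to the full alphabet raised to the model order, which is the same concern the abstract addresses with its linked-list representation.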
Affiliation(s)
- Wei-Feng Yang
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China; Department of Mathematics and Physics, Hunan Institute of Engineering, Hunan 411104, PR China.
| | - Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Hunan 411105, PR China; School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
| | - Vo Anh
- School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane, QLD 4001, Australia.
| |
|
30
|
Kakeshpour T, Nayebi S, Rashidi Monfared S, Moieni A, Karimzadeh G. Identification and expression analyses of MYB and WRKY transcription factor genes in Papaver somniferum L. PHYSIOLOGY AND MOLECULAR BIOLOGY OF PLANTS : AN INTERNATIONAL JOURNAL OF FUNCTIONAL PLANT BIOLOGY 2015; 21:465-78. [PMID: 26600674 PMCID: PMC4646871 DOI: 10.1007/s12298-015-0325-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 08/17/2015] [Revised: 09/30/2015] [Accepted: 10/05/2015] [Indexed: 05/21/2023]
Abstract
Papaver somniferum L. is an herbaceous, annual and diploid plant that is important from a pharmacological and strategic point of view. The cDNA clones of two putative MYB and WRKY genes were isolated (GenBank accession numbers KP411870 and KP203854, respectively) from this plant via the nested-PCR method and characterized. The MYB transcription factor (TF) comprises 342 amino acids and exhibits the structural features of the R2R3MYB protein family. The WRKY TF, a 326 amino acid-long polypeptide, falls structurally into group II of the WRKY protein family. Quantitative real-time PCR (qRT-PCR) analyses indicate the presence of these TFs in all organs of P. somniferum L. and Papaver bracteatum L. The highest expression levels of these two TFs were observed in the leaf tissues of P. somniferum L., while in P. bracteatum L. the expression levels were highest in the root tissues. Promoter analysis of the 10 co-expressed clustered genes involved in the noscapine biosynthesis pathway in P. somniferum L. suggested that these 10 genes are not only co-expressed but also share common regulatory motifs and TFs, including MYB and WRKY TFs, which may explain their common regulation.
Affiliation(s)
- Tayebeh Kakeshpour
- Plant Breeding and Biotechnology Department, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
| | - Shadi Nayebi
- Plant Breeding and Biotechnology Department, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
| | - Sajad Rashidi Monfared
- Plant Breeding and Biotechnology Department, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
| | - Ahmad Moieni
- Plant Breeding and Biotechnology Department, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
| | - Ghasem Karimzadeh
- Plant Breeding and Biotechnology Department, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran
| |
|
31
|
De Witte D, Van de Velde J, Decap D, Van Bel M, Audenaert P, Demeester P, Dhoedt B, Vandepoele K, Fostier J. BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements. Bioinformatics 2015; 31:3758-66. [PMID: 26254488 PMCID: PMC4653392 DOI: 10.1093/bioinformatics/btv466] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/09/2014] [Accepted: 08/03/2015] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. RESULTS We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. AVAILABILITY AND IMPLEMENTATION BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller CONTACT Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dieter De Witte
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
| | - Jan Van de Velde
- Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Dries Decap
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
| | - Michiel Van Bel
- Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Pieter Audenaert
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
| | - Piet Demeester
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
| | - Bart Dhoedt
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
| | - Klaas Vandepoele
- Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
| | - Jan Fostier
- Department of Information Technology (INTEC), Ghent University-iMinds, Ghent, Belgium
| |
|
32
|
Zarns K, Desell T, Nechaev S, Dhasarathy A. Searching the Human Genome for Snail and Slug With DNA@Home. PROCEEDINGS ... IEEE INTERNATIONAL CONFERENCE ON ESCIENCE. IEEE INTERNATIONAL CONFERENCE ON ESCIENCE 2015; 2015:429-438. [PMID: 26998498 PMCID: PMC4794263 DOI: 10.1109/escience.2015.27] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
DNA@Home is a volunteer computing project that aims to use Gibbs sampling for the identification and location of DNA control signals on full genome-scale datasets. A fault-tolerant and asynchronous implementation of Gibbs sampling using the Berkeley Open Infrastructure for Network Computing (BOINC) was used to identify the location of binding sites of the SNAI1 (Snail) and SNAI2 (Slug) transcription factors across the human genome. Genes regulated by Slug but not Snail, and genes regulated by Snail but not Slug, provided two datasets with known motifs. These datasets contained up to 994 DNA sequences, which to our knowledge is the largest-scale use of Gibbs sampling for discovery of binding sites. 1000 parallel sampling walks were used to search for the presence of 1, 2 or 3 possible motifs using small, medium, and full size sets of these sequences. These runs were performed over a period of two months using over 1500 volunteered computing hosts and generated over 2.2 terabytes of sampling data. High performance computing resources were used for post-processing. This paper presents intra- and inter-walk analyses used to determine walk convergence. The results were validated against current biological knowledge of the Snail and Slug promoter regions and present avenues for further biological study.
Affiliation(s)
- Kristopher Zarns
- Department of Computer Science, University of North Dakota, Grand Forks, North Dakota 58202-9015
| | - Travis Desell
- Department of Computer Science, University of North Dakota, Grand Forks, North Dakota 58202-9015
| | - Sergei Nechaev
- Department of Basic Sciences, University of North Dakota, Grand Forks, North Dakota 58202-9061
| | - Archana Dhasarathy
- Department of Basic Sciences, University of North Dakota, Grand Forks, North Dakota 58202-9061
| |
|
33
|
An integrated approach to reconstructing genome-scale transcriptional regulatory networks. PLoS Comput Biol 2015; 11:e1004103. [PMID: 25723545 PMCID: PMC4344238 DOI: 10.1371/journal.pcbi.1004103] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/09/2014] [Accepted: 12/23/2014] [Indexed: 11/24/2022] Open
Abstract
Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making them highly complementary. We leveraged this new workflow and observations to build a large-scale TRN model for the α-Proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall was also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. 
Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions. The ever growing amount of genomic data enables the assembly of large-scale network models that can provide important new insights into living systems. However, assembly and validation of such large-scale models can be challenging, since we often lack sufficient information to make accurate predictions. This work describes a new approach for constructing large-scale transcriptional regulatory networks of individual cells. We show that the reconstructed network captures a significantly larger fraction of cellular regulatory processes than networks generated by other existing approaches. We predict this approach, with appropriate refinements, will allow reconstruction of large-scale transcriptional network models for a variety of other organisms. As we work towards modeling the function of cells or complex ecosystems, individually reconstructed network models of signaling, information transfer and metabolism, can be integrated to provide high information predictions and insights not otherwise obtainable.
|
34
|
Mahdevar G, Nowzari-Dalini A, Sadeghi M. Inferring gene correlation networks from transcription factor binding sites. Genes Genet Syst 2014; 88:301-9. [PMID: 24694393 DOI: 10.1266/ggs.88.301] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022] Open
Abstract
Gene expression is a highly regulated biological process that is fundamental to the existence of phenotypes of any living organism. The regulatory relations are usually modeled as a network: every gene is modeled as a node, and relations are shown as edges between two related genes. This paper presents a novel method for inferring correlation networks (networks constructed by connecting co-expressed genes) by predicting co-expression levels from the genes' promoter sequences. According to the results, this method works well on biological data and its outcome is comparable to that of methods that use microarray data as input. The method is written in C++ and is available upon request from the corresponding author.
Affiliation(s)
- Ghasem Mahdevar
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran
| |
|
35
|
Appel HM, Fescemyer H, Ehlting J, Weston D, Rehrig E, Joshi T, Xu D, Bohlmann J, Schultz J. Transcriptional responses of Arabidopsis thaliana to chewing and sucking insect herbivores. FRONTIERS IN PLANT SCIENCE 2014; 5:565. [PMID: 25452759 PMCID: PMC4231836 DOI: 10.3389/fpls.2014.00565] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 08/24/2014] [Accepted: 10/01/2014] [Indexed: 05/22/2023]
Abstract
We tested the hypothesis that Arabidopsis can recognize and respond differentially to insect species at the transcriptional level using a genome wide microarray. Transcriptional reprogramming was characterized using co-expression analysis in damaged and undamaged leaves at two times in response to mechanical wounding and four insect species. In all, 2778 (10.6%) of annotated genes on the array were differentially expressed in at least one treatment. Responses differed mainly between aphid and caterpillar and sampling times. Responses to aphids and caterpillars shared only 10% of up-regulated and 8% of down-regulated genes. Responses to two caterpillars shared 21 and 12% of up- and down-regulated genes, whereas responses to the two aphids shared only 7 and 4% of up-regulated and down-regulated genes. Overlap in genes expressed between 6 and 24 h was 3-15%, and depended on the insect species. Responses in attacked and unattacked leaves differed at 6 h but converged by 24 h. Genes responding to the insects are also responsive to many stressors and included primary metabolism. Aphids down-regulated amino acid catabolism; caterpillars stimulated production of amino acids involved in glucosinolate synthesis. Co-expression analysis revealed 17 response networks. Transcription factors were a major portion of differentially expressed genes throughout and responsive genes shared most of the known or postulated binding sites. However, cis-element composition of genes down regulated by the aphid M. persicae was unique, as were those of genes down-regulated by caterpillars. As many as 20 cis-elements were over-represented in one or more treatments, including some from well-characterized classes and others as yet uncharacterized. We suggest that transcriptional changes elicited by wounding and insects are heavily influenced by transcription factors and involve both enrichment of a common set of cis-elements and a unique enrichment of a few cis-elements in responding genes.
Affiliation(s)
- Heidi M. Appel
- Bond Life Sciences Center and Division of Plant Sciences, University of Missouri, Columbia, MO, USA
| | - Howard Fescemyer
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
| | - Juergen Ehlting
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Department of Biology, University of Victoria, Victoria, BC, Canada
| | - David Weston
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
| | - Erin Rehrig
- Biology and Chemistry Department, Fitchburg State University, Fitchburg, MA, USA
| | - Trupti Joshi
- Department of Computer Science, Bond Life Sciences Center, Informatics Institute, University of Missouri, Columbia, MO, USA
| | - Dong Xu
- Department of Computer Science, Bond Life Sciences Center, Informatics Institute, University of Missouri, Columbia, MO, USA
| | - Joerg Bohlmann
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
| | - Jack Schultz
- Bond Life Sciences Center and Division of Plant Sciences, University of Missouri, Columbia, MO, USA
| |
|
36
|
Liu W, Mazarei M, Peng Y, Fethe MH, Rudis MR, Lin J, Millwood RJ, Arelli PR, Stewart CN. Computational discovery of soybean promoter cis-regulatory elements for the construction of soybean cyst nematode-inducible synthetic promoters. PLANT BIOTECHNOLOGY JOURNAL 2014; 12:1015-26. [PMID: 24893752 DOI: 10.1111/pbi.12206] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 02/10/2014] [Revised: 04/14/2014] [Accepted: 04/23/2014] [Indexed: 05/03/2023]
Abstract
Computational methods offer great hope but limited accuracy in the prediction of functional cis-regulatory elements; improvements are needed to enable synthetic promoter design. We applied an ensemble strategy for de novo soybean cyst nematode (SCN)-inducible motif discovery among promoters of 18 co-expressed soybean genes that were selected from six reported microarray studies involving a compatible soybean-SCN interaction. A total of 116 overlapping motif regions (OMRs) were discovered bioinformatically that were identified by at least four out of seven bioinformatic tools. Using synthetic promoters, the inducibility of each OMR or motif itself was evaluated by co-localization of gain of function of an orange fluorescent protein reporter and the presence of SCN in transgenic soybean hairy roots. Among 16 OMRs detected from two experimentally confirmed SCN-inducible promoters, 11 OMRs (i.e. 68.75%) were experimentally confirmed to be SCN-inducible, leading to the discovery of 23 core motifs of 5- to 7-bp length, of which 14 are novel in plants. We found that a combination of the three best tools (i.e. SCOPE, W-AlignACE and Weeder) could detect all 23 core motifs. Thus, this strategy is a high-throughput approach for de novo motif discovery in soybean and offers great potential for novel motif discovery and synthetic promoter engineering for any plant and trait in crop biotechnology.
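The ensemble rule described above (keeping regions reported by at least four of seven tools) can be sketched as a per-position vote. This is only one plausible reading: the abstract does not define how overlapping motif regions were delimited, so the merging of adjacent supported positions, the half-open coordinates, and the tool names below are all assumptions made for illustration.

```python
def overlapping_regions(predictions, min_tools):
    """Merge positions supported by at least `min_tools` tools into regions.

    predictions maps a tool name to a list of (start, end) intervals,
    0-based with `end` exclusive, all on the same promoter.
    """
    votes = {}
    for intervals in predictions.values():
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end))
        for pos in covered:  # each tool votes at most once per position
            votes[pos] = votes.get(pos, 0) + 1
    kept = sorted(pos for pos, n in votes.items() if n >= min_tools)
    regions = []
    for pos in kept:
        if regions and pos == regions[-1][1]:
            regions[-1][1] = pos + 1  # extend the current region
        else:
            regions.append([pos, pos + 1])
    return [tuple(r) for r in regions]

# Hypothetical predictions from four placeholder tools on one promoter.
predictions = {
    "tool_a": [(0, 6)],
    "tool_b": [(2, 8)],
    "tool_c": [(3, 7), (20, 25)],
    "tool_d": [(21, 24)],
}
print(overlapping_regions(predictions, min_tools=3))
```

Raising `min_tools` trades sensitivity for specificity. The reported confirmation rate of 11 out of 16 regions corresponds to 11 / 16 = 0.6875, matching the 68.75% given in the abstract.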
Affiliation(s)
- Wusheng Liu
- Department of Plant Sciences, The University of Tennessee, Knoxville, TN, USA
| |
|
37
|
Stepančič Z. Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences. J Comput Biol 2014; 21:741-52. [PMID: 25121709 DOI: 10.1089/cmb.2014.0106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
Abstract
Finding short patterns with residue variation in a set of sequences is still an open problem in genetics, since motif-finding techniques on DNA and protein sequences are inconclusive on real data sets and their performance varies on different species. Hence, finding new algorithms and evolving established methods are vital to further understanding of genome properties and the mechanisms of protein development. In this work, we present an approach to finding functional motifs in DNA sequences in connection to Gibbs sampling method. Starting points in the search space are partly determined via graphical representation of input sequences opposed to completely random initial points with the standard Gibbs sampling. Our algorithm is evaluated on synthetic as well as on real data sets by using several statistics, such as sensitivity, positive predictive value, specificity, performance, and correlation coefficient. Additionally, a comparison between our algorithm and the basic standard Gibbs sampling algorithm is made to show improvement in accuracy, repeatability, and performance.
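For orientation, a bare-bones Gibbs sampler for motif finding is sketched below. It shows only the standard site-sampling loop; the graph-based choice of starting points proposed in the paper is not reproduced, and the optional `starts` argument is merely an assumed hook through which such non-random initial positions could be supplied. Sequences are assumed to be uppercase A/C/G/T strings at least `width` long, and all names are hypothetical.

```python
import random

def profile_without(seqs, starts, width, skip, pseudo=1.0):
    """Column-wise base probabilities from all current sites except sequence `skip`."""
    profile = [{b: pseudo for b in "ACGT"} for _ in range(width)]
    for idx, (seq, start) in enumerate(zip(seqs, starts)):
        if idx == skip:
            continue
        for col, base in enumerate(seq[start:start + width]):
            profile[col][base] += 1
    for col in profile:
        total = sum(col.values())
        for b in col:
            col[b] /= total
    return profile

def gibbs_sample(seqs, width, iterations=500, seed=0, starts=None):
    """Return one sampled motif start per sequence after `iterations` updates."""
    rng = random.Random(seed)
    if starts is None:
        # Standard Gibbs sampling: completely random initial sites.
        starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    else:
        starts = list(starts)
    for _ in range(iterations):
        i = rng.randrange(len(seqs))
        profile = profile_without(seqs, starts, width, skip=i)
        weights = []
        for pos in range(len(seqs[i]) - width + 1):
            w = 1.0
            for col, base in enumerate(seqs[i][pos:pos + width]):
                w *= profile[col][base]
            weights.append(w)
        # Resample the site in sequence i in proportion to its profile likelihood.
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

# Toy input (assumed): three short sequences sharing the word TTGACC.
seqs = ["ACGTTGACCA", "TTGACCGGTA", "GGATTGACCT"]
print(gibbs_sample(seqs, width=6, iterations=200, seed=1))
```

Because the procedure is stochastic, separate runs can settle on different sites, which is why repeatability appears among the evaluation criteria in the abstract.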
|
38
|
Azmi AM, Al-Ssulami A. Encoded expansion: an efficient algorithm to discover identical string motifs. PLoS One 2014; 9:e95148. [PMID: 24871320 PMCID: PMC4037181 DOI: 10.1371/journal.pone.0095148] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/14/2013] [Accepted: 03/24/2014] [Indexed: 11/19/2022] Open
Abstract
A major task in computational biology is the discovery of short recurring string patterns known as motifs. Most of the schemes to discover motifs are either stochastic or combinatorial in nature. Stochastic approaches do not guarantee finding the correct motifs, while the combinatorial schemes tend to have an exponential time complexity with respect to motif length. To alleviate the cost, the combinatorial approach exploits dynamic data structures such as trees or graphs. Recently (Karci (2009) Efficient automatic exact motif discovery algorithms for biological sequences, Expert Systems with Applications 36:7952-7963) devised a deterministic algorithm that finds all the identical copies of string motifs of all sizes [Formula: see text] in theoretical time complexity of [Formula: see text] and a space complexity of [Formula: see text] where [Formula: see text] is the length of the input sequence and [Formula: see text] is the length of the longest possible string motif. In this paper, we present a significant improvement on Karci's original algorithm. The algorithm that we propose reports all identical string motifs of sizes [Formula: see text] that occur at least [Formula: see text] times. Our algorithm starts with string motifs of size 2, and at each iteration it expands the candidate string motifs by one symbol throwing out those that occur less than [Formula: see text] times in the entire input sequence. We use a simple array and data encoding to achieve theoretical worst-case time complexity of [Formula: see text] and a space complexity of [Formula: see text] Encoding of the substrings can speed up the process of comparison between string motifs. Experimental results on random and real biological sequences confirm that our algorithm has indeed a linear time complexity and it is more scalable in terms of sequence length than the existing algorithms.
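The expand-and-prune loop described above can be sketched as follows. This is an illustrative outline, not the published algorithm: it keeps motifs as Python strings in dictionaries instead of the array and data encoding the authors use, so it does not carry the stated complexity bounds, and the parameter names `min_count` and `max_len` are assumptions.

```python
from collections import defaultdict

def repeated_motifs(seq, min_count, max_len):
    """Identical substrings of length 2..max_len occurring at least min_count times.

    Returns a dict mapping each motif to the list of its 0-based start positions.
    """
    current = defaultdict(list)
    for i in range(len(seq) - 1):
        current[seq[i:i + 2]].append(i)
    current = {m: p for m, p in current.items() if len(p) >= min_count}
    result = dict(current)
    length = 2
    while current and length < max_len:
        grown = defaultdict(list)
        for motif, positions in current.items():
            for start in positions:
                end = start + length
                if end < len(seq):
                    # Expand the candidate by one symbol to the right.
                    grown[motif + seq[end]].append(start)
        # Throw out candidates that occur fewer than min_count times.
        current = {m: p for m, p in grown.items() if len(p) >= min_count}
        result.update(current)
        length += 1
    return result

# Hypothetical usage on a toy string.
found = repeated_motifs("ABCABCAB", min_count=2, max_len=4)
print(sorted(found))
```

Pruning is safe because any motif that occurs at least `min_count` times has a prefix that occurs at least as often, so discarded candidates can never grow into a qualifying motif.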
Affiliation(s)
- Aqil M. Azmi
- Department of Computer Science, College of Computer & Information Sciences, King Saud University, Riyadh, Saudi Arabia
| | - Abdulrakeeb Al-Ssulami
- Department of Computer Science, College of Computer & Information Sciences, King Saud University, Riyadh, Saudi Arabia
| |
|
39
|
Latorre M, Galloway-Peña J, Roh JH, Budinich M, Reyes-Jara A, Murray BE, Maass A, González M. Enterococcus faecalis reconfigures its transcriptional regulatory network activation at different copper levels. Metallomics 2014; 6:572-81. [PMID: 24382465 DOI: 10.1039/c3mt00288h] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/21/2022]
Abstract
A global transcriptional regulatory network was generated in the pathogenic bacterium Enterococcus faecalis in order to understand how this organism can activate and coordinate its expression at different copper concentrations. The topological evaluation of the network showed common patterns described in other organisms. Integrating microarray experiments allowed the identification of two sub-networks activated at low (0.05 mM CuSO4) and high (0.5 mM CuSO4) concentrations of copper. The analysis indicates the presence of specific functionally activated modules induced by copper levels, highlighting the regulons LysR and ArgR as global regulators and CopY, Fur and LexA as local regulators. Taking advantage of the fact that E. faecalis presented a homeostatic module, we produced an in vivo intervention by removing this system from the cell without affecting the connectivity of the global transcriptional network. This strategy led us to find that this bacterium can reconfigure its gene expression to maintain cellular homeostasis, activating new modules principally related to glucose metabolism and transcriptional processes. Finally, these results position E. faecalis as the most complete and controllable systemic model organism for copper homeostasis available to date.
Affiliation(s)
- Mauricio Latorre
- Laboratorio de Bioinformática y Expresión Génica, INTA, Universidad de Chile, El Líbano 5524, Santiago 11, Chile.
|
40
|
Ibrahim R, Ghanem N, Ismail MA. Context-aware semi-supervised motif detection approach. Annu Int Conf IEEE Eng Med Biol Soc 2014; 2014:3953-3956. [PMID: 25570857 DOI: 10.1109/embc.2014.6944489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Motif detection has emerged as an important task in bioinformatics. Recently, the discovery of motifs that are localized relative to a certain biological area has become an important task in many applications. For example, it is used to discover regulatory sequences beside the transcription start site and in the neighborhood of known transcription factor binding sites [1]. Therefore, a context-aware motif detection approach is needed. Moreover, there is interest in using both labeled and unlabeled sets to enhance motif detection approaches. In this paper, three novel context-aware semi-supervised motif detection approaches are proposed: self-learning, context-aware and co-training context-aware systems. In self-learning, the motif Hidden Markov Model (HMM) is enhanced independently using unlabeled sets, while in co-training, three different models are trained on three different views: pre-motif sequences, motif sequences and post-motif sequences. Moreover, our co-training context-aware system is suitable for parallelization to reduce its execution time. The approaches were evaluated using human motif sequences, and the results show that the co-training context-aware system achieved the best results. The results also show that our approach outperforms the related works in [1], [2] and [3].
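The three co-training views named in the abstract (pre-motif, motif and post-motif sequences) can be illustrated with a minimal Python sketch. The function name `split_views`, the fixed-width `flank` parameter and the example sequence are hypothetical and are not taken from the paper; the sketch only shows how one labeled site might be cut into three views, not how the models are trained.

```python
def split_views(seq, motif_start, motif_len, flank):
    """Cut one labeled sequence into (pre-motif, motif, post-motif) views.

    All names and the fixed-width flank are illustrative assumptions.
    """
    if motif_start < 0 or motif_start + motif_len > len(seq):
        raise ValueError("motif lies outside the sequence")
    motif_end = motif_start + motif_len
    # The pre-motif view is clipped at the start of the sequence if needed.
    pre = seq[max(0, motif_start - flank):motif_start]
    motif = seq[motif_start:motif_end]
    # Slicing past the end of the sequence simply yields a shorter post view.
    post = seq[motif_end:motif_end + flank]
    return pre, motif, post


# Made-up sequence with a labeled 6-mer starting at position 4.
views = split_views("GGCATATAAAGGCTCC", motif_start=4, motif_len=6, flank=3)
print(views)
```

Each of the three returned strings would then feed its own model, which is what makes the views independent enough to train in parallel.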
|
41
|
Hodar C, Zuñiga A, Pulgar R, Travisany D, Chacon C, Pino M, Maass A, Cambiazo V. Comparative gene expression analysis of Dtg, a novel target gene of Dpp signaling pathway in the early Drosophila melanogaster embryo. Gene 2013; 535:210-7. [PMID: 24321690 DOI: 10.1016/j.gene.2013.11.032] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/19/2013] [Revised: 10/30/2013] [Accepted: 11/14/2013] [Indexed: 10/25/2022]
Abstract
In the early Drosophila melanogaster embryo, Dpp, a secreted molecule that belongs to the TGF-β superfamily of growth factors, activates a set of downstream genes to subdivide the dorsal region into amnioserosa and dorsal epidermis. Here, we examined the expression pattern and transcriptional regulation of Dtg, a new target gene of the Dpp signaling pathway that is required for proper amnioserosa differentiation. We showed that the expression of Dtg was controlled by Dpp and characterized a 524-bp enhancer that mediated expression in the dorsal midline, as well as in the differentiated amnioserosa, in transgenic reporter embryos. This enhancer contained a highly conserved region of 48 bp in which bioinformatic predictions and in vitro assays identified three Mad binding motifs. Mutational analysis revealed that these three motifs were necessary for proper expression of a reporter gene in transgenic embryos, suggesting that short and highly conserved genomic sequences may be indicative of functional regulatory regions in D. melanogaster genes. Dtg orthologs were not detected in basal lineages of Dipterans, which unlike D. melanogaster develop two extra-embryonic membranes, amnion and serosa; nevertheless, Dtg orthologs were identified in the transcriptome of Musca domestica, in which dorsal ectoderm patterning leads to the formation of a single extra-embryonic membrane. These results suggest that Dtg was recruited as a new component of the network that controls dorsal ectoderm patterning in the lineage leading to higher Cyclorrhaphan flies, such as D. melanogaster and M. domestica.
Affiliation(s)
- Christian Hodar
- Laboratorio de Bioinformática y Expresión Génica, INTA-Universidad de Chile, El Líbano 5524, Santiago, Chile; Fondap Center for Genome Regulation (CGR), Universidad de Chile, Santiago, Chile
- Alejandro Zuñiga
- Laboratorio de Bioinformática y Expresión Génica, INTA-Universidad de Chile, El Líbano 5524, Santiago, Chile; Fondap Center for Genome Regulation (CGR), Universidad de Chile, Santiago, Chile
- Rodrigo Pulgar
- Laboratorio de Bioinformática y Expresión Génica, INTA-Universidad de Chile, El Líbano 5524, Santiago, Chile; Fondap Center for Genome Regulation (CGR), Universidad de Chile, Santiago, Chile
- Dante Travisany
- Laboratorio de Bioinformática y Matemática del Genoma, Center for Mathematical Modeling, FCFM-Universidad de Chile, Santiago, Chile; Fondap Center for Genome Regulation (CGR), Universidad de Chile, Santiago, Chile
- Carlos Chacon
- Laboratorio de Bioinformática y Expresión Génica, INTA-Universidad de Chile, El Líbano 5524, Santiago, Chile
- Michael Pino
- Laboratorio de Bioinformática y Expresión Génica, INTA-Universidad de Chile, El Líbano 5524, Santiago, Chile
- Alejandro Maass
- Laboratorio de Bioinformática y Matemática del Genoma, Center for Mathematical Modeling, FCFM-Universidad de Chile, Santiago, Chile; Fondap Center for Genome Regulation (CGR), Universidad de Chile, Santiago, Chile; Department of Mathematical Engineering, FCFM-Universidad de Chile, Santiago, Chile
- Verónica Cambiazo
- Laboratorio de Bioinformática y Expresión Génica, INTA-Universidad de Chile, El Líbano 5524, Santiago, Chile; Fondap Center for Genome Regulation (CGR), Universidad de Chile, Santiago, Chile.
|
42
|
Carvalho L. Bayesian centroid estimation for motif discovery. PLoS One 2013; 8:e80511. [PMID: 24324603 PMCID: PMC3855595 DOI: 10.1371/journal.pone.0080511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2013] [Accepted: 10/03/2013] [Indexed: 11/29/2022]
Abstract
Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate, using simulated and real datasets, that the centroid estimator can differ from the traditional maximum a posteriori or maximum likelihood estimators.
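The difference between a centroid estimate and a maximum a posteriori (MAP) estimate can be seen in a toy Python sketch. It assumes plain Hamming loss over unconstrained 0/1 site indicators, which is the generic textbook centroid and not the refined loss function proposed in the paper; the function names and the three-position posterior are made up for illustration.

```python
def map_estimate(posterior):
    """Return the single configuration with the highest posterior probability."""
    return max(posterior, key=posterior.get)


def hamming_centroid(posterior):
    """Return the configuration that minimizes expected Hamming loss.

    Under Hamming loss this reduces to a position-wise vote: a position is
    called a binding site when its marginal posterior probability exceeds 0.5.
    """
    n = len(next(iter(posterior)))
    marginals = [
        sum(p for cfg, p in posterior.items() if cfg[i]) for i in range(n)
    ]
    return tuple(int(m > 0.5) for m in marginals)


# Hypothetical posterior over three candidate positions (1 = binding site).
posterior = {(1, 0, 0): 0.4, (0, 1, 1): 0.3, (0, 1, 0): 0.3}
print(map_estimate(posterior))      # most probable single configuration
print(hamming_centroid(posterior))  # position-wise majority configuration
```

In this toy case the most probable configuration places the site at the first position, while the marginals favor the second position, so the two estimators disagree.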
Affiliation(s)
- Luis Carvalho
- Department of Mathematics and Statistics, Boston University, Boston, Massachusetts, United States of America
|
43
|
IFNβ-dependent increases in STAT1, STAT2, and IRF9 mediate resistance to viruses and DNA damage. EMBO J 2013; 32:2751-63. [PMID: 24065129 PMCID: PMC3801437 DOI: 10.1038/emboj.2013.203] [Citation(s) in RCA: 258] [Impact Index Per Article: 23.5] [Received: 12/18/2012] [Accepted: 08/13/2013] [Indexed: 12/18/2022]
Abstract
A single high dose of interferon-β (IFNβ) activates powerful cellular responses, in which many anti-viral, pro-apoptotic, and anti-proliferative proteins are highly expressed. Since some of these proteins are deleterious, cells downregulate this initial response rapidly. However, the expression of many anti-viral proteins that do no harm is sustained, prolonging a substantial part of the initial anti-viral response for days and also providing resistance to DNA damage. While the transcription factor ISGF3 (IRF9 and tyrosine-phosphorylated STATs 1 and 2) drives the first rapid response phase, the related factor un-phosphorylated ISGF3 (U-ISGF3), formed by IFNβ-induced high levels of IRF9 and STATs 1 and 2 without tyrosine phosphorylation, drives the second prolonged response. The U-ISGF3-induced anti-viral genes that show prolonged expression are driven by distinct IFN stimulated response elements (ISREs). Continuous exposure of cells to a low level of IFNβ, often seen in cancers, leads to steady-state increased expression of only the U-ISGF3-dependent proteins, with no sustained increase in other IFNβ-induced proteins, and to constitutive resistance to DNA damage. IFNβ induces the formation of a novel transcriptional complex, U-ISGF3, which contains un-phosphorylated STATs. U-ISGF3 regulates the expression of a subset of IFNβ-stimulated genes to promote resistance to virus infection and DNA damage.
|
44
|
Yamashita A, Shichino Y, Tanaka H, Hiriart E, Touat-Todeschini L, Vavasseur A, Ding DQ, Hiraoka Y, Verdel A, Yamamoto M. Hexanucleotide motifs mediate recruitment of the RNA elimination machinery to silent meiotic genes. Open Biol 2013; 2:120014. [PMID: 22645662 PMCID: PMC3352096 DOI: 10.1098/rsob.120014] [Citation(s) in RCA: 96] [Impact Index Per Article: 8.7] [Received: 01/19/2012] [Accepted: 02/28/2012] [Indexed: 11/28/2022]
Abstract
The selective elimination system blocks the accumulation of meiosis-specific mRNAs during the mitotic cell cycle in fission yeast. These mRNAs harbour a region, the determinant of selective removal (DSR), which is recognized by a YTH-family RNA-binding protein, Mmi1. Mmi1 directs target transcripts to destruction in association with nuclear exosomes. Hence, the interaction between DSR and Mmi1 is crucial to discriminate mitosis from meiosis. Here, we show that Mmi1 interacts with repeats of the hexanucleotide U(U/C)AAAC that are enriched in the DSR. Disruption of this ‘DSR core motif’ in a target mRNA inhibits its elimination. Tandem repeats of the motif can function as an artificial DSR. Mmi1 binds to it in vitro. Thus, a core motif cluster is responsible for the DSR activity. Furthermore, certain variant hexanucleotide motifs can augment the function of the DSR core motif. Notably, meiRNA, which composes the nuclear Mei2 dot required to suppress Mmi1 activity during meiosis, carries numerous copies of the core/augmenting motifs on its tail and is indeed degraded by the Mmi1/exosome system, indicating its likely role as decoy bait for Mmi1.
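A short Python sketch shows how copies of the hexanucleotide U(U/C)AAAC named in the abstract could be located in a transcript. The function name `find_core_motifs` and the example sequence are assumptions made for illustration; the paper's own analysis pipeline is not described here.

```python
import re

# U(U/C)AAAC as written in the abstract. The pattern cannot overlap itself
# (it starts with U and ends with C), so a plain scan finds every copy.
DSR_CORE = re.compile(r"U[UC]AAAC")


def find_core_motifs(rna):
    """Return (0-based start, matched hexamer) for every core-motif copy."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group()) for m in DSR_CORE.finditer(rna)]


# Made-up sequence with three copies of the motif.
print(find_core_motifs("GGUUAAACUCAAACGGAUUAAACC"))
```

Counting the hits per transcript region would give a rough measure of motif enrichment, though any threshold for calling a region a DSR would be a further assumption.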
Affiliation(s)
- Akira Yamashita
- Department of Biophysics and Biochemistry, Graduate School of Science, University of Tokyo, Hongo, Tokyo 113-0033, Japan.
|
45
|
Wenger AM, Clarke SL, Notwell JH, Chung T, Tuteja G, Guturu H, Schaar BT, Bejerano G. The enhancer landscape during early neocortical development reveals patterns of dense regulation and co-option. PLoS Genet 2013; 9:e1003728. [PMID: 24009522 PMCID: PMC3757057 DOI: 10.1371/journal.pgen.1003728] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 01/10/2013] [Accepted: 07/03/2013] [Indexed: 11/18/2022]
Abstract
Genetic studies have identified a core set of transcription factors and target genes that control the development of the neocortex, the region of the human brain responsible for higher cognition. The specific regulatory interactions between these factors, many key upstream and downstream genes, and the enhancers that mediate all these interactions remain mostly uncharacterized. We perform p300 ChIP-seq to identify over 6,600 candidate enhancers active in the dorsal cerebral wall of embryonic day 14.5 (E14.5) mice. Over 95% of the peaks we measure are conserved to human. Eight of ten (80%) candidates tested using mouse transgenesis drive activity in restricted laminar patterns within the neocortex. GREAT based computational analysis reveals highly significant correlation with genes expressed at E14.5 in key areas for neocortex development, and allows the grouping of enhancers by known biological functions and pathways for further studies. We find that multiple genes are flanked by dozens of candidate enhancers each, including well-known key neocortical genes as well as suspected and novel genes. Nearly a quarter of our candidate enhancers are conserved well beyond mammals. Human and zebrafish regions orthologous to our candidate enhancers are shown to most often function in other aspects of central nervous system development. Finally, we find strong evidence that specific interspersed repeat families have contributed potentially key developmental enhancers via co-option. Our analysis expands the methodologies available for extracting the richness of information found in genome-wide functional maps.
Affiliation(s)
- Aaron M. Wenger
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Shoa L. Clarke
- Department of Genetics, Stanford University, Stanford, California, United States of America
- James H. Notwell
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Tisha Chung
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Geetu Tuteja
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Harendra Guturu
- Department of Electrical Engineering, Stanford University, Stanford, California, United States of America
- Bruce T. Schaar
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Department of Developmental Biology, Stanford University, Stanford, California, United States of America
|
46
|
Liu W, Chen H, Chen L. An ant colony optimization based algorithm for identifying gene regulatory elements. Comput Biol Med 2013; 43:922-32. [PMID: 23746735 DOI: 10.1016/j.compbiomed.2013.04.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/15/2011] [Revised: 04/10/2013] [Accepted: 04/11/2013] [Indexed: 11/15/2022]
Abstract
Identifying the regulatory elements in gene sequences is one of the most important tasks in bioinformatics. Most existing algorithms for identifying regulatory elements tend to converge to a local optimum and have high time complexity. Ant Colony Optimization (ACO) is a meta-heuristic method based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of real ants. Taking advantage of ACO traits such as self-organization and robustness, this paper designs and implements an ACO-based algorithm named ACRI (ant-colony-regulatory-identification) for identifying all possible transcription factor binding sites in the upstream regions of co-expressed genes. To accelerate the ants' searching process, a local optimization strategy is presented to adjust the ants' start positions on the searched sequences. By exploiting the powerful optimization ability of ACO, ACRI can not only improve the precision of the results, but also achieve a very high speed. Experimental results on real-world datasets show that ACRI can outperform other traditional algorithms in terms of speed and quality of solutions.
Affiliation(s)
- Wei Liu
- Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China.
|
47
|
Cox RS, Nishikata K, Shimoyama S, Yoshida Y, Matsui M, Makita Y, Toyoda T. PromoterCAD: Data-driven design of plant regulatory DNA. Nucleic Acids Res 2013; 41:W569-74. [PMID: 23766287 PMCID: PMC3692106 DOI: 10.1093/nar/gkt518] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Synthetic promoters can control the timing, location and amount of gene expression for any organism. PromoterCAD is a web application for designing synthetic promoters with altered transcriptional regulation. We use a data-first approach, using published high-throughput expression and motif data for Arabidopsis thaliana to guide DNA design. We demonstrate data mining tools for finding motifs related to circadian oscillations and tissue-specific expression patterns. PromoterCAD is built on the LinkData open platform for data publication and rapid web application development, allowing new data to be easily added, and the source code modified to add new functionality. PromoterCAD URL: http://promotercad.org. LinkData URL: http://linkdata.org.
Affiliation(s)
- Robert Sidney Cox
- Bioinformatics and Systems Engineering Division, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
|
48
|
Spycher C, Herman EK, Morf L, Qi W, Rehrauer H, Aquino Fournier C, Dacks JB, Hehl AB. An ER-directed transcriptional response to unfolded protein stress in the absence of conserved sensor-transducer proteins in Giardia lamblia. Mol Microbiol 2013; 88:754-71. [DOI: 10.1111/mmi.12218] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Accepted: 03/27/2013] [Indexed: 01/22/2023]
Affiliation(s)
- Cornelia Spycher
- Institute of Parasitology; University of Zurich; 8057; Zurich; Switzerland
- Emily K. Herman
- Department of Cell Biology; University of Alberta; Edmonton; AB; T6G 2H7; Canada
- Laura Morf
- Institute of Parasitology; University of Zurich; 8057; Zurich; Switzerland
- Weihong Qi
- Functional Genomics Center Zurich; 8057; Zurich; Switzerland
- Hubert Rehrauer
- Functional Genomics Center Zurich; 8057; Zurich; Switzerland
- Joel B. Dacks
- Department of Cell Biology; University of Alberta; Edmonton; AB; T6G 2H7; Canada
- Adrian B. Hehl
- Institute of Parasitology; University of Zurich; 8057; Zurich; Switzerland
|
49
|
Droll D, Minia I, Fadda A, Singh A, Stewart M, Queiroz R, Clayton C. Post-transcriptional regulation of the trypanosome heat shock response by a zinc finger protein. PLoS Pathog 2013; 9:e1003286. [PMID: 23592996 PMCID: PMC3616968 DOI: 10.1371/journal.ppat.1003286] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Received: 08/28/2012] [Accepted: 02/19/2013] [Indexed: 12/30/2022]
Abstract
In most organisms, the heat-shock response involves increased heat-shock gene transcription. In Kinetoplastid protists, however, virtually all control of gene expression is post-transcriptional. Correspondingly, Trypanosoma brucei heat-shock protein 70 (HSP70) synthesis after heat shock depends on regulation of HSP70 mRNA turnover. We here show that the T. brucei CCCH zinc finger protein ZC3H11 is a post-transcriptional regulator of trypanosome chaperone mRNAs. ZC3H11 is essential in bloodstream-form trypanosomes and for recovery of insect-form trypanosomes from heat shock. ZC3H11 binds to mRNAs encoding heat-shock protein homologues, with clear specificity for the subset of trypanosome chaperones that is required for protein refolding. In procyclic forms, ZC3H11 was required for stabilisation of target chaperone-encoding mRNAs after heat shock, and the HSP70 mRNA was also decreased upon ZC3H11 depletion in bloodstream forms. Many mRNAs bound to ZC3H11 have a consensus AUU repeat motif in the 3'-untranslated region. ZC3H11 bound preferentially to AUU repeats in vitro, and ZC3H11 regulation of HSP70 mRNA in bloodstream forms depended on its AUU repeat region. Tethering of ZC3H11 to a reporter mRNA increased reporter expression, showing that it is capable of actively stabilizing an mRNA. These results show that expression of trypanosome heat-shock genes is controlled by a specific RNA-protein interaction. They also show that heat-shock-induced chaperone expression in procyclic trypanosomes enhances parasite survival at elevated temperatures.
Affiliation(s)
- Dorothea Droll
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- Igor Minia
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- Abeer Fadda
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- Aditi Singh
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- Mhairi Stewart
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- Rafael Queiroz
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
- Christine Clayton
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, Heidelberg, Germany
|
50
|
Wang D, Tapan S. MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences. BMC Syst Biol 2012; 6 Suppl 2:S4. [PMID: 23282090 PMCID: PMC3521183 DOI: 10.1186/1752-0509-6-s2-s4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/31/2022]
Abstract
Background: Computational approaches for finding DNA regulatory motifs in promoter sequences are useful to biologists because they reduce experimental costs and speed up the de novo discovery of binding sites. Rule-based or clustering-based motif searching schemes need to evaluate the similarity between a k-mer (a k-length subsequence) and a motif model effectively and efficiently, without assuming the independence of nucleotides in motif models and without employing computationally expensive Markov chain models to estimate the background probabilities of k-mers. It is also beneficial to use a priori knowledge in developing advanced searching tools. Results: This paper presents a new scoring function, termed MISCORE, for functional motif characterization and evaluation. MISCORE is free from: (i) any assumption on model dependency; and (ii) the use of a Markov chain model for background modeling. It integrates the compositional complexity of motif instances into the function. Performance evaluations comparing it with the well-known Maximum a Posteriori (MAP) score and Information Content (IC) have shown that MISCORE has promising capabilities to separate and recognize functional DNA motifs and their instances from non-functional ones. Conclusions: MISCORE is a fast computational tool for candidate motif characterization, evaluation and selection. It allows a priori known motif models to be embedded for computing motif-to-motif similarity, which is an advantage over IC and the MAP score. In addition to these merits, MISCORE can automatically filter out some repetitive k-mers from a motif model because the compositional complexity is introduced into the function. Consequently, the merits of MISCORE in terms of both motif signal modeling power and computational efficiency will make it more applicable in the development of computational motif discovery tools.
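For orientation, the Information Content (IC) baseline that the abstract compares against can be sketched in a few lines of Python. This is the standard relative-entropy form of IC under an assumed uniform background of 0.25 per base and without pseudocounts; it is not MISCORE itself, whose formula is not given in the abstract, and the function name `information_content` is hypothetical.

```python
import math
from collections import Counter


def information_content(kmers, background=0.25):
    """Total information content, in bits, of equal-length aligned k-mers.

    Each column contributes sum_b p_b * log2(p_b / background); bases that
    never occur in a column contribute nothing to the sum.
    """
    if not kmers or len({len(k) for k in kmers}) != 1:
        raise ValueError("need a non-empty list of equal-length k-mers")
    total = 0.0
    for column in zip(*kmers):  # iterate over alignment columns
        for count in Counter(column).values():
            p = count / len(kmers)
            total += p * math.log2(p / background)
    return total


# Perfectly conserved columns score 2 bits each; a uniform column scores 0.
print(information_content(["ACGT", "ACGT", "ACGT"]))
print(information_content(["A", "C", "G", "T"]))
```

Because each column is scored separately, this form assumes independence between motif positions, which is one of the assumptions the abstract says MISCORE avoids.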
Affiliation(s)
- Dianhui Wang
- Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria 3086, Australia.
|