1. Taboada-Castro H, Hernández-Álvarez AJ, Castro-Mondragón JA, Encarnación-Guevara S. RhizoBindingSites v2.0 Is a Bioinformatic Database of DNA Motifs Potentially Involved in Transcriptional Regulation Deduced From Their Genomic Sites. Bioinform Biol Insights 2024; 18:11779322241272395. [PMID: 39246685 PMCID: PMC11380129 DOI: 10.1177/11779322241272395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Accepted: 07/12/2024] [Indexed: 09/10/2024]
Abstract
RhizoBindingSites is a de novo depurified database of conserved DNA motifs potentially involved in the transcriptional regulation of the Rhizobium, Sinorhizobium, Bradyrhizobium, Azorhizobium, and Mesorhizobium genera covering 9 representative symbiotic species, deduced from the upstream regulatory sequences of orthologous genes (O-matrices) from the Rhizobiales taxon. The sites collected with O-matrices per gene per genome from RhizoBindingSites were used to deduce matrices using the dyad-Regulatory Sequence Analysis Tool (RSAT) method, giving rise to novel S-matrices for the construction of the RhizoBindingSites v2.0 database. A comparison of the S-matrix logos showed a greater frequency and/or re-definition of specific-position nucleotides found in the O-matrices. Moreover, S-matrices were better at detecting genes in the genome, and there was a more significant number of transcription factors (TFs) in the vicinity than with O-matrices, corresponding to a more significant genomic coverage for S-matrices. O-matrices of 3187 TFs and S-matrices of 2754 TFs from 9 species were deposited in RhizoBindingSites and RhizoBindingSites v2.0, respectively. The homology between the matrices of TFs from a genome showed inter-regulation between the clustered TFs. In addition, matrices of AraC, ArsR, GntR, and LysR ortholog TFs showed different motifs, suggesting distinct regulation. Benchmarking showed 72%, 68%, and 81% of common genes per regulon for O-matrices and approximately 14% fewer common genes with S-matrices of Rhizobium etli CFN42, Rhizobium leguminosarum bv. viciae 3841, and Sinorhizobium meliloti 1021. These data were deposited in RhizoBindingSites and the RhizoBindingSites v2.0 database (http://rhizobindingsites.ccg.unam.mx/).
2. Rauluseviciute I, Launay T, Barzaghi G, Nikumbh S, Lenhard B, Krebs AR, Castro-Mondragon JA, Mathelier A. Identification of transcription factor co-binding patterns with non-negative matrix factorization. Nucleic Acids Res 2024:gkae743. [PMID: 39217462 DOI: 10.1093/nar/gkae743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 07/12/2024] [Accepted: 08/15/2024] [Indexed: 09/04/2024]
Abstract
Transcription factor (TF) binding to DNA is critical to transcription regulation. Although the binding properties of numerous individual TFs are well-documented, a more detailed comprehension of how TFs interact cooperatively with DNA is required. We present COBIND, a novel method based on non-negative matrix factorization (NMF) to identify TF co-binding patterns automatically. COBIND applies NMF to one-hot encoded regions flanking known TF binding sites (TFBSs) to pinpoint enriched DNA patterns at fixed distances. We applied COBIND to 5699 TFBS datasets from UniBind for 401 TFs in seven species. The method uncovered already established co-binding patterns and new co-binding configurations not yet reported in the literature and inferred through motif similarity and protein-protein interaction knowledge. Our extensive analyses across species revealed that 67% of the TFs shared a co-binding motif with other TFs from the same structural family. The co-binding patterns captured by COBIND are likely functionally relevant as they harbor higher evolutionary conservation than isolated TFBSs. Open chromatin data from matching human cell lines further supported the co-binding predictions. Finally, we used single-molecule footprinting data from mouse embryonic stem cells to confirm that the COBIND-predicted co-binding events associated with some TFs likely occurred on the same DNA molecules.
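The core idea in this abstract, applying NMF to one-hot encoded flanking regions, can be illustrated with a minimal sketch. This is not the COBIND implementation: the toy sequences, the function names (`one_hot`, `nmf`), the rank, and the use of Lee-Seung multiplicative updates are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Flatten a DNA sequence into a vector of 4 * len(seq) zeros and ones."""
    vec = []
    for base in seq.upper():
        vec.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return vec

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and
    H (rank x m) using multiplicative updates for the squared error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical flanking regions: two groups sharing different patterns.
flanks = ["TTGACA", "TTGACA", "TTGACC", "GGCATT", "GGCATT", "GGCATA"]
V = [one_hot(s) for s in flanks]   # one row per flank, 24 columns
W, H = nmf(V, rank=2)              # rows of H act as position-specific patterns
```

Each row of `H` can be reshaped into a positions-by-4 table and read like a motif, while `W` indicates how strongly each flank uses each pattern. A real analysis would likely rely on an established NMF implementation rather than this hand-written loop.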
Affiliation(s)
- Ieva Rauluseviciute
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Timothée Launay
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Guido Barzaghi
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
  - Collaboration for Joint Ph.D. degree between EMBL and Heidelberg University, Heidelberg, Germany
- Sarvesh Nikumbh
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Boris Lenhard
  - MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
  - Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
- Arnaud Regis Krebs
  - European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Jaime A Castro-Mondragon
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Anthony Mathelier
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
  - Department of Medical Genetics, Institute of Clinical Medicine, University of Oslo and Oslo University Hospital, Oslo, Norway
  - Center for Bioinformatics, Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, Oslo, Norway
3. Taboada-Castro H, Hernández-Álvarez AJ, Escorcia-Rodríguez JM, Freyre-González JA, Galán-Vásquez E, Encarnación-Guevara S. Rhizobium etli CFN42 and Sinorhizobium meliloti 1021 bioinformatic transcriptional regulatory networks from culture and symbiosis. Front Bioinform 2024; 4:1419274. [PMID: 39263245 PMCID: PMC11387232 DOI: 10.3389/fbinf.2024.1419274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 07/24/2024] [Indexed: 09/13/2024]
Abstract
Rhizobium etli CFN42 proteome-transcriptome mixed data of exponential growth and nitrogen-fixing bacteroids, as well as Sinorhizobium meliloti 1021 transcriptome data of growth and nitrogen-fixing bacteroids, were integrated into transcriptional regulatory networks (TRNs). The one-step construction network consisted of a matrix-clustering analysis of matrices of the gene profile and all matrices of the transcription factors (TFs) of their genome. The networks were constructed with the prediction of regulatory network application of the RhizoBindingSites database (http://rhizobindingsites.ccg.unam.mx/). The deduced free-living Rhizobium etli network contained 1,146 genes, including 380 TFs and 12 sigma factors. In addition, the bacteroid R. etli CFN42 network contained 884 genes, where 364 were TFs, and 12 were sigma factors, whereas the deduced free-living Sinorhizobium meliloti 1021 network contained 643 genes, where 259 were TFs and seven were sigma factors, and the bacteroid Sinorhizobium meliloti 1021 network contained 357 genes, where 210 were TFs and six were sigma factors. The similarity of these deduced condition-dependent networks and the biological E. coli and B. subtilis independent condition networks segregates from the random Erdős-Rényi networks. Deduced networks showed a low average clustering coefficient. They were not scale-free, showing a gradually diminishing hierarchy of TFs in contrast to the hierarchy role of the sigma factor rpoD in the E. coli K12 network. For rhizobia networks, partitioning the genome into the chromosome, chromids, and plasmids, where essential genes are distributed, and the symbiotic ability that is mostly coded in plasmids, may alter the structure of these deduced condition-dependent networks. It provides potential TF gene-target relationship data for constructing regulons, which are the basic units of a TRN.
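The comparison of an observed network against random Erdős-Rényi networks through the average clustering coefficient can be sketched as follows. This is an illustrative example only, not the authors' pipeline: the toy edge list is invented, the network is treated as undirected (a simplifying assumption, since regulatory networks are directed), and the random model used here is G(n, m) with the same number of nodes and edges.

```python
import random
from itertools import combinations

def average_clustering(n, edges):
    """Average local clustering coefficient of an undirected simple graph
    with nodes 0..n-1. Nodes with fewer than two neighbours contribute 0."""
    neighbours = {node: set() for node in range(n)}
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    total = 0.0
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that connect two neighbours of the current node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        total += 2.0 * links / (k * (k - 1))
    return total / n

def erdos_renyi(n, m, seed=0):
    """Random G(n, m) graph: m distinct edges drawn uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(list(combinations(range(n), 2)), m)

# Hypothetical toy network (node indices stand in for genes and TFs).
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4),
             (4, 5), (3, 5), (5, 6), (6, 7)]
n_nodes = 8

observed = average_clustering(n_nodes, toy_edges)
random_mean = sum(
    average_clustering(n_nodes, erdos_renyi(n_nodes, len(toy_edges), seed=s))
    for s in range(100)
) / 100
print(f"observed: {observed:.3f}, random G(n, m) mean: {random_mean:.3f}")
```

Averaging over many random graphs gives a baseline against which the observed coefficient can be judged. Other properties mentioned in the abstract, such as the degree distribution, would need their own statistics.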
Affiliation(s)
- Edgardo Galán-Vásquez
  - Institute of Applied Mathematics and in Systems (IIMAS), National Autonomous University of México, Mexico City, Mexico
4. Smet D, Opdebeeck H, Vandepoele K. Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice. Front Plant Sci 2023; 14:1212073. [PMID: 37528982 PMCID: PMC10390317 DOI: 10.3389/fpls.2023.1212073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 06/16/2023] [Indexed: 08/03/2023]
Abstract
Plants have evolved various mechanisms to adapt to adverse environmental stresses, such as the modulation of gene expression. Expression of stress-responsive genes is controlled by specific regulators, including transcription factors (TFs), that bind to sequence-specific binding sites, representing key components of cis-regulatory elements and regulatory networks. Our understanding of the underlying regulatory code remains, however, incomplete. Recent studies have shown that, by training machine learning (ML) algorithms on genomic sequence features, it is possible to predict which genes will transcriptionally respond to a specific stress. By identifying the most important features for gene expression prediction, these trained ML models allow, in theory, to further elucidate the regulatory code underlying the transcriptional response to abiotic stress. Here, we trained random forest ML models to predict gene expression in rice (Oryza sativa) in response to heat or drought stress. Apart from thoroughly assessing model performance and robustness across various input training data, the importance of promoter and gene body sequence features to train ML models was evaluated. The use of enriched promoter oligomers, complementing known TF binding sites, allowed us to gain novel insights in DNA motifs contributing to the stress regulatory code. By comparing genomic feature importance scores for drought and heat stress over time, general and stress-specific genomic features contributing to the performance of the learned models and their temporal variation were identified. This study provides a solid foundation to build and interpret ML models accurately predicting transcriptional responses and enables novel insights in biological sequence features that are important for abiotic stress responses.
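As a rough illustration of how promoter oligomers can become genomic sequence features for a machine learning model, the sketch below counts k-mers in a sequence and lays them out as a fixed-length vector. It is an assumption-laden toy, not the feature pipeline of this study: the function name `kmer_features`, the choice of k, and the example promoter are hypothetical, and the random forest itself is not shown.

```python
from collections import Counter
from itertools import product

def kmer_features(seq, k=3):
    """Count every k-mer over A/C/G/T in seq and return the counts in a
    fixed lexicographic order, so all sequences share one feature layout."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]

# Hypothetical promoter fragment; a real analysis would use full promoters.
promoter = "ACGTACGTGGCCACGTGT"
features = kmer_features(promoter, k=3)
print(len(features))  # 4**3 = 64 features per sequence
```

One such vector per gene, paired with a label such as responsive or non-responsive to a stress, is the kind of input a tree-based classifier could be trained on. Feature importance scores from the trained model would then point back to individual oligomers.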
Affiliation(s)
- Dajo Smet
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie (VIB), Ghent, Belgium
- Helder Opdebeeck
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie (VIB), Ghent, Belgium
- Klaas Vandepoele
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie (VIB), Ghent, Belgium
  - Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
5. Brocal-Ruiz R, Esteve-Serrano A, Mora-Martínez C, Franco-Rivadeneira ML, Swoboda P, Tena JJ, Vilar M, Flames N. Forkhead transcription factor FKH-8 cooperates with RFX in the direct regulation of sensory cilia in Caenorhabditis elegans. eLife 2023; 12:e89702. [PMID: 37449480 PMCID: PMC10393296 DOI: 10.7554/elife.89702] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/26/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
Cilia, either motile or non-motile (a.k.a primary or sensory), are complex evolutionarily conserved eukaryotic structures composed of hundreds of proteins required for their assembly, structure and function that are collectively known as the ciliome. Ciliome gene mutations underlie a group of pleiotropic genetic diseases known as ciliopathies. Proper cilium function requires the tight coregulation of ciliome gene transcription, which is only fragmentarily understood. RFX transcription factors (TF) have an evolutionarily conserved role in the direct activation of ciliome genes both in motile and non-motile cilia cell-types. In vertebrates, FoxJ1 and FoxN4 Forkhead (FKH) TFs work with RFX in the direct activation of ciliome genes, exclusively in motile cilia cell-types. No additional TFs have been described to act together with RFX in primary cilia cell-types in any organism. Here we describe FKH-8, a FKH TF, as a direct regulator of the sensory ciliome genes in Caenorhabditis elegans. FKH-8 is expressed in all ciliated neurons in C. elegans, binds the regulatory regions of ciliome genes, regulates ciliome gene expression, cilium morphology and a wide range of behaviors mediated by sensory ciliated neurons. FKH-8 and DAF-19 (C. elegans RFX) physically interact and synergistically regulate ciliome gene expression. C. elegans FKH-8 function can be replaced by mouse FOXJ1 and FOXN4 but not by other members of other mouse FKH subfamilies. In conclusion, RFX and FKH TF families act jointly as direct regulators of ciliome genes also in sensory ciliated cell types suggesting that this regulatory logic could be an ancient trait predating functional cilia sub-specialization.
Affiliation(s)
- Rebeca Brocal-Ruiz
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, Spain
- Ainara Esteve-Serrano
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, Spain
- Carlos Mora-Martínez
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, Spain
- Peter Swoboda
  - Department of Biosciences and Nutrition, Karolinska Institute, Campus Flemingsberg, Stockholm, Sweden
- Juan J Tena
  - Centro Andaluz de Biología del Desarrollo (CABD), Consejo Superior de Investigaciones Científicas/Universidad Pablo de Olavide, Seville, Spain
- Marçal Vilar
  - Molecular Basis of Neurodegeneration Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, Spain
- Nuria Flames
  - Developmental Neurobiology Unit, Instituto de Biomedicina de Valencia IBV-CSIC, Valencia, Spain
6. Kitchen SA, Jiang D, Harii S, Satoh N, Weis VM, Shinzato C. Coral larvae suppress heat stress response during the onset of symbiosis decreasing their odds of survival. Mol Ecol 2022; 31:5813-5830. [PMID: 36168983 DOI: 10.1111/mec.16708] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/15/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/13/2023]
Abstract
The endosymbiosis between most corals and their photosynthetic dinoflagellate partners begins early in the host life history, when corals are larvae or juvenile polyps. The capacity of coral larvae to buffer climate-induced stress while in the process of symbiont acquisition could come with physiological trade-offs that alter behaviour, development, settlement and survivorship. Here we examined the joint effects of thermal stress and symbiosis onset on colonization dynamics, survival, metamorphosis and host gene expression of Acropora digitifera larvae. We found that thermal stress decreased symbiont colonization of hosts by 50% and symbiont density by 98.5% over 2 weeks. Temperature and colonization also influenced larval survival and metamorphosis in an additive manner, where colonized larvae fared worse or prematurely metamorphosed more often than noncolonized larvae under thermal stress. Transcriptomic responses to colonization and thermal stress treatments were largely independent, while the interaction of these treatments revealed contrasting expression profiles of genes that function in the stress response, immunity, inflammation and cell cycle regulation. The combined treatment either cancelled or lowered the magnitude of expression of heat-stress responsive genes in the presence of symbionts, revealing a physiological cost to acquiring symbionts at the larval stage with elevated temperatures. In addition, host immune suppression, a hallmark of symbiosis onset under ambient temperature, turned to immune activation under heat stress. Thus, by integrating the physical environment and biotic pressures that mediate presettlement event in corals, our results suggest that colonization may hinder larval survival and recruitment under projected climate scenarios.
Affiliation(s)
- Sheila A Kitchen
  - Department of Integrative Biology, Oregon State University, Corvallis, Oregon, USA
- Duo Jiang
  - Statistics Department, Oregon State University, Corvallis, Oregon, USA
- Saki Harii
  - Tropical Biosphere Research Center, University of the Ryukyus, Okinawa, Japan
- Noriyuki Satoh
  - Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Virginia M Weis
  - Department of Integrative Biology, Oregon State University, Corvallis, Oregon, USA
- Chuya Shinzato
  - Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
7. Taboada-Castro H, Gil J, Gómez-Caudillo L, Escorcia-Rodríguez JM, Freyre-González JA, Encarnación-Guevara S. Rhizobium etli CFN42 proteomes showed isoenzymes in free-living and symbiosis with a different transcriptional regulation inferred from a transcriptional regulatory network. Front Microbiol 2022; 13:947678. [PMID: 36312930 PMCID: PMC9611204 DOI: 10.3389/fmicb.2022.947678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 09/05/2022] [Indexed: 11/13/2022]
Abstract
A comparative proteomic study at 6 h of growth in minimal medium (MM) and bacteroids at 18 days of symbiosis of Rhizobium etli CFN42 with the Phaseolus vulgaris leguminous plant was performed. A gene ontology classification of proteins in MM and bacteroids showed 31 and 10 pathways with at least 30% and 20% of proteins with respect to genome content per pathway, respectively. These pathways were for energy and environmental compound metabolism, contributing to the understanding of how Rhizobium adapts to the different conditions. Metabolic maps based on orthology of the protein profiles showed 101 and 74 functionally homologous proteins in the MM and bacteroid profiles, respectively, which were grouped in 34 different isoenzymes showing a great impact on metabolism by covering 60 metabolic pathways in MM and symbiosis. Taking advantage of the co-expression of transcriptional regulators (TFs) in the profiles, by selecting genes whose matrices were clustered with matrices of TFs, transcriptional regulatory networks (TRNs) were deduced for the first time for these metabolic stages. These clustered TF-MM and clustered TF-bacteroid networks, containing 654 and 246 proteins, including 93 and 46 TFs, respectively, provide valuable information on the TFs and their regulated genes with high stringency. Isoenzymes were specific for adaptation to the different conditions and a different transcriptional regulation for MM and bacteroid was deduced. The parameters of the TRNs of these expected biological networks and biological networks of E. coli and B. subtilis segregate from the random theoretical networks. These are useful data to design experiments on TF gene-target relationships as a basis for constructing a TRN.
Affiliation(s)
- Hermenegildo Taboada-Castro
  - Proteomics Laboratory, Program of Functional Genomics of Prokaryotes, Center for Genomic Sciences, National Autonomous University of Mexico, Cuernavaca, Morelos, Mexico
- Jeovanis Gil
  - Division of Oncology, Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Lund, Sweden
- Leopoldo Gómez-Caudillo
  - Proteomics Laboratory, Program of Functional Genomics of Prokaryotes, Center for Genomic Sciences, National Autonomous University of Mexico, Cuernavaca, Morelos, Mexico
- Juan Miguel Escorcia-Rodríguez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, National Autonomous University of Mexico, Mexico City, Mexico
- Julio Augusto Freyre-González
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, National Autonomous University of Mexico, Mexico City, Mexico
- Sergio Encarnación-Guevara
  - Proteomics Laboratory, Program of Functional Genomics of Prokaryotes, Center for Genomic Sciences, National Autonomous University of Mexico, Cuernavaca, Morelos, Mexico
  - *Correspondence: Sergio Encarnacion Guevara,
8. Santana-Garcia W, Castro-Mondragon JA, Padilla-Gálvez M, Nguyen NT, Elizondo-Salas A, Ksouri N, Gerbes F, Thieffry D, Vincens P, Contreras-Moreira B, van Helden J, Thomas-Chollier M, Medina-Rivera A. RSAT 2022: regulatory sequence analysis tools. Nucleic Acids Res 2022; 50:W670-W676. [PMID: 35544234 PMCID: PMC9252783 DOI: 10.1093/nar/gkac312] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 03/03/2022] [Revised: 04/12/2022] [Accepted: 04/20/2022] [Indexed: 11/12/2022]
Abstract
RSAT (Regulatory Sequence Analysis Tools) enables the detection and the analysis of cis-regulatory elements in genomic sequences. This software suite performs (i) de novo motif discovery (including from genome-wide datasets like ChIP-seq/ATAC-seq), (ii) genomic sequence scanning with known motifs, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations and (v) comparative genomics. RSAT comprises 50 tools. Six public Web servers (including a teaching server) are offered to meet the needs of different biological communities. RSAT philosophy and originality are: (i) a multi-modal access depending on the user needs, through web forms, command-line for local installation and programmatic web services, (ii) a support for virtually any genome (animals, bacteria, plants, totaling over 10 000 genomes directly accessible). Since the 2018 NAR Web Software Issue, we have developed a large REST API, extended the support for additional genomes and external motif collections, enhanced some tools and Web forms, and developed a novel tool that builds or refines gene regulatory networks using motif scanning (network-interactions). The RSAT website provides extensive documentation, tutorials and published protocols. RSAT code is under open-source license and now hosted in GitHub. RSAT is available at http://www.rsat.eu/.
Affiliation(s)
- Walter Santana-Garcia
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Jaime A Castro-Mondragon
  - Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Mónica Padilla-Gálvez
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, 76230 Santiago de Querétaro, México
- Nga Thi Thuy Nguyen
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Ana Elizondo-Salas
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, 76230 Santiago de Querétaro, México
- Najla Ksouri
  - Estación Experimental de Aula Dei-CSIC, 50059 Zaragoza, Spain
- François Gerbes
  - CNRS, Institut Français de Bioinformatique, IFB-core, UMS 3601, Evry, France
- Denis Thieffry
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Pierre Vincens
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Jacques van Helden
  - CNRS, Institut Français de Bioinformatique, IFB-core, UMS 3601, Evry, France
  - Aix-Marseille Univ, INSERM UMR_S 1090, Lab Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
- Morgane Thomas-Chollier
  - Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Alejandra Medina-Rivera
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, 76230 Santiago de Querétaro, México
9. Web-Based Bioinformatics Approach Towards Analysis of Regulatory Sequences. Adv Bioinformatics 2021. [DOI: 10.1007/978-981-33-6191-1_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
10. Pirayre A, Duval L, Blugeon C, Firmo C, Perrin S, Jourdier E, Margeot A, Bidard F. Glucose-lactose mixture feeds in industry-like conditions: a gene regulatory network analysis on the hyperproducing Trichoderma reesei strain Rut-C30. BMC Genomics 2020; 21:885. [PMID: 33302864 PMCID: PMC7731781 DOI: 10.1186/s12864-020-07281-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/18/2020] [Accepted: 11/25/2020] [Indexed: 11/15/2022]
Abstract
BACKGROUND: The degradation of cellulose and hemicellulose molecules into simpler sugars such as glucose is part of the second generation biofuel production process. Hydrolysis of lignocellulosic substrates is usually performed by enzymes produced and secreted by the fungus Trichoderma reesei. Studies identifying transcription factors involved in the regulation of cellulase production have been conducted but no overview of the whole regulation network is available. A transcriptomic approach with mixtures of glucose and lactose, used as a substrate for cellulase induction, was used to help us decipher missing parts in the network of T. reesei Rut-C30. RESULTS: Experimental results on the Rut-C30 hyperproducing strain confirmed the impact of sugar mixtures on the enzymatic cocktail composition. The transcriptomic study shows a temporal regulation of the main transcription factors and a lactose concentration impact on the transcriptional profile. A gene regulatory network built using BRANE Cut software reveals three sub-networks related to i) a positive correlation between lactose concentration and cellulase production, ii) a particular dependence of the β-glucosidase regulation on lactose and iii) a negative regulation of the development process and growth. CONCLUSIONS: This work is the first transcriptomic study of the effects of pure and mixed carbon sources in a fed-batch mode. Our study exposes a co-orchestration of xyr1, clr2 and ace3 for cellulase and hemicellulase induction and production, a fine regulation of the β-glucosidase and a decrease of growth in favor of cellulase production. These conclusions provide us with potential targets for further genetic engineering leading to better cellulase-producing strains in industry-like conditions.
Affiliation(s)
- Aurélie Pirayre
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France
- Laurent Duval
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France
  - Laboratoire d'Informatique Gaspard-Monge (LIGM), ESIEE Paris, Université-Gustave Eiffel, Marne-la-Vallée, F-77454, France
- Corinne Blugeon
  - Genomic facility, Institut de Biologie de l'ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, 75005, France
- Cyril Firmo
  - Genomic facility, Institut de Biologie de l'ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, 75005, France
- Sandrine Perrin
  - Genomic facility, Institut de Biologie de l'ENS (IBENS), Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, Paris, 75005, France
- Etienne Jourdier
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France
- Antoine Margeot
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France
- Frédérique Bidard
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, Rueil-Malmaison, 92852, France
11. Taboada-Castro H, Castro-Mondragón JA, Aguilar-Vera A, Hernández-Álvarez AJ, van Helden J, Encarnación-Guevara S. RhizoBindingSites, a Database of DNA-Binding Motifs in Nitrogen-Fixing Bacteria Inferred Using a Footprint Discovery Approach. Front Microbiol 2020; 11:567471. [PMID: 33250866 PMCID: PMC7674921 DOI: 10.3389/fmicb.2020.567471] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/29/2020] [Accepted: 10/13/2020] [Indexed: 11/30/2022]
Abstract
Basic knowledge of transcriptional regulation is needed to understand the mechanisms governing biological processes, i.e., nitrogen fixation by Rhizobiales bacteria in symbiosis with leguminous plants. The RhizoBindingSites database is a computer-assisted framework providing motif-gene-associated conserved sequences potentially implicated in transcriptional regulation in nine symbiotic species. A dyad analysis algorithm was used to deduce motifs in the upstream regulatory region of orthologous genes, and only motifs also located in the gene seed promoter with a p-value of 1e-4 were accepted. A genomic scan analysis of the upstream sequences with these motifs was performed. These predicted binding sites were categorized according to low, medium and high homology between the matrix and the upstream regulatory sequence. On average, 62.7% of the genes had a motif, accounting for 80.44% of the genes per genome, with 19613 matrices (a matrix is a representation of a motif). The RhizoBindingSites database provides motif and gene information, motif conservation in the order Rhizobiales, matrices, motif logos, regulatory networks constructed from theoretical or experimental data, a criterion for selecting motifs and a guide for users. The RhizoBindingSites database is freely available online at rhizobindingsites.ccg.unam.mx.
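The step of scanning upstream sequences with a matrix can be pictured with a generic position weight matrix sketch. It is not the dyad-analysis or scanning procedure used for the database: the binding sites, the pseudocount, the uniform background of 0.25 (a simplification for GC-rich rhizobial genomes), and the function names are assumptions for illustration, and no p-value is computed.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from aligned sites of
    equal length, with a pseudocount added to every base count."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background)
            for base in "ACGT"
        })
    return pwm

def scan(seq, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    seq = seq.upper()
    width = len(pwm)
    return [
        (start, sum(pwm[j][seq[start + j]] for j in range(width)))
        for start in range(len(seq) - width + 1)
    ]

# Hypothetical aligned sites and a hypothetical upstream region.
sites = ["TTGACA", "TTGACA", "TTGACT", "TTGATA"]
upstream = "CCGGTTGACAGGCC"

pwm = build_pwm(sites)
best_start, best_score = max(scan(upstream, pwm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

In practice the window scores would be converted to p-values against a genome-specific background model before sites are accepted, which is how a threshold such as the 1e-4 mentioned in the abstract would come into play.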
Affiliation(s)
- Alejandro Aguilar-Vera
  - Center for Genomic Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
- Jacques van Helden
  - CNRS, IFB-core, UMS 3601, Institut Français de Bioinformatique, Évry, France
  - Laboratoire Theory and Approaches of Genome Complexity (TAGC), Inserm, Aix-Marseille Univ, Marseille, France
12
|
Nguyen NTT, Contreras-Moreira B, Castro-Mondragon JA, Santana-Garcia W, Ossio R, Robles-Espinoza CD, Bahin M, Collombet S, Vincens P, Thieffry D, van Helden J, Medina-Rivera A, Thomas-Chollier M. RSAT 2018: regulatory sequence analysis tools 20th anniversary. Nucleic Acids Res 2018; 46:W209-W214. [PMID: 29722874 PMCID: PMC6030903 DOI: 10.1093/nar/gky317] [Citation(s) in RCA: 133] [Impact Index Per Article: 26.6] [Received: 01/31/2018] [Accepted: 04/23/2018] [Indexed: 12/27/2022] Open
Abstract
RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Affiliation(s)
- Nga Thi Thuy Nguyen
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Jaime A Castro-Mondragon
- Aix-Marseille Univ, INSERM UMR_S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France; Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Walter Santana-Garcia
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México
- Raul Ossio
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México
- Carla Daniela Robles-Espinoza
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México; Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Mathieu Bahin
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Samuel Collombet
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Pierre Vincens
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Denis Thieffry
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Jacques van Helden
- Aix-Marseille Univ, INSERM UMR_S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
- Alejandra Medina-Rivera
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México
- Morgane Thomas-Chollier
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
|
13
|
Sharma BS, Swain PK, Verma RJ. A Systematic Bioinformatics Approach to Motif-Based Analysis of Human Locus Control Regions. J Comput Biol 2019; 26:1427-1437. [PMID: 31305132 DOI: 10.1089/cmb.2019.0155] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/14/2023] Open
Abstract
Locus control regions (LCRs), cis-acting, noncoding regulatory elements with strong transcription-enhancing activity, are conserved in sequence and organization, and exhibit strict gene-specific expression. LCRs have been reported and studied in several mammalian gene systems, signifying that they play an important role in eukaryotic gene expression control. Their highly regulated, stable, and precise levels of expression have made them a strong candidate for use in gene therapy vectors. In this study, we attempted to determine the unique signatures of human LCRs by analyzing a data set of LCR sequences for the presence of motifs through a systematic bioinformatics approach. Motif-based analysis was performed using the web-based Regulatory Sequence Analysis Tools (RSAT). The significant motifs detected were analyzed further for their identity using the Tomtom tool. RSAT analysis revealed that significant motifs exist within the LCRs. Identity analysis using Tomtom showed that the detected significant motifs were comparable with known transcription factor (TF) binding sites and that the top-scoring motifs belong to zinc finger-containing proteins, an important group of proteins involved in a variety of cellular activities. Correspondence to segments of known motifs indicates the biological relevance of the detected motifs. Motif-based analysis is valuable for analyzing the various characteristics of sequences, notably TF binding models in this study. Owing to their unique expression control abilities, LCRs form an important component of integrating vectors; therefore, identification of unique signatures present within LCR sequences will be instrumental in the design of a new generation of regulatory elements containing LCR sequences.
Affiliation(s)
- B Sharan Sharma
- Life Sciences Research Division, Indrashil Institute of Science and Technology (IIST), Indrashil University (IU), Mehsana, India; Department of Human Genetics, Zoology and Biomedical Technology, University School of Sciences, Gujarat University, Ahmedabad, India
- Prabodha K Swain
- Life Sciences Research Division, Indrashil Institute of Science and Technology (IIST), Indrashil University (IU), Mehsana, India
- Ramtej J Verma
- Department of Human Genetics, Zoology and Biomedical Technology, University School of Sciences, Gujarat University, Ahmedabad, India
|
14
|
Galli M, Khakhar A, Lu Z, Chen Z, Sen S, Joshi T, Nemhauser JL, Schmitz RJ, Gallavotti A. The DNA binding landscape of the maize AUXIN RESPONSE FACTOR family. Nat Commun 2018; 9:4526. [PMID: 30375394 PMCID: PMC6207667 DOI: 10.1038/s41467-018-06977-6] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 04/25/2018] [Accepted: 09/23/2018] [Indexed: 01/19/2023] Open
Abstract
AUXIN RESPONSE FACTORS (ARFs) are plant-specific transcription factors (TFs) that couple perception of the hormone auxin to gene expression programs essential to all land plants. As with many large TF families, a key question is whether individual members determine developmental specificity by binding distinct target genes. We use DAP-seq to generate genome-wide in vitro TF:DNA interaction maps for fourteen maize ARFs from the evolutionarily conserved A and B clades. Comparative analysis reveals a high degree of binding site overlap for ARFs of the same clade, but largely distinct clade A and B binding. Many sites are, however, co-occupied by ARFs from both clades, suggesting transcriptional coordination for many genes. Among these, we investigate known QTLs and use machine learning to predict the impact of cis-regulatory variation. Overall, large-scale comparative analysis of ARF binding suggests that auxin response specificity may be determined by factors other than individual ARF binding site selection.
Affiliation(s)
- Mary Galli
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, 08854-8020, USA
- Arjun Khakhar
- Department of Biology, University of Washington, Seattle, WA, 98195-1800, USA
- Zefu Lu
- Department of Genetics, The University of Georgia, Athens, GA, 30602, USA
- Zongliang Chen
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, 08854-8020, USA
- Sidharth Sen
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
- Trupti Joshi
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA; Department of Health Management and Informatics and Christopher S. Bond Life Science Center, University of Missouri, Columbia, MO, 65211, USA
- Robert J Schmitz
- Department of Genetics, The University of Georgia, Athens, GA, 30602, USA
- Andrea Gallavotti
- Waksman Institute of Microbiology, Rutgers University, Piscataway, NJ, 08854-8020, USA; Department of Plant Biology, Rutgers University, New Brunswick, NJ, 08901, USA
|
15
|
Reimegård J, Kundu S, Pendle A, Irish VF, Shaw P, Nakayama N, Sundström JF, Emanuelsson O. Genome-wide identification of physically clustered genes suggests chromatin-level co-regulation in male reproductive development in Arabidopsis thaliana. Nucleic Acids Res 2017; 45:3253-3265. [PMID: 28175342 PMCID: PMC5389543 DOI: 10.1093/nar/gkx087] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 01/11/2017] [Accepted: 01/31/2017] [Indexed: 12/02/2022] Open
Abstract
Co-expression of physically linked genes occurs surprisingly frequently in eukaryotes. Such chromosomal clustering may confer a selective advantage as it enables coordinated gene regulation at the chromatin level. We studied the chromosomal organization of genes involved in male reproductive development in Arabidopsis thaliana. We developed an in-silico tool to identify physical clusters of co-regulated genes from gene expression data. We identified 17 clusters (96 genes) involved in stamen development and acting downstream of the transcriptional activator MS1 (MALE STERILITY 1), which contains a PHD domain associated with chromatin re-organization. The clusters exhibited little gene homology or promoter element similarity, and largely overlapped with reported repressive histone marks. Experiments on a subset of the clusters suggested a link between expression activation and chromatin conformation: qRT-PCR and mRNA in situ hybridization showed that the clustered genes were up-regulated within 48 h after MS1 induction; out of 14 chromatin-remodeling mutants studied, expression of clustered genes was consistently down-regulated only in hta9/hta11, previously associated with metabolic cluster activation; DNA fluorescence in situ hybridization confirmed that transcriptional activation of the clustered genes was correlated with open chromatin conformation. Stamen development thus appears to involve transcriptional activation of physically clustered genes through chromatin de-condensation.
Affiliation(s)
- Johan Reimegård
- Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, KTH Royal Institute of Technology, Solna SE-171 65, Sweden
- Snehangshu Kundu
- Department of Plant Biology, Uppsala BioCenter, Linnean Center for Plant Biology, Swedish University of Agricultural Sciences, Uppsala SE-750 07, Sweden
- Ali Pendle
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Vivian F Irish
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06520, USA
- Peter Shaw
- Department of Cell and Developmental Biology, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK
- Naomi Nakayama
- Institute of Molecular Plant Science, SynthSys Centre for Synthetic and Systems Biology, and Centre for Science at Extreme Conditions, University of Edinburgh, King's Buildings, Edinburgh, UK
- Jens F Sundström
- Department of Plant Biology, Uppsala BioCenter, Linnean Center for Plant Biology, Swedish University of Agricultural Sciences, Uppsala SE-750 07, Sweden
- Olof Emanuelsson
- Science for Life Laboratory, School of Biotechnology, Division of Gene Technology, KTH Royal Institute of Technology, Solna SE-171 65, Sweden
|
16
|
Abstract
Regulation of gene expression ensures an organism responds to stimuli and undergoes proper development. Although the regulatory networks in bacteria have been investigated in model microorganisms, nearly nothing is known about the evolution and plasticity of these networks in obligate, intracellular bacteria. The phylum Chlamydiae contains a vast array of host-associated microbes, including several human pathogens. The Chlamydiae are unique among obligate, intracellular bacteria as they undergo a complex biphasic developmental cycle in which large swaths of genes are temporally regulated. Coupled with the low number of transcription factors, these organisms offer a model to study the evolution of regulatory networks in intracellular organisms. We provide the first comprehensive analysis exploring the diversity and evolution of regulatory networks across the phylum. We utilized a comparative genomics approach to construct predicted coregulatory networks, which unveiled genus- and family-specific regulatory motifs and architectures, most notably those of virulence-associated genes. Surprisingly, our analysis suggests that few regulatory components are conserved across the phylum, and those that are conserved are involved in the exploitation of the intracellular niche. Our study thus lends insight into a component of chlamydial evolution that has otherwise remained largely unexplored.
Affiliation(s)
- D Domman
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
- M Horn
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
|
17
|
Reiss DJ, Plaisier CL, Wu WJ, Baliga NS. cMonkey2: Automated, systematic, integrated detection of co-regulated gene modules for any organism. Nucleic Acids Res 2015; 43:e87. [PMID: 25873626 PMCID: PMC4513845 DOI: 10.1093/nar/gkv300] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/17/2014] [Revised: 03/05/2015] [Accepted: 03/26/2015] [Indexed: 12/25/2022] Open
Abstract
The cMonkey integrated biclustering algorithm identifies conditionally co-regulated modules of genes (biclusters). cMonkey integrates various orthogonal pieces of information which support evidence of gene co-regulation, and optimizes biclusters to be supported simultaneously by one or more of these prior constraints. The algorithm served as the cornerstone for constructing the first global, predictive Environmental Gene Regulatory Influence Network (EGRIN) model for a free-living cell, and has now been applied to many more organisms. However, due to its computational inefficiencies, long run-time and complexity of various input data types, cMonkey was not readily usable by the wider community. To address these primary concerns, we have significantly updated the cMonkey algorithm and refactored its implementation, improving its usability and extendibility. These improvements provide a fully functioning and user-friendly platform for building co-regulated gene modules and the tools necessary for their exploration and interpretation. We show, via three separate analyses of data for E. coli, M. tuberculosis and H. sapiens, that the updated algorithm and inclusion of novel scoring functions for new data types (e.g. ChIP-seq and transcription factor over-expression [TFOE]) improve discovery of biologically informative co-regulated modules. The complete cMonkey2 software package, including source code, is available at https://github.com/baliga-lab/cmonkey2.
Affiliation(s)
- David J Reiss
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Wei-Ju Wu
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA
- Nitin S Baliga
- Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA; Department of Microbiology, University of Washington, Seattle, WA 98103, USA
|
18
|
Medina-Rivera A, Defrance M, Sand O, Herrmann C, Castro-Mondragon JA, Delerce J, Jaeger S, Blanchet C, Vincens P, Caron C, Staines DM, Contreras-Moreira B, Artufel M, Charbonnier-Khamvongsa L, Hernandez C, Thieffry D, Thomas-Chollier M, van Helden J. RSAT 2015: Regulatory Sequence Analysis Tools. Nucleic Acids Res 2015; 43:W50-6. [PMID: 25904632 PMCID: PMC4489296 DOI: 10.1093/nar/gkv362] [Citation(s) in RCA: 190] [Impact Index Per Article: 21.1] [Received: 02/06/2015] [Accepted: 04/07/2015] [Indexed: 11/13/2022] Open
Abstract
RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Affiliation(s)
- Matthieu Defrance
- Laboratory of Cancer Epigenetics, Université Libre de Bruxelles, Route de Lennik 808, 1070 Brussels, Belgium
- Olivier Sand
- CNRS-UMR8199 Institut de Biologie de Lille, Génomique Intégrative et Modélisation des Maladies Métaboliques, 1, rue du Pr Calmette, 59000 Lille, France; European Genomic Institute for Diabetes (EGID), F-3508, 59000 Lille, France
- Carl Herrmann
- UMR_S 1090 TAGC, INSERM, Marseille, France; Aix-Marseille Université, Marseille, France; Institute of Pharmacy and Molecular Biotechnology, and Bioquant Center, University of Heidelberg, Im Neuenheimer Feld 267, Heidelberg 69120, Germany; Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Jeremy Delerce
- UMR_S 1090 TAGC, INSERM, Marseille, France; Aix-Marseille Université, Marseille, France
- Sébastien Jaeger
- Centre d'Immunologie de Marseille-Luminy (CIML), Aix-Marseille University, UM2, Marseille, France; Institut National de la Santé et de la Recherche Médicale (Inserm), U1104, Marseille, France; Centre National de la Recherche Scientifique (CNRS), UMR7280, Marseille, France
- Christophe Blanchet
- CNRS, UMS 3601, Institut Français de Bioinformatique, IFB-core, Avenue de la Terrasse, F-91190 Gif-sur-Yvette, France
- Pierre Vincens
- Ecole Normale Supérieure, Institut de Biologie de l'ENS, IBENS, Paris, F-75005, France; Inserm, U1024, Paris, F-75005, France; CNRS, UMR 8197, Paris, F-75005, France
- Christophe Caron
- Station Biologique/Service Informatique et Bio-informatique, Place Georges Teissier - CS 90074, 29688 Roscoff Cedex, France
- Daniel M Staines
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bruno Contreras-Moreira
- Estación Experimental de Aula Dei/CSIC, Av. Montañana 1.005, 50059 Zaragoza, Spain; Fundación ARAID, calle María de Luna 11, 50018 Zaragoza, Spain
- Marie Artufel
- UMR_S 1090 TAGC, INSERM, Marseille, France; Aix-Marseille Université, Marseille, France
- Céline Hernandez
- Ecole Normale Supérieure, Institut de Biologie de l'ENS, IBENS, Paris, F-75005, France; Inserm, U1024, Paris, F-75005, France; CNRS, UMR 8197, Paris, F-75005, France
- Denis Thieffry
- Ecole Normale Supérieure, Institut de Biologie de l'ENS, IBENS, Paris, F-75005, France; Inserm, U1024, Paris, F-75005, France; CNRS, UMR 8197, Paris, F-75005, France
- Morgane Thomas-Chollier
- Ecole Normale Supérieure, Institut de Biologie de l'ENS, IBENS, Paris, F-75005, France; Inserm, U1024, Paris, F-75005, France; CNRS, UMR 8197, Paris, F-75005, France
- Jacques van Helden
- European Genomic Institute for Diabetes (EGID), F-3508, 59000 Lille, France; Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles, Campus Plaine, CP 263, Bld du Triomphe, B-1050 Bruxelles, Belgium
|
19
|
Ting CS, Dusenbury KH, Pryzant RA, Higgins KW, Pang CJ, Black CE, Beauchamp EM. The Prochlorococcus carbon dioxide-concentrating mechanism: evidence of carboxysome-associated heterogeneity. Photosynthesis Research 2015; 123:45-60. [PMID: 25193505 DOI: 10.1007/s11120-014-0038-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 03/19/2014] [Accepted: 08/28/2014] [Indexed: 06/03/2023]
Abstract
The ability of Prochlorococcus to numerically dominate open ocean regions and contribute significantly to global carbon cycles is dependent in large part on its effectiveness in transforming light energy into compounds used in cell growth, maintenance, and division. Integral to these processes is the carbon dioxide-concentrating mechanism (CCM), which enhances photosynthetic CO2 fixation. The CCM involves both active uptake systems that permit intracellular accumulation of inorganic carbon as the pool of bicarbonate and the system of HCO3− conversion into CO2. The latter is located in the carboxysome, a microcompartment designed to promote the carboxylase activity of Rubisco. This study presents a comparative analysis of several facets of the Prochlorococcus CCM. Our analyses indicate that a core set of CCM components is shared, and their genomic organization is relatively well conserved. Moreover, certain elements, including carboxysome shell polypeptides CsoS1 and CsoS4A, exhibit striking conservation. Unexpectedly, our analyses reveal that the carbonic anhydrase (CsoSCA) and CsoS2 shell polypeptide have diversified within the lineage. Differences in csoSCA and csoS2 are consistent with a model of unequal rates of evolution rather than relaxed selection. The csoS2 and csoSCA genes form a cluster in Prochlorococcus genomes, and we identified two conserved motifs directly upstream of this cluster that differ from the motif in marine Synechococcus and could be involved in regulation of gene expression. Although several elements of the CCM remain well conserved in the Prochlorococcus lineage, the evolution of differences in specific carboxysome features could in part reflect optimization of carboxysome-associated processes in dissimilar cellular environments.
Affiliation(s)
- Claire S Ting
- Department of Biology, Williams College, Thompson Biology Lab 214, Williamstown, MA, 01267, USA
|
20
|
Farazi TA, Leonhardt CS, Mukherjee N, Mihailovic A, Li S, Max KE, Meyer C, Yamaji M, Cekan P, Jacobs NC, Gerstberger S, Bognanni C, Larsson E, Ohler U, Tuschl T. Identification of the RNA recognition element of the RBPMS family of RNA-binding proteins and their transcriptome-wide mRNA targets. RNA (New York, N.Y.) 2014; 20:1090-102. [PMID: 24860013 PMCID: PMC4114688 DOI: 10.1261/rna.045005.114] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 05/07/2023]
Abstract
Recent studies implicated the RNA-binding protein with multiple splicing (RBPMS) family of proteins in oocyte, retinal ganglion cell, heart, and gastrointestinal smooth muscle development. These RNA-binding proteins contain a single RNA recognition motif (RRM), and their targets and molecular function have not yet been identified. We defined transcriptome-wide RNA targets using photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) in HEK293 cells, revealing exonic mature and intronic pre-mRNA binding sites, in agreement with the nuclear and cytoplasmic localization of the proteins. Computational and biochemical approaches defined the RNA recognition element (RRE) as a tandem CAC trinucleotide motif separated by a variable spacer region. Similar to other mRNA-binding proteins, RBPMS family of proteins relocalized to cytoplasmic stress granules under oxidative stress conditions suggestive of a support function for mRNA localization in large and/or multinucleated cells where it is preferentially expressed.
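The recognition element described in this abstract, two CAC trinucleotides separated by a variable spacer, lends itself to a simple pattern search. The Python sketch below is only an illustration: the abstract does not state the spacer length, so the 1 to 12 nucleotide range used here is an assumption, as are the function name and the toy sequence.

```python
import re

# Assumed spacer range; the abstract only says that the spacer is variable.
MIN_SPACER, MAX_SPACER = 1, 12

# A lookahead is used so that overlapping candidate sites are all reported;
# the lazy quantifier returns the shortest spacer from each start position.
TANDEM_CAC = re.compile(
    r"(?=(CAC[ACGU]{%d,%d}?CAC))" % (MIN_SPACER, MAX_SPACER)
)


def find_tandem_cac(rna):
    """Return (start, matched_site) for each CAC...CAC candidate in an RNA string."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in TANDEM_CAC.finditer(rna)]


sites = find_tandem_cac("GGCACUUACACGG")
```

A pattern match of this kind only flags candidate sites; it says nothing about binding strength, which the study assessed with PAR-CLIP and biochemical assays.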
Affiliation(s)
- Thalia A. Farazi
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Carl S. Leonhardt
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Neelanjan Mukherjee
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
- Aleksandra Mihailovic
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Song Li
- Biology Department, Duke University, Durham, North Carolina 27708, USA
- Klaas E.A. Max
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Cindy Meyer
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Masashi Yamaji
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Pavol Cekan
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Nicholas C. Jacobs
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
- Stefanie Gerstberger
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Claudia Bognanni
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
- Erik Larsson
- Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, SE-405 30, Sweden
- Uwe Ohler
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
- Thomas Tuschl
- Laboratory of RNA Molecular Biology, Howard Hughes Medical Institute, The Rockefeller University, New York, New York 10065, USA
|
21
|
Characterization of global gene expression during assurance of lifespan extension by caloric restriction in budding yeast. Exp Gerontol 2013; 48:1455-68. [PMID: 24126084 DOI: 10.1016/j.exger.2013.10.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 02/25/2013] [Revised: 09/28/2013] [Accepted: 10/03/2013] [Indexed: 12/22/2022]
Abstract
Caloric restriction (CR) is the best-studied intervention known to delay aging and extend lifespan in evolutionarily distant organisms ranging from yeast to mammals in the laboratory. Although the effect of CR on lifespan extension has been investigated for nearly 80 years, the molecular mechanisms of CR are still elusive. Consequently, it is important to understand the fundamental mechanisms of when and how lifespan is affected by CR. In this study, we first identified the time-windows during which CR assured cellular longevity by switching cells from culture media containing 2% or 0.5% glucose to water, which allows us to observe CR and non-calorically-restricted cells under the same conditions. We also constructed time-dependent gene expression profiles and selected 646 genes that showed significant changes and correlations with the lifespan-extending effect of CR. The positively correlated genes participated in transcriptional regulation, ribosomal RNA processing and nuclear genome stability, while the negatively correlated genes were involved in the regulation of several metabolic pathways, endoplasmic reticulum function, stress response and cell cycle progression. Furthermore, we discovered major upstream regulators of those significantly changed genes, including AZF1 (YOR113W), HSF1 (YGL073W) and XBP1 (YIL101C). Deletions of two genes, AZF1 and XBP1 (HSF1 is essential and was thus not tested), were confirmed to lessen the lifespan extension mediated by CR. The absence of these genes in the tor1Δ and ras2Δ backgrounds did show non-overlapping effects with regard to CLS, suggesting differences between the CR mechanism for Tor and Ras signaling.
|
22
|
Darbo E, Herrmann C, Lecuit T, Thieffry D, van Helden J. Transcriptional and epigenetic signatures of zygotic genome activation during early Drosophila embryogenesis. BMC Genomics 2013; 14:226. [PMID: 23560912 PMCID: PMC3706223 DOI: 10.1186/1471-2164-14-226] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/22/2012] [Accepted: 02/28/2013] [Indexed: 01/25/2023] Open
Abstract
Background In all Metazoa, transcription is inactive during the first mitotic cycles after fertilisation. In Drosophila melanogaster, Zygotic Genome Activation (ZGA) occurs in two waves, starting respectively at mitotic cycles 8 (approximately 60 genes) and 14 (over a thousand genes). The regulatory mechanisms underlying these drastic transcriptional changes remain largely unknown. Results We developed an original gene clustering method based on discretized transition profiles, and applied it to datasets from three landmark early embryonic transcriptome studies. We identified 417 genes significantly up-regulated during ZGA. De novo motif discovery returned nine motifs over-represented in their non-coding sequences (upstream, introns, UTR), three of which correspond to previously known transcription factors: Zelda, Tramtrack and Trithorax-like (Trl). The nine discovered motifs were combined to scan ZGA-associated regions and predict about 1300 putative cis-regulatory modules. The fact that Trl is known to act as chromatin remodelling factor suggests that epigenetic regulation might play an important role in zygotic genome activation. We thus systematically compared the locations of predicted CRMs with ChIP-seq profiles for various transcription factors, 38 epigenetic marks from ModENCODE, and DNAse1 accessibility profiles. This analysis highlighted a strong and specific enrichment of predicted ZGA-associated CRMs for Zelda, CBP, Trl binding sites, as well as for histone marks associated with active enhancers (H3K4me1) and for open chromatin regions. Conclusion Based on the results of our computational analyses, we suggest a temporal model explaining the onset of zygotic genome activation by the combined action of transcription factors and epigenetic signals. 
Although this study is mainly based on the analysis of publicly available transcriptome and ChIP-seq datasets, the resulting model suggests novel mechanisms that underlie the coordinated activation of several hundred genes at a precise time point during embryonic development.
Affiliation(s)
- Elodie Darbo
  - Technological Advances for Genomics and Clinics (TAGC), INSERM U1090, Université de la Méditerranée, Campus de Luminy, 13288 Marseille Cedex 9, France.
23
A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs. Nat Protoc 2012; 7:1551-68. [DOI: 10.1038/nprot.2012.088] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.3] [Indexed: 02/03/2023]
24
Bessoltane N, Toffano-Nioche C, Solignac M, Mougel F. Fine scale analysis of crossover and non-crossover and detection of recombination sequence motifs in the honeybee (Apis mellifera). PLoS One 2012; 7:e36229. [PMID: 22567142 PMCID: PMC3342173 DOI: 10.1371/journal.pone.0036229] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 01/13/2012] [Accepted: 03/28/2012] [Indexed: 02/06/2023]
Abstract
BACKGROUND Meiotic exchanges are non-uniformly distributed across the genome of most studied organisms. This uneven distribution suggests that recombination is initiated by specific signals and/or regulations. Some of these signals were recently identified in humans and mice. However, it is unclear whether or not sequence signals are also involved in chromosomal recombination of insects. METHODOLOGY We analyzed recombination frequencies in the honeybee, in which genome sequencing provided a large number of SNPs spread over the entire set of chromosomes. As the genome sequences were obtained from a pool of haploid males, which were the progeny of a single queen, an oocyte method (study of recombination on haploid males that develop from unfertilized eggs and hence directly reflect the haplotypes of female gametes) was developed to detect recombined pairs of SNP sites. Sequences were further compared between recombinant and non-recombinant fragments to detect recombination-specific motifs. CONCLUSIONS Recombination events between adjacent SNP sites were detected at an average distance of 92 bp and revealed the existence of high rates of recombination events. This study also shows the presence of conversion without crossover (i.e. non-crossover) events, which largely outnumber crossover events. Furthermore, the comparison of sequences that have undergone recombination with sequences that have not led to the discovery of sequence motifs (CGCA, GCCGC, CCGCA), which may correspond to recombination signals.
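The abstract does not say how recombinant and non-recombinant fragments were compared, so the following is only a minimal sketch of one generic way to look for set-specific short motifs: count the fraction of sequences in each set that contain each k-mer and keep the k-mers whose fraction differs by a chosen margin. The function names, the `min_difference` cutoff, and the toy sequences are all hypothetical and are not taken from the study.

```python
from collections import Counter


def kmer_fraction(sequences, k):
    """Fraction of sequences that contain each k-mer at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {kmer: n / len(sequences) for kmer, n in counts.items()}


def enriched_kmers(foreground, background, k, min_difference=0.5):
    """k-mers whose foreground fraction exceeds the background fraction
    by at least min_difference (an arbitrary, illustrative cutoff)."""
    fg = kmer_fraction(foreground, k)
    bg = kmer_fraction(background, k)
    return sorted(
        kmer for kmer, frac in fg.items()
        if frac - bg.get(kmer, 0.0) >= min_difference
    )


# Toy data, invented for illustration only.
recombinant = ["ATCGCATT", "TTCGCAAG", "GACGCATA"]
non_recombinant = ["ATTTAATT", "TTGACAAG", "GAAGTATA"]
toy_result = enriched_kmers(recombinant, non_recombinant, k=4)
```

A real analysis would need a statistical test and a correction for multiple testing rather than a fixed difference cutoff; the sketch only illustrates the foreground-versus-background comparison.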
Affiliation(s)
- Nadia Bessoltane
  - Laboratoire Evolution Génomes Spéciation, CNRS, Gif-sur-Yvette, France
  - Université Paris-Sud and CNRS, Institut de Génétique et Microbiologie, UMR8621, Orsay, France
- Claire Toffano-Nioche
  - Université Paris-Sud and CNRS, Institut de Génétique et Microbiologie, UMR8621, Orsay, France
- Michel Solignac
  - Laboratoire Evolution Génomes Spéciation, CNRS, Gif-sur-Yvette, France
  - Université Paris Sud, Orsay, France
- Florence Mougel
  - Laboratoire Evolution Génomes Spéciation, CNRS, Gif-sur-Yvette, France
  - Université Paris Sud, Orsay, France
25
Weber SDS, Sant'Anna FH, Schrank IS. Unveiling Mycoplasma hyopneumoniae promoters: sequence definition and genomic distribution. DNA Res 2012; 19:103-15. [PMID: 22334569 PMCID: PMC3325076 DOI: 10.1093/dnares/dsr045] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
Several Mycoplasma species have had their genome completely sequenced, including four strains of the swine pathogen Mycoplasma hyopneumoniae. Nevertheless, little is known about the nucleotide sequences that control transcriptional initiation in these microorganisms. Therefore, with the objective of investigating the promoter sequences of M. hyopneumoniae, 23 transcriptional start sites (TSSs) of distinct genes were mapped. A pattern that resembles the σ70 promoter −10 element was found upstream of the TSSs. However, no −35 element was distinguished. Instead, an AT-rich periodic signal was identified. About half of the experimentally defined promoters contained the motif 5′-TRTGn-3′, which was identical to the −16 element usually found in Gram-positive bacteria. The defined promoters were utilized to build position-specific scoring matrices in order to scan putative promoters upstream of all coding sequences (CDSs) in the M. hyopneumoniae genome. Two hundred and one signals were found associated with 169 CDSs. Most of these sequences were located within 100 nucleotides of the start codons. This study has shown that promoter-like sequences in the M. hyopneumoniae genome are more frequent than expected by chance, indicating that most of the sequences detected are probably biologically functional.
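The step of building position-specific scoring matrices from defined promoters and scanning upstream regions can be illustrated with a minimal sketch. It assumes a log-odds matrix with a pseudocount of 1 and an equiprobable background, which the abstract does not specify; the toy sites, the toy sequence, and the threshold are hypothetical and are not the 23 mapped promoters.

```python
import math

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds scoring matrix from equal-length aligned sites."""
    width = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def scan(sequence, pssm, threshold):
    """Return (offset, word, score) for each window scoring >= threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        word = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(word))
        if score >= threshold:
            hits.append((start, word, score))
    return hits


# Toy -10-like sites and upstream sequence, invented for illustration.
toy_sites = ["TATAAT", "TATAAT", "TAAAAT", "TATACT"]
toy_pssm = build_pssm(toy_sites)
toy_hits = scan("GGCGTATAATGCGC", toy_pssm, threshold=5.0)
```

In practice the background would be estimated from the genome itself, which matters for an AT-rich organism, and both strands would be scanned; the sketch scans a single strand only.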
Affiliation(s)
- Shana de Souto Weber
  - Centro de Biotecnologia, Programa de Pós-graduação em Biologia Celular e Molecular, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil
26
Aerts S. Computational strategies for the genome-wide identification of cis-regulatory elements and transcriptional targets. Curr Top Dev Biol 2012; 98:121-45. [PMID: 22305161 DOI: 10.1016/b978-0-12-386499-4.00005-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
Transcription factors (TFs) are key proteins that decode the information in our genome to express a precise and unique set of proteins and RNA molecules in each cell type in our body. These factors play a pivotal role in all biological processes, including the determination of a cell's fate during development and the maintenance of a cell's physiological function. To achieve this, a TF binds to specific DNA sequences in the noncoding part of the genome, recruits chromatin modifiers and cofactors, and directs the transcription initiation rate of its "target genes." Therefore, a key challenge in deciphering a transcriptional switch is to identify the direct target genes of the master regulators that control the switch, the cis-regulatory elements implementing (auto-)regulatory loops, and the target genes of all the TFs in the downstream regulatory network. A better knowledge of a TF's targetome during specification and differentiation of a particular cell type will generate mechanistic insight into its developmental program. Here, I review computational strategies and methods to predict transcriptional targets by genome-wide searches for TF binding sites using position weight matrices, motif clusters, phylogenetic footprinting, chromatin binding and accessibility data, enhancer classification, motif enrichment, and gene expression signatures.
Affiliation(s)
- Stein Aerts
  - Laboratory of Computational Biology, Center for Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
27
Thomas-Chollier M, Defrance M, Medina-Rivera A, Sand O, Herrmann C, Thieffry D, van Helden J. RSAT 2011: regulatory sequence analysis tools. Nucleic Acids Res 2011; 39:W86-91. [PMID: 21715389 PMCID: PMC3125777 DOI: 10.1093/nar/gkr377] [Citation(s) in RCA: 192] [Impact Index Per Article: 14.8] [Indexed: 02/05/2023]
Abstract
RSAT (Regulatory Sequence Analysis Tools) comprises a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. Thirteen new programs have been added to the 30 described in the 2008 NAR Web Software Issue, including an automated sequence retrieval from EnsEMBL (retrieve-ensembl-seq), two novel motif discovery algorithms (oligo-diff and info-gibbs), a 100-times faster version of matrix-scan enabling the scanning of genome-scale sequence sets, and a series of facilities for random model generation and statistical evaluation (random-genome-fragments, random-motifs, random-sites, implant-sites, sequence-probability, permute-matrix). Our most recent work also focused on motif comparison (compare-matrices) and evaluation of motif quality (matrix-quality) by combining theoretical and empirical measures to assess the predictive capability of position-specific scoring matrices. To process large collections of peak sequences obtained from ChIP-seq or related technologies, RSAT provides a new program (peak-motifs) that combines several efficient motif discovery algorithms to predict transcription factor binding motifs, match them against motif databases and predict their binding sites. Availability (web site, stand-alone programs and SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services): http://rsat.ulb.ac.be/rsat/.
Affiliation(s)
- Morgane Thomas-Chollier
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany.
28
Thomas-Chollier M, Hufton A, Heinig M, O'Keeffe S, Masri NE, Roider HG, Manke T, Vingron M. Transcription factor binding predictions using TRAP for the analysis of ChIP-seq data and regulatory SNPs. Nat Protoc 2011; 6:1860-9. [PMID: 22051799 DOI: 10.1038/nprot.2011.409] [Citation(s) in RCA: 176] [Impact Index Per Article: 13.5] [Indexed: 11/09/2022]
Abstract
The transcription factor affinity prediction (TRAP) method calculates the affinity of transcription factors for DNA sequences on the basis of a biophysical model. This method has proven to be useful for several applications, including for determining the putative target genes of a given factor. This protocol covers two other applications: (i) determining which transcription factors have the highest affinity in a set of sequences (illustrated with chromatin immunoprecipitation-sequencing (ChIP-seq) peaks), and (ii) finding which factor is the most affected by a regulatory single-nucleotide polymorphism. The protocol describes how to use the TRAP web tools to address these questions, and it also presents a way to run TRAP on random control sequences to better estimate the significance of the results. All of the tools are fully available online and do not need any additional installation. The complete protocol takes about 45 min, but each individual tool runs in a few minutes.
Affiliation(s)
- Morgane Thomas-Chollier
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany.
29
Abstract
The Bacillus thuringiensis temperate phage GIL01 does not integrate into the host chromosome but exists stably as an independent linear replicon within the cell. Similar to that of the lambdoid prophages, the lytic cycle of GIL01 is induced as part of the cellular SOS response to DNA damage. However, no CI-like maintenance repressor has been detected in the phage genome, suggesting that GIL01 uses a novel mechanism to maintain lysogeny. To gain insights into the GIL01 regulatory circuit, we isolated and characterized a set of 17 clear plaque (cp) mutants that are unable to lysogenize. Two phage-encoded proteins, gp1 and gp7, are required for stable lysogen formation. Analysis of cp mutants also identified a 14-bp palindromic dinBox1 sequence within the P1-P2 promoter region that resembles the known LexA-binding site of Gram-positive bacteria. Mutations at conserved positions in dinBox1 result in a cp phenotype. Genomic analysis identified a total of three dinBox sites within GIL01 promoter regions. To investigate the possibility that the host LexA regulates GIL01, phage induction was measured in a host carrying a noncleavable lexA (Ind(-)) mutation. GIL01 formed stable lysogens in this host, but lytic growth could not be induced by treatment with mitomycin C. Also, mitomycin C induced β-galactosidase expression from GIL01-lacZ promoter fusions, and induction was similarly blocked in the lexA (Ind(-)) mutant host. These data support a model in which host LexA binds to dinBox sequences in GIL01, repressing phage gene expression during lysogeny and providing the switch necessary to enter lytic development.
30
Goudot C, Etchebest C, Devaux F, Lelandais G. The reconstruction of condition-specific transcriptional modules provides new insights in the evolution of yeast AP-1 proteins. PLoS One 2011; 6:e20924. [PMID: 21695268 PMCID: PMC3111461 DOI: 10.1371/journal.pone.0020924] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/10/2011] [Accepted: 05/15/2011] [Indexed: 11/19/2022]
Abstract
AP-1 proteins are transcription factors (TFs) that belong to the basic leucine zipper family, one of the largest families of TFs in eukaryotic cells. Despite high homology between their DNA binding domains, these proteins are able to recognize diverse DNA motifs. In yeasts, these motifs are referred to as YRE (Yap Response Element) and are either seven (YRE-Overlap) or eight (YRE-Adjacent) base pairs long. It has been proposed that the AP-1 DNA binding motif preference relies on a single change in the amino acid sequence of the yeast AP-1 TFs (an arginine in the YRE-O binding factors being replaced by a lysine in the YRE-A binding Yaps). We developed a computational approach to infer condition-specific transcriptional modules associated with the orthologous AP-1 proteins Yap1p, Cgap1p and Cap1p, in three yeast species: the model yeast Saccharomyces cerevisiae and two pathogenic species Candida glabrata and Candida albicans. Exploitation of these modules in terms of predictions of the protein/DNA regulatory interactions changed our vision of AP-1 protein evolution. Cis-regulatory motif analyses revealed the presence of a conserved adenine in 5' position of the canonical YRE sites. While Yap1p, Cgap1p and Cap1p shared a remarkably low number of target genes, an impressive conservation was observed in the YRE sequences identified by Yap1p and Cap1p. In Candida glabrata, we found that Cgap1p, unlike Yap1p and Cap1p, recognizes YRE-O and YRE-A motifs. These findings were supported by structural data available for the transcription factor Pap1p (Schizosaccharomyces pombe). Thus, whereas arginine and lysine substitutions in Cgap1p and Yap1p proteins were reported as responsible for a specific YRE-O or YRE-A preference, our analyses rather suggest that the ancestral yeast AP-1 protein could recognize both YRE-O and YRE-A motifs and that the arginine/lysine exchange is not the only determinant of the specialization of modern Yaps for one motif or another.
Affiliation(s)
- Christel Goudot
  - Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, UMR-S665, Paris, France
  - INTS, Paris, France
- Catherine Etchebest
  - Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, UMR-S665, Paris, France
  - INTS, Paris, France
- Frédéric Devaux
  - Laboratoire de Génomique des Microorganismes, UMR7238 CNRS, Université Pierre et Marie Curie, Paris, France
- Gaëlle Lelandais
  - Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, UMR-S665, Paris, France
  - INTS, Paris, France
31
Bartonicek N, Enright AJ. SylArray: a web server for automated detection of miRNA effects from expression data. Bioinformatics 2010; 26:2900-1. [DOI: 10.1093/bioinformatics/btq545] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 01/16/2023]
32
Janssen PJ, Van Houdt R, Moors H, Monsieurs P, Morin N, Michaux A, Benotmane MA, Leys N, Vallaeys T, Lapidus A, Monchy S, Médigue C, Taghavi S, McCorkle S, Dunn J, van der Lelie D, Mergeay M. The complete genome sequence of Cupriavidus metallidurans strain CH34, a master survivalist in harsh and anthropogenic environments. PLoS One 2010; 5:e10433. [PMID: 20463976 PMCID: PMC2864759 DOI: 10.1371/journal.pone.0010433] [Citation(s) in RCA: 199] [Impact Index Per Article: 14.2] [Received: 02/01/2010] [Accepted: 03/29/2010] [Indexed: 11/21/2022]
Abstract
Many bacteria in the environment have adapted to the presence of toxic heavy metals. Over the last 30 years, this heavy metal tolerance was the subject of extensive research. The bacterium Cupriavidus metallidurans strain CH34, originally isolated by us in 1976 from a metal processing factory, is considered a major model organism in this field because it withstands millimolar-range concentrations of over 20 different heavy metal ions. This tolerance is mostly achieved by rapid ion efflux but also by metal-complexation and -reduction. We present here the full genome sequence of strain CH34 and the manual annotation of all its genes. The genome of C. metallidurans CH34 is composed of two large circular chromosomes CHR1 and CHR2 of, respectively, 3,928,089 bp and 2,580,084 bp, and two megaplasmids pMOL28 and pMOL30 of, respectively, 171,459 bp and 233,720 bp in size. At least 25 loci for heavy-metal resistance (HMR) are distributed over the four replicons. Approximately 67% of the 6,717 coding sequences (CDSs) present in the CH34 genome could be assigned a putative function, and 9.1% (611 genes) appear to be unique to this strain. One out of five proteins is associated with either transport or transcription while the relay of environmental stimuli is governed by more than 600 signal transduction systems. The CH34 genome is most similar to the genomes of other Cupriavidus strains by correspondence between the respective CHR1 replicons but also displays similarity to the genomes of more distantly related species as a result of gene transfer and through the presence of large genomic islands. The presence of at least 57 IS elements and 19 transposons and the ability to take in and express foreign genes indicate a very dynamic and complex genome shaped by evolutionary forces. The genome data show that C. metallidurans CH34 is particularly well equipped to live in extreme conditions and anthropogenic environments that are rich in metals.
Affiliation(s)
- Paul J Janssen
  - Molecular and Cellular Biology, Belgian Nuclear Research Center SCK*CEN, Mol, Belgium.
33
Contreras-Moreira B, Sancho J, Angarica VE. Comparison of DNA binding across protein superfamilies. Proteins 2010; 78:52-62. [PMID: 19731374 DOI: 10.1002/prot.22525] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]
Abstract
Specific protein-DNA interactions are central to a wide group of processes in the cell and have been studied both experimentally and computationally over the years. Despite the increasing collection of protein-DNA complexes, so far only a few studies have aimed at dissecting the structural characteristics of DNA binding among evolutionarily related proteins. Some questions that remain to be answered are: (a) what is the contribution of the different readout mechanisms in members of a given structural superfamily, (b) what is the degree of interface similarity among superfamily members and how this affects binding specificity, (c) how DNA-binding protein superfamilies distribute across taxa, and (d) is there a general or family-specific code for the recognition of DNA. We have recently developed a straightforward method to dissect the interface of protein-DNA complexes at the atomic level and here we apply it to study 175 proteins belonging to nine representative superfamilies. Our results indicate that evolutionarily unrelated DNA-binding domains broadly conserve specificity statistics, such as the ratio of indirect/direct readout and the frequency of atomic interactions, therefore supporting the existence of a set of recognition rules. It is also found that interface conservation follows trends that are superfamily-specific. Finally, this article identifies tendencies in the phylogenetic distribution of transcription factors, which might be related to the evolution of regulatory networks, and postulates that the modular nature of zinc finger proteins can explain its role in large genomes, as it allows for larger binding interfaces in a single protein molecule.
Affiliation(s)
- Bruno Contreras-Moreira
  - Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas, Av. Montañana 1.005, Zaragoza, Spain.
34
Xue Y, Zhou Y, Wu T, Zhu T, Ji X, Kwon YS, Zhang C, Yeo G, Black DL, Sun H, Fu XD, Zhang Y. Genome-wide analysis of PTB-RNA interactions reveals a strategy used by the general splicing repressor to modulate exon inclusion or skipping. Mol Cell 2010; 36:996-1006. [PMID: 20064465 DOI: 10.1016/j.molcel.2009.12.003] [Citation(s) in RCA: 359] [Impact Index Per Article: 25.6] [Received: 05/18/2009] [Revised: 08/28/2009] [Accepted: 11/03/2009] [Indexed: 10/20/2022]
Abstract
Recent transcriptome analysis indicates that > 90% of human genes undergo alternative splicing, underscoring the contribution of differential RNA processing to diverse proteomes in higher eukaryotic cells. The polypyrimidine tract-binding protein PTB is a well-characterized splicing repressor, but PTB knockdown causes both exon inclusion and skipping. Genome-wide mapping of PTB-RNA interactions and construction of a functional RNA map now reveal that dominant PTB binding near a competing constitutive splice site generally induces exon inclusion, whereas prevalent binding close to an alternative site often causes exon skipping. This positional effect was further demonstrated by disrupting or creating a PTB-binding site on minigene constructs and testing their responses to PTB knockdown or overexpression. These findings suggest a mechanism for PTB to modulate splice site competition to produce opposite functional consequences, which may be generally applicable to RNA-binding splicing factors to positively or negatively regulate alternative splicing in mammalian cells.
35
Armstrong KR, Chamberlin HM. Coordinate regulation of gene expression in the C. elegans excretory cell by the POU domain protein CEH-6. Mol Genet Genomics 2009; 283:73-87. [PMID: 19921263 DOI: 10.1007/s00438-009-0497-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 05/14/2009] [Accepted: 10/23/2009] [Indexed: 11/24/2022]
Abstract
Excretory renal organs are critical in animals for osmoregulation and the elimination of waste. Renal organs across a range of species exhibit cellular and molecular similarities. For example, class III POU-homeodomain transcription factors are expressed in the renal organs of many invertebrates and vertebrates. However, the functional role for these factors is not well characterized. To better understand the role of class III POU-homeodomain proteins in animal excretory systems, we have characterized a set of genes expressed in the Caenorhabditis elegans excretory cell, and determined their regulation by the POU-III transcription factor CEH-6. Our molecular and biochemical studies show that CEH-6 regulates a subset of genes expressed in the excretory cell. Additionally, we find that the CEH-6-dependent genes share two molecular features: they contain at least one octamer regulatory element and they encode for transport and channel proteins. This work suggests that a role for POU-III factors in renal organs is to coordinate the expression of a set of functionally related genes.
Affiliation(s)
- Kristin R Armstrong
  - Department of Molecular Genetics, Ohio State University, 938 Biological Sciences Building, 484 W. 12th Avenue, Columbus, OH 43210, USA
36
HOU L, QIAN MP, ZHU YP, DENG MH. Advances on bioinformatic research in transcription factor binding sites. YI CHUAN = HEREDITAS 2009; 31:365-73. [DOI: 10.3724/sp.j.1005.2009.00365] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
37
Abstract
Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.
38
Turatsinze JV, Thomas-Chollier M, Defrance M, van Helden J. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat Protoc 2008; 3:1578-88. [PMID: 18802439 DOI: 10.1038/nprot.2008.97] [Citation(s) in RCA: 201] [Impact Index Per Article: 12.6] [Indexed: 02/01/2023]
Abstract
This protocol shows how to detect putative cis-regulatory elements and regions enriched in such elements with the regulatory sequence analysis tools (RSAT) web server (http://rsat.ulb.ac.be/rsat/). The approach applies to known transcription factors, whose binding specificity is represented by position-specific scoring matrices, using the program matrix-scan. The detection of individual binding sites is known to return many false predictions. However, results can be strongly improved by estimating P values, and by searching for combinations of sites (homotypic and heterotypic models). We illustrate the detection of sites and enriched regions with a case study, the upstream sequence of the Drosophila melanogaster gene even-skipped. This protocol is also tested on random control sequences to evaluate the reliability of the predictions. Each task requires a few minutes of computation time on the server. The complete protocol can be executed in about one hour.
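To make the idea of attaching a P value to a matrix score concrete, here is a minimal sketch that computes the exact probability of reaching a given score by enumerating every possible word under an equiprobable background. This is not a description of how matrix-scan computes P values; the brute-force enumeration, the function names, and the toy matrix are assumptions chosen for clarity, and the approach is only practical for short matrices.

```python
from itertools import product


def score_word(word, matrix):
    """Sum the matrix weights of a word, one column per position."""
    return sum(matrix[i][base] for i, base in enumerate(word))


def exact_p_value(threshold, matrix):
    """P(score >= threshold) for a random word, assuming each base has
    probability 0.25 at every position (equiprobable background)."""
    width = len(matrix)
    passing = sum(
        1 for word in product("ACGT", repeat=width)
        if score_word(word, matrix) >= threshold
    )
    return passing / 4 ** width


# Toy 3-column matrix with integer weights, invented for illustration.
toy_matrix = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": -1, "T": 2},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]
best_score = score_word("ATG", toy_matrix)
p_best = exact_p_value(best_score, toy_matrix)
p_two_matches = exact_p_value(3, toy_matrix)
```

With this toy matrix, only one of the 64 possible words reaches the best score, while ten words score at least 3, which shows how lowering the threshold raises the expected rate of false predictions.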
Affiliation(s)
- Jean-Valery Turatsinze
  - Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles CP 263, Campus Plaine, Boulevard du Triomphe, Bruxelles, Belgium
39
40
Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services—an example with ChIP-chip data. Nat Protoc 2008; 3:1604-15. [DOI: 10.1038/nprot.2008.99] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/28/2023]