1
Santana-Garcia W, Castro-Mondragon JA, Padilla-Gálvez M, Nguyen NT, Elizondo-Salas A, Ksouri N, Gerbes F, Thieffry D, Vincens P, Contreras-Moreira B, van Helden J, Thomas-Chollier M, Medina-Rivera A. RSAT 2022: regulatory sequence analysis tools. Nucleic Acids Res 2022; 50:W670-W676. [PMID: 35544234 PMCID: PMC9252783 DOI: 10.1093/nar/gkac312] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 03/03/2022] [Revised: 04/12/2022] [Accepted: 04/20/2022] [Indexed: 11/12/2022] [Open Access]
Abstract
RSAT (Regulatory Sequence Analysis Tools) enables the detection and analysis of cis-regulatory elements in genomic sequences. This software suite performs (i) de novo motif discovery (including from genome-wide datasets like ChIP-seq/ATAC-seq), (ii) scanning of genomic sequences with known motifs, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations and (v) comparative genomics. RSAT comprises 50 tools. Six public Web servers (including a teaching server) are offered to meet the needs of different biological communities. The RSAT philosophy and originality are: (i) multi-modal access depending on user needs, through web forms, the command line for local installations and programmatic web services, and (ii) support for virtually any genome (animals, bacteria, plants, totaling over 10 000 genomes directly accessible). Since the 2018 NAR Web Software Issue, we have developed a large REST API, extended the support for additional genomes and external motif collections, enhanced some tools and Web forms, and developed a novel tool that builds or refines gene regulatory networks using motif scanning (network-interactions). The RSAT website provides extensive documentation, tutorials and published protocols. RSAT code is under an open-source license and is now hosted on GitHub. RSAT is available at http://www.rsat.eu/.
Affiliation(s)
- Mónica Padilla-Gálvez
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, 76230 Santiago de Querétaro, México
- Nga Thi Thuy Nguyen
- Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Ana Elizondo-Salas
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, 76230 Santiago de Querétaro, México
- Najla Ksouri
- Estación Experimental de Aula Dei-CSIC, 50059 Zaragoza, Spain
- François Gerbes
- CNRS, Institut Français de Bioinformatique, IFB-core, UMS 3601, Evry, France
- Denis Thieffry
- Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Pierre Vincens
- Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
2
Inference of plant gene regulatory networks using data-driven methods: A practical overview. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms 2019; 1863:194447. [PMID: 31678628 DOI: 10.1016/j.bbagrm.2019.194447] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 06/11/2019] [Revised: 10/08/2019] [Accepted: 10/31/2019] [Indexed: 11/20/2022]
Abstract
Transcriptional regulation is a complex and dynamic process that plays a vital role in plant growth and development. A key component in the regulation of genes is transcription factors (TFs), which coordinate the transcriptional control of gene activity. A gene regulatory network (GRN) is a collection of regulatory interactions between TFs and their target genes. The accurate delineation of GRNs offers a significant contribution to our understanding about how plant cells are organized and function, and how individual genes are regulated in various conditions, organs or cell types. During the past decade, important progress has been made in the identification of GRNs using experimental and computational approaches. However, a detailed overview of available platforms supporting the analysis of GRNs in plants is missing. Here, we review current databases, platforms and tools that perform data-driven analyses of gene regulation in Arabidopsis. The platforms are categorized into two sections, 1) promoter motif analysis tools that use motif mapping approaches to find TF motifs in the regulatory sequences of genes of interest and 2) network analysis tools that identify potential regulators for a set of input genes using a range of data types in order to generate GRNs. We discuss the diverse datasets integrated and highlight the strengths and caveats of different platforms. Finally, we shed light on the limitations of the above approaches and discuss future perspectives, including the need for integrative approaches to unravel complex GRNs in plants.
3
Nguyen NTT, Contreras-Moreira B, Castro-Mondragon JA, Santana-Garcia W, Ossio R, Robles-Espinoza CD, Bahin M, Collombet S, Vincens P, Thieffry D, van Helden J, Medina-Rivera A, Thomas-Chollier M. RSAT 2018: regulatory sequence analysis tools 20th anniversary. Nucleic Acids Res 2018; 46:W209-W214. [PMID: 29722874 PMCID: PMC6030903 DOI: 10.1093/nar/gky317] [Citation(s) in RCA: 129] [Impact Index Per Article: 25.8] [Received: 01/31/2018] [Accepted: 04/23/2018] [Indexed: 12/27/2022] [Open Access]
Abstract
RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations and (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools into the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.
Affiliation(s)
- Nga Thi Thuy Nguyen
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Jaime A Castro-Mondragon
- Aix-Marseille Univ, INSERM UMR_S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France; Centre for Molecular Medicine Norway (NCMM), Nordic EMBL Partnership, University of Oslo, 0318 Oslo, Norway
- Walter Santana-Garcia
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México
- Raul Ossio
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México
- Carla Daniela Robles-Espinoza
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México; Experimental Cancer Genetics, The Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Mathieu Bahin
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Samuel Collombet
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Pierre Vincens
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Denis Thieffry
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
- Jacques van Helden
- Aix-Marseille Univ, INSERM UMR_S 1090, Theory and Approaches of Genome Complexity (TAGC), F-13288 Marseille, France
- Alejandra Medina-Rivera
- Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Campus Juriquilla, Blvd Juriquilla 3001, Santiago de Querétaro 76230, México
- Morgane Thomas-Chollier
- Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Université Paris, 75005 Paris, France
4
Rioualen C, Charbonnier-Khamvongsa L, Collado-Vides J, van Helden J. Integrating Bacterial ChIP-seq and RNA-seq Data With SnakeChunks. Curr Protoc Bioinformatics 2019; 66:e72. [PMID: 30786165 DOI: 10.1002/cpbi.72] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
Next-generation sequencing (NGS) is becoming a routine approach in most domains of the life sciences. To ensure reproducibility of results, there is a crucial need to improve the automation of NGS data processing and enable forthcoming studies relying on big datasets. Although user-friendly interfaces now exist, there remains a strong need for accessible solutions that allow experimental biologists to analyze and explore their results in an autonomous and flexible way. The protocols here describe a modular system that enables a user to compose and fine-tune workflows based on SnakeChunks, a library of rules for the Snakemake workflow engine. They are illustrated using a study combining ChIP-seq and RNA-seq to identify target genes of the global transcription factor FNR in Escherichia coli, which has the advantage that results can be compared with the most up-to-date collection of existing knowledge about transcriptional regulation in this model organism, extracted from the RegulonDB database. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Claire Rioualen
- Aix-Marseille University, INSERM, Laboratory of Theory and Approaches of Genome Complexity (TAGC), Marseille, France; Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Lucie Charbonnier-Khamvongsa
- Aix-Marseille University, INSERM, Laboratory of Theory and Approaches of Genome Complexity (TAGC), Marseille, France
- Julio Collado-Vides
- Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México; Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Jacques van Helden
- Aix-Marseille University, INSERM, Laboratory of Theory and Approaches of Genome Complexity (TAGC), Marseille, France; Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France