1
Fernandes M, Mario de Andrade E, Reis da Silva SG, Romagnoli VDS, Ortega JM, Antônio de Oliveira Mendes T. Geneapp: A web application for visualizing alternative splicing for biomedicine. Comput Biol Med 2024; 178:108789. [PMID: 38936077 DOI: 10.1016/j.compbiomed.2024.108789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 06/29/2024]
Abstract
Alternative splicing (AS) is an essential mechanism in eukaryotes. However, the consequences of deleting a single exon can be dramatic for the organism and can lead to cancer in humans. Additionally, alternative 5' and 3' splice sites, which define the boundaries of exons, also play key roles in human disorders. Therefore, investigating AS events is crucial for understanding the molecular basis of human diseases and developing therapeutic strategies. A typical workflow for AS event analysis consists of sampling, bioinformatic data analysis to identify the different AS events in the control and case samples, data visualization for curation, and selection of relevant targets for experimental validation. The raw output of the analysis software does not favor the inspection of events by bioinformaticians, who must write custom scripts for data visualization. In this work, we propose the Geneapp application with three modules: GeneappScript, GeneappServer, and GeneappExplorer. GeneappScript is a wrapper that assists in identifying AS in samples compared using two different approaches, while GeneappServer integrates data from AS analysis already performed by the user. In GeneappExplorer, the user visualizes the dataset produced in the previous steps by exploring AS events in genes with functional annotation. The targeted screening that Geneapp enables helps identify targets for experimental validation to confirm the hypotheses under study. Geneapp is freely available for non-commercial use at https://geneapp.net to advance research on AS for bioinformatics.
Affiliation(s)
- Miquéias Fernandes
- Postgraduation Program in Bioinformatics, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil; Institute of Applied Biotechnology to Agriculture (BIOAGRO), Universidade Federal de Viçosa, Minas Gerais, Brazil.
- Edson Mario de Andrade
- Postgraduation Program in Bioinformatics, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil; Institute of Applied Biotechnology to Agriculture (BIOAGRO), Universidade Federal de Viçosa, Minas Gerais, Brazil
- Saymon Gazolla Reis da Silva
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil; Institute of Applied Biotechnology to Agriculture (BIOAGRO), Universidade Federal de Viçosa, Minas Gerais, Brazil
- Vinícius Dos Santos Romagnoli
- Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil; Institute of Applied Biotechnology to Agriculture (BIOAGRO), Universidade Federal de Viçosa, Minas Gerais, Brazil
- José Miguel Ortega
- Postgraduation Program in Bioinformatics, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Tiago Antônio de Oliveira Mendes
- Postgraduation Program in Bioinformatics, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Department of Biochemistry and Molecular Biology, Universidade Federal de Viçosa, Viçosa, Minas Gerais, Brazil; Institute of Applied Biotechnology to Agriculture (BIOAGRO), Universidade Federal de Viçosa, Minas Gerais, Brazil.
2
Bar N, Nikparvar B, Jayavelu ND, Roessler FK. Constrained Fourier estimation of short-term time-series gene expression data reduces noise and improves clustering and gene regulatory network predictions. BMC Bioinformatics 2022; 23:330. [PMID: 35945515 PMCID: PMC9364503 DOI: 10.1186/s12859-022-04839-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2021] [Accepted: 07/12/2022] [Indexed: 01/15/2023]
Abstract
BACKGROUND: Biological data suffers from noise that is inherent in the measurements. This is particularly true for time-series gene expression measurements. Nevertheless, in order to explore cellular dynamics, scientists employ such noisy measurements in predictive and clustering tools. However, noisy data can not only obscure the genes' temporal patterns, but applying predictive and clustering tools to noisy data may yield inconsistent, and potentially incorrect, results. RESULTS: To reduce the noise of short-term (< 48 h) time-series expression data, we relied on the three basic temporal patterns of gene expression: waves, impulses and sustained responses. We constrained the estimation of the true signals to these patterns by estimating the parameters of first and second-order Fourier functions and using the nonlinear least-squares trust-region optimization technique. Our approach lowered the noise in at least 85% of synthetic time-series expression data, significantly more than the spline method ([Formula: see text]). When the data contained a higher signal-to-noise ratio, our method allowed downstream network component analyses to calculate consistent and accurate predictions, particularly when the noise variance was high. Conversely, these tools led to erroneous results from untreated noisy data. Our results suggest that at least 5-7 time points are required to efficiently de-noise logarithmically scaled time-series expression data. Investing in sampling additional time points provides little benefit to clustering and prediction accuracy. CONCLUSIONS: Our constrained Fourier de-noising method helps to cluster noisy gene expression and interpret dynamic gene networks more accurately. The benefit of noise reduction is large and can constitute the difference between a successful application and a failing one.
Affiliation(s)
- Nadav Bar
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Sælandsvei 4, Trondheim, NO-7491 Norway
- Bahareh Nikparvar
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Sælandsvei 4, Trondheim, NO-7491 Norway
- Naresh Doni Jayavelu
- Division of Medical Genetics, Department of Medicine, University of Washington Seattle, Seattle, WA 98195-7720 USA
- Fabienne Krystin Roessler
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Sem Sælandsvei 4, Trondheim, NO-7491 Norway
3
Liang Y, Kelemen A. Dynamic modeling and network approaches for omics time course data: overview of computational approaches and applications. Brief Bioinform 2019; 19:1051-1068. [PMID: 28430854 DOI: 10.1093/bib/bbx036] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/21/2016] [Indexed: 12/23/2022]
Abstract
Inferring networks and dynamics of genes, proteins, cells and other biological entities from high-throughput biological omics data is a central and challenging issue in computational and systems biology. This is essential for understanding the complexity of human health, disease susceptibility and pathogenesis for Predictive, Preventive, Personalized and Participatory (P4) system and precision medicine. The delineation of the possible interactions of all genes/proteins in a genome/proteome is a task for which conventional experimental techniques are ill suited. Urgently needed are rapid and inexpensive computational and statistical methods that can identify, out of thousands, interacting candidate disease genes or drug targets that can be further investigated or validated by experimentation. Moreover, identifying biological dynamic systems, and simultaneously estimating the important kinetic, structural and functional parameters, which may not be experimentally accessible, could be important directions for drug-disease-gene network studies. In this article, we present an overview and comparison of recent developments of dynamic modeling and network approaches for time-course omics data, and their applications to various biological systems, health conditions and disease statuses. Moreover, various data reduction and analytical schemes ranging from mathematical to computational to statistical methods are compared, including their merits, drawbacks and limitations. The most recent software, associated web resources and other potentials for the compared methods are also presented and discussed in detail.
Affiliation(s)
- Yulan Liang
- Department of Family and Community Health, University of Maryland, Baltimore, MD, USA
- Arpad Kelemen
- Department of Family and Community Health, University of Maryland, Baltimore, MD, USA
4
Abstract
Single-cell RNA-seq (scRNA-seq) provides a comprehensive measurement of stochasticity in transcription, but the limitations of the technology have prevented its application to dissect variability in RNA processing events such as splicing. In this chapter, we review the challenges in splicing isoform quantification in scRNA-seq data and discuss BRIE (Bayesian regression for isoform estimation), a recently proposed Bayesian hierarchical model which resolves these problems by learning an informative prior distribution from sequence features. We illustrate the usage of BRIE with a case study on 130 mouse cells during gastrulation.
5
Nueda MJ, Martorell-Marugan J, Martí C, Tarazona S, Conesa A. Identification and visualization of differential isoform expression in RNA-seq time series. Bioinformatics 2018; 34:524-526. [PMID: 28968682 PMCID: PMC5860359 DOI: 10.1093/bioinformatics/btx578] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/23/2017] [Accepted: 09/12/2017] [Indexed: 11/14/2022]
Abstract
Motivation: As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data. Results: Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with a differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as those genes with a shift in major expressed isoform. Availability and implementation: The package is freely available under the LGPL license from the Bioconductor web site. Contact: mj.nueda@ua.es or aconesa@ufl.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- María José Nueda
- Mathematics Department, University of Alicante, Alicante 03690, Spain
- Jordi Martorell-Marugan
- Genomics of Gene Expression Laboratory, Centro de Investigación Príncipe Felipe, Valencia 42012, Spain
- Cristina Martí
- Genomics of Gene Expression Laboratory, Centro de Investigación Príncipe Felipe, Valencia 42012, Spain
- Sonia Tarazona
- Genomics of Gene Expression Laboratory, Centro de Investigación Príncipe Felipe, Valencia 42012, Spain; Applied Statistics, Operational Research and Quality Department, Politechnic University of Valencia, Valencia 46020, Spain
- Ana Conesa
- Genomics of Gene Expression Laboratory, Centro de Investigación Príncipe Felipe, Valencia 42012, Spain; Microbiology and Cell Science Department, Institute for Food and Agricultural Research, University of Florida, FL 32611, USA
6
Transcription rate strongly affects splicing fidelity and cotranscriptionality in budding yeast. Genome Res 2017; 28:203-213. [PMID: 29254943 PMCID: PMC5793784 DOI: 10.1101/gr.225615.117] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 05/25/2017] [Accepted: 12/14/2017] [Indexed: 01/24/2023]
Abstract
The functional consequences of alternative splicing on altering the transcription rate have been the subject of intensive study in mammalian cells but less is known about effects of splicing on changing the transcription rate in yeast. We present several lines of evidence showing that slow RNA polymerase II elongation increases both cotranscriptional splicing and splicing efficiency and that faster elongation reduces cotranscriptional splicing and splicing efficiency in budding yeast, suggesting that splicing is more efficient when cotranscriptional. Moreover, we demonstrate that altering the RNA polymerase II elongation rate in either direction compromises splicing fidelity, and we reveal that splicing fidelity depends largely on intron length together with secondary structure and splice site score. These effects are notably stronger for the highly expressed ribosomal protein coding transcripts. We propose that transcription by RNA polymerase II is tuned to optimize the efficiency and accuracy of ribosomal protein gene expression, while allowing flexibility in splice site choice with the nonribosomal protein transcripts.
7
Huang Y, Sanguinetti G. BRIE: transcriptome-wide splicing quantification in single cells. Genome Biol 2017; 18:123. [PMID: 28655331 PMCID: PMC5488362 DOI: 10.1186/s13059-017-1248-5] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 01/23/2017] [Accepted: 05/30/2017] [Indexed: 11/12/2022]
Abstract
Single-cell RNA-seq (scRNA-seq) provides a comprehensive measurement of stochasticity in transcription, but the limitations of the technology have prevented its application to dissect variability in RNA processing events such as splicing. Here, we present BRIE (Bayesian regression for isoform estimation), a Bayesian hierarchical model that resolves these problems by learning an informative prior distribution from sequence features. We show that BRIE yields reproducible estimates of exon inclusion ratios in single cells and provides an effective tool for differential isoform quantification between scRNA-seq data sets. BRIE, therefore, expands the scope of scRNA-seq experiments to probe the stochasticity of RNA processing.
Affiliation(s)
- Yuanhua Huang
- School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK
- Guido Sanguinetti
- School of Informatics, University of Edinburgh, Edinburgh, EH8 9AB, UK; Centre for Synthetic and Systems Biology (SynthSys), University of Edinburgh, Edinburgh, EH9 3BF, UK.
8
Wallace EWJ, Beggs JD. Extremely fast and incredibly close: cotranscriptional splicing in budding yeast. RNA (NEW YORK, N.Y.) 2017; 23:601-610. [PMID: 28153948 PMCID: PMC5393171 DOI: 10.1261/rna.060830.117] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 05/07/2023]
Abstract
RNA splicing, an essential part of eukaryotic pre-messenger RNA processing, can be simultaneous with transcription by RNA polymerase II. Here, we compare and review independent next-generation sequencing methods that jointly quantify transcription and splicing in budding yeast. For many yeast transcripts, splicing is fast, taking place within seconds of intron transcription, while polymerase is within a few dozens of nucleotides of the 3' splice site. Ribosomal protein transcripts are spliced particularly fast and cotranscriptionally. However, some transcripts are spliced inefficiently or mainly post-transcriptionally. Intron-mediated regulation of some genes is likely to be cotranscriptional. We suggest that intermediates of the splicing reaction, missing from current data sets, may hold key information about splicing kinetics.
Affiliation(s)
- Edward W J Wallace
- School of Informatics, University of Edinburgh, EH8 9AB, United Kingdom
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, EH9 3BF, United Kingdom
- Jean D Beggs
- Wellcome Trust Centre for Cell Biology, University of Edinburgh, EH9 3BF, United Kingdom