1
Sun W, Duan T, Ye P, Chen K, Zhang G, Lai M, Zhang H. TSVdb: a web-tool for TCGA splicing variants analysis. BMC Genomics 2018; 19:405. [PMID: 29843604 PMCID: PMC5975414 DOI: 10.1186/s12864-018-4775-x] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 11/27/2017] [Accepted: 05/10/2018] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Collaborative projects such as The Cancer Genome Atlas (TCGA) have generated various -omics and clinical data on cancer. Many computational tools have been developed to facilitate the study of the molecular characterization of tumors using data from the TCGA. Alternative splicing of a gene produces splicing variants, and accumulating evidence has revealed its essential role in cancer-related processes, implying the urgent need to discover tumor-specific isoforms and uncover their potential functions in tumorigenesis. RESULT We developed TSVdb, a web-based tool, to explore alternative splicing based on TCGA samples with 30 clinical variables from 33 tumors. TSVdb has an integrated and well-proportioned interface for visualization of the clinical data, gene expression, usage of exons/junctions and splicing patterns. Researchers can interpret the isoform expression variations between or across clinical subgroups and estimate the relationships between isoforms and patient prognosis. TSVdb is available at http://www.tsvdb.com, and the source code is available at https://github.com/wenjie1991/TSVdb. CONCLUSION TSVdb will inspire oncologists and accelerate isoform-level advances in cancer research.
Affiliation(s)
- Wenjie Sun
  - Department of Pathology, Key Laboratory of Disease Proteomics of Zhejiang Province, School of Medicine, Zhejiang University, Hangzhou, 310058 China
- Ting Duan
  - Department of Toxicology, School of Medicine, Zhejiang University, Hangzhou, 310058 China
- Panmeng Ye
  - Hikvision Digital Technology, Hangzhou, 310051 China
- Kelie Chen
  - Department of Pathology, Key Laboratory of Disease Proteomics of Zhejiang Province, School of Medicine, Zhejiang University, Hangzhou, 310058 China
- Guanling Zhang
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA USA
- Maode Lai
  - Department of Pathology, Key Laboratory of Disease Proteomics of Zhejiang Province, School of Medicine, Zhejiang University, Hangzhou, 310058 China
- Honghe Zhang
  - Department of Pathology, Key Laboratory of Disease Proteomics of Zhejiang Province, School of Medicine, Zhejiang University, Hangzhou, 310058 China
2
Rodrigo-Domingo M, Waagepetersen R, Bødker JS, Falgreen S, Kjeldsen MK, Johnsen HE, Dybkær K, Bøgsted M. Reproducible probe-level analysis of the Affymetrix Exon 1.0 ST array with R/Bioconductor. Brief Bioinform 2014; 15:519-33. [PMID: 23603090 PMCID: PMC4103539 DOI: 10.1093/bib/bbt011] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/09/2012] [Accepted: 02/15/2013] [Indexed: 12/22/2022] Open
Abstract
The presence of different transcripts of a gene across samples can be analysed by whole-transcriptome microarrays. Reproducing results from published microarray data represents a challenge owing to the vast amounts of data and the large variety of preprocessing and filtering steps used before the actual analysis is carried out. To guarantee a firm basis for methodological development where results with new methods are compared with previous results, it is crucial to ensure that all analyses are completely reproducible for other researchers. We here give a detailed workflow on how to perform reproducible analysis of the GeneChip® Human Exon 1.0 ST Array at probe and probeset level solely in R/Bioconductor, choosing packages based on their simplicity of use. To exemplify the use of the proposed workflow, we analyse differential splicing and differential gene expression in a publicly available dataset using various statistical methods. We believe this study will provide other researchers with an easy way of accessing gene expression data at different annotation levels and with the sufficient details needed for developing their own tools for reproducible analysis of the GeneChip® Human Exon 1.0 ST Array.
3
Roy B, Haupt LM, Griffiths LR. Review: Alternative Splicing (AS) of Genes As An Approach for Generating Protein Complexity. Curr Genomics 2013; 14:182-94. [PMID: 24179441 PMCID: PMC3664468 DOI: 10.2174/1389202911314030004] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Received: 10/23/2012] [Revised: 02/08/2013] [Accepted: 02/25/2013] [Indexed: 12/22/2022] Open
Abstract
Prior to the completion of the human genome project, the human genome was thought to have a greater number of genes as it seemed structurally and functionally more complex than other simpler organisms. This along with the belief of “one gene, one protein”, were demonstrated to be incorrect. The inequality in the ratio of gene to protein formation gave rise to the theory of alternative splicing (AS). AS is a mechanism by which one gene gives rise to multiple protein products. Numerous databases and online bioinformatic tools are available for the detection and analysis of AS. Bioinformatics provides an important approach to study mRNA and protein diversity by various tools such as expressed sequence tag (EST) sequences obtained from completely processed mRNA. Microarrays and deep sequencing approaches also aid in the detection of splicing events. Initially it was postulated that AS occurred only in about 5% of all genes but was later found to be more abundant. Using bioinformatic approaches, the level of AS in human genes was found to be fairly high with 35-59% of genes having at least one AS form. Our ability to determine and predict AS is important as disorders in splicing patterns may lead to abnormal splice variants resulting in genetic diseases. In addition, the diversity of proteins produced by AS poses a challenge for successful drug discovery and therefore a greater understanding of AS would be beneficial.
Affiliation(s)
- Bishakha Roy
- Genomics Research Centre, Griffith Health Institute, Griffith University Gold Coast, Queensland 4222, Australia
4
Kim YJ, Kim HS. Alternative splicing and its impact as a cancer diagnostic marker. Genomics Inform 2012; 10:74-80. [PMID: 23105933 PMCID: PMC3480681 DOI: 10.5808/gi.2012.10.2.74] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 05/01/2012] [Revised: 05/18/2012] [Accepted: 05/21/2012] [Indexed: 01/13/2023] Open
Abstract
Most genes are processed by alternative splicing for gene expression, resulting in the complexity of the transcriptome in eukaryotes. It allows a limited number of genes to encode various proteins with intricate functions. Alternative splicing is regulated by genetic mutations in cis-regulatory factors and epigenetic events. Furthermore, splicing events occur differently according to cell type, developmental stage, and various diseases, including cancer. Genome instability and flexible proteomes by alternative splicing could affect cancer cells to grow and survive, leading to metastasis. Cancer cells that are transformed by aberrant and uncontrolled mechanisms could produce alternative splicing to maintain and spread them continuously. Splicing variants in various cancers represent crucial roles for tumorigenesis. Taken together, the identification of alternative spliced variants as biomarkers to distinguish between normal and cancer cells could cast light on tumorigenesis.
Affiliation(s)
- Yun-Ji Kim
- Department of Biological Sciences, College of Natural Sciences, Pusan National University, Busan 609-735, Korea
5
Reddy ASN, Rogers MF, Richardson DN, Hamilton M, Ben-Hur A. Deciphering the plant splicing code: experimental and computational approaches for predicting alternative splicing and splicing regulatory elements. Frontiers in Plant Science 2012; 3:18. [PMID: 22645572 PMCID: PMC3355732 DOI: 10.3389/fpls.2012.00018] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 11/27/2011] [Accepted: 01/18/2012] [Indexed: 05/20/2023]
Abstract
Extensive alternative splicing (AS) of precursor mRNAs (pre-mRNAs) in multicellular eukaryotes increases the protein-coding capacity of a genome and allows novel ways to regulate gene expression. In flowering plants, up to 48% of intron-containing genes exhibit AS. However, the full extent of AS in plants is not yet known, as only a few high-throughput RNA-Seq studies have been performed. As the cost of obtaining RNA-Seq reads continues to fall, it is anticipated that huge amounts of plant sequence data will accumulate and help in obtaining a more complete picture of AS in plants. Although it is not an onerous task to obtain hundreds of millions of reads using high-throughput sequencing technologies, computational tools to accurately predict and visualize AS are still being developed and refined. This review will discuss the tools to predict and visualize transcriptome-wide AS in plants using short-reads and highlight their limitations. Comparative studies of AS events between plants and animals have revealed that there are major differences in the most prevalent types of AS events, suggesting that plants and animals differ in the way they recognize exons and introns. Extensive studies have been performed in animals to identify cis-elements involved in regulating AS, especially in exon skipping. However, few such studies have been carried out in plants. Here, we review the current state of research on splicing regulatory elements (SREs) and briefly discuss emerging experimental and computational tools to identify cis-elements involved in regulation of AS in plants. The availability of curated alternative splice forms in plants makes it possible to use computational tools to predict SREs involved in AS regulation, which can then be verified experimentally. Such studies will permit identification of plant-specific features involved in AS regulation and contribute to deciphering the splicing code in plants.
Affiliation(s)
- Anireddy S. N. Reddy
  - Program in Molecular Plant Biology, Department of Biology, Colorado State University, Fort Collins, CO, USA
- Mark F. Rogers
  - Department of Computer Science, Colorado State University, Fort Collins, CO, USA
- Dale N. Richardson
  - Centro de Investigação em Biodiversidade e Recursos Genéticos, University of Porto, Vairão, Portugal
- Michael Hamilton
  - Department of Computer Science, Colorado State University, Fort Collins, CO, USA
- Asa Ben-Hur
  - Department of Computer Science, Colorado State University, Fort Collins, CO, USA
  - Program in Molecular Plant Biology, Colorado State University, Fort Collins, CO, USA
6
Abstract
OBJECTIVES AND METHODS Alternative splicing provides proteomic diversity that can have profound effects. The extent, pattern, and roles of alternative splicing in pancreatic cancer have not been systematically investigated. We have utilized a spliceoform-specific microarray and polymerase chain reaction to evaluate all known splice variants in human pancreatic cancer cell lines representing a spectrum of differentiation, from near-normal HPDE6 to Capan-1 and poorly differentiated MiaPaCa2 cells. Validation of altered spliceoforms was verified in primary cancer specimens and normal pancreatic ductal cells. In addition, expression of 92 spliceosomal genes was examined to better understand the mechanism for observed differences in mRNA splicing. RESULTS A statistically significant reduction in alternative splicing was found in the pancreatic cancer cell lines compared with HPDE6 cells. Many splice variants identified in Capan-1 and MiaPaCa2 cells were observed in grades 3 and 4 tumors. Analysis of genes encoding spliceosomal proteins revealed that 28 of 92 genes had significantly decreased expression in cancer compared with normal pancreas. CONCLUSIONS Pancreatic cancer has reduced alternative splicing diversity compared with normal pancreas. This is demonstrated in both cell lines and primary tumors, with the loss in splicing diversity correlated with relative reduction in expression of spliceosomal genes.
7
Support vector machines-based identification of alternative splicing in Arabidopsis thaliana from whole-genome tiling arrays. BMC Bioinformatics 2011; 12:55. [PMID: 21324185 PMCID: PMC3051901 DOI: 10.1186/1471-2105-12-55] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 06/10/2010] [Accepted: 02/16/2011] [Indexed: 11/15/2022] Open
Abstract
Background Alternative splicing (AS) is a process which generates several distinct mRNA isoforms from the same gene by splicing different portions out of the precursor transcript. Due to the (patho-)physiological importance of AS, a complete inventory of AS is of great interest. While this is in reach for human and mammalian model organisms, our knowledge of AS in plants has remained more incomplete. Experimental approaches for monitoring AS are either based on transcript sequencing or rely on hybridization to DNA microarrays. Among the microarray platforms facilitating the discovery of AS events, tiling arrays are well-suited for identifying intron retention, the most prevalent type of AS in plants. However, analyzing tiling array data is challenging, because of high noise levels and limited probe coverage. Results In this work, we present a novel method to detect intron retentions (IR) and exon skips (ES) from tiling arrays. While statistical tests have typically been proposed for this purpose, our method instead utilizes support vector machines (SVMs) which are appreciated for their accuracy and robustness to noise. Existing EST and cDNA sequences served for supervised training and evaluation. Analyzing a large collection of publicly available microarray and sequence data for the model plant A. thaliana, we demonstrated that our method is more accurate than existing approaches. The method was applied in a genome-wide screen which resulted in the discovery of 1,355 IR events. A comparison of these IR events to the TAIR annotation and a large set of short-read RNA-seq data showed that 830 of the predicted IR events are novel and that 525 events (39%) overlap with either the TAIR annotation or the IR events inferred from the RNA-seq data. Conclusions The method developed in this work expands the scarce repertoire of analysis tools for the identification of alternative mRNA splicing from whole-genome tiling arrays. 
Our predictions are highly enriched with known AS events and complement the A. thaliana genome annotation with respect to AS. Since all predicted AS events can be precisely attributed to experimental conditions, our work provides a basis for follow-up studies focused on the elucidation of the regulatory mechanisms underlying tissue-specific and stress-dependent AS in plants.
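The event counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `overlap_summary` is hypothetical and not part of the cited work. It uses the two figures stated above, 1,355 predicted IR events and 525 overlapping events.

```python
from typing import Tuple


def overlap_summary(total_events: int, overlapping: int) -> Tuple[int, float]:
    """Return (novel_count, overlap_percentage) for a set of predicted events.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    if total_events <= 0 or not 0 <= overlapping <= total_events:
        raise ValueError("overlapping must lie between 0 and a positive total")
    novel = total_events - overlapping
    return novel, 100.0 * overlapping / total_events


# Figures quoted in the abstract: 1,355 predicted IR events, 525 overlapping.
novel, pct = overlap_summary(1355, 525)
print(novel, round(pct))  # prints "830 39"
```

The result agrees with the abstract: 830 novel events, and an overlap of roughly 39% of all predictions.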
8
Integrating multiple genome annotation databases improves the interpretation of microarray gene expression data. BMC Genomics 2010; 11:50. [PMID: 20089164 PMCID: PMC2827411 DOI: 10.1186/1471-2164-11-50] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 09/08/2009] [Accepted: 01/20/2010] [Indexed: 02/04/2023] Open
Abstract
Background The Affymetrix GeneChip is a widely used gene expression profiling platform. Since the chips were originally designed, the genome databases and gene definitions have been considerably updated. Thus, more accurate interpretation of microarray data requires parallel updating of the specificity of GeneChip probes. We propose a new probe remapping protocol, using the zebrafish GeneChips as an example, by removing nonspecific probes, and grouping the probes into transcript level probe sets using an integrated zebrafish genome annotation. This genome annotation is based on combining transcript information from multiple databases. This new remapping protocol, especially the new genome annotation, is shown here to be an important factor in improving the interpretation of gene expression microarray data. Results Transcript data from the RefSeq, GenBank and Ensembl databases were downloaded from the UCSC genome browser, and integrated to generate a combined zebrafish genome annotation. Affymetrix probes were filtered and remapped according to the new annotation. The influence of transcript collection and gene definition methods was tested using two microarray data sets. Compared to remapping using a single database, this new remapping protocol results in up to 20% more probes being retained in the remapping, leading to approximately 1,000 more genes being detected. The differentially expressed gene lists are consequently increased by up to 30%. We are also able to detect up to three times more alternative splicing events. A small number of the bioinformatics predictions were confirmed using real-time PCR validation. Conclusions By combining gene definitions from multiple databases, it is possible to greatly increase the numbers of genes and splice variants that can be detected in microarray gene expression experiments.
9
Mojica W, Hawthorn L. Normal colon epithelium: a dataset for the analysis of gene expression and alternative splicing events in colon disease. BMC Genomics 2010; 11:5. [PMID: 20047688 PMCID: PMC2823691 DOI: 10.1186/1471-2164-11-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 06/10/2009] [Accepted: 01/04/2010] [Indexed: 12/22/2022] Open
Abstract
Background Studies using microarray analysis of colorectal cancer have been generally beleaguered by the lack of a normal cell population of the same lineage as the tumor cell. One of the main objectives of this study was to generate a reference gene expression data set for normal colonic epithelium which can be used in comparisons with diseased tissues, as well as to provide a dataset that could be used as a baseline for studies in alternative splicing. Results We present a dependable expression reference data set for non-neoplastic colonic epithelial cells. An enriched population of fresh colon epithelial cells was obtained from non-neoplastic colectomy specimens and analyzed using Affymetrix GeneChip EXON 1.0 ST arrays. For demonstration purposes, we have compared the data derived from these cells to a publicly available set of tumor and matched normal colon data. This analysis allowed an assessment of global gene expression alterations and demonstrated that adjacent normal tissues, with a high degree of cellular heterogeneity, are not always representative of normal cells for comparison to tumors which arise from the colon epithelium. We also examined alternative splicing events in tumors compared to normal colon epithelial cells. Conclusions The findings from this study represent the first comprehensive expression profile for non-neoplastic colonic epithelial cells reported. Our analysis of splice variants illustrates that this is a very labor-intensive procedure, requiring vigilant examination of the data. It is projected that the contribution of this set of data derived from pure colonic epithelial cells will enhance studies in colon-related disease and offer a vital baseline for studies aimed at elucidating the mechanisms of alternative splicing.
Affiliation(s)
- Wilfrido Mojica
- Molecular Oncology Program, Medical College of Georgia Cancer Center, Augusta, GA, USA
10
Bingham JL, Carrigan PE, Miller LJ, Srinivasan S. Extent and diversity of human alternative splicing established by complementary database annotation and microarray analysis. OMICS: A Journal of Integrative Biology 2008; 12:83-92. [PMID: 18266558 DOI: 10.1089/omi.2007.0041] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
Abstract
Alternative splicing generates functional diversity in higher organisms through alternative first and last exons, skipped and included exons, intron retentions, and alternative donor and acceptor sites. In large-scale microarray studies in humans and the mouse, emphasis so far has been placed on exon-skip events, leaving the prevalence and importance of other splice types largely unexplored. Using a new human splice variant database and a genome-wide microarray to probe thousands of splice events of each type, we measured differential expression of splice types across six pairs of diverse cell lines and validated the database annotation process. Results suggest that splicing in humans is more complex than simple exon-skip events, which account for a minority of splicing differences. The relative frequency of differential expression of the splice types correlates with what is found by our annotation efforts. In conclusion, alternative splicing in human cells is considerably more complex than the canonical example of the exon skip. The complementary approaches of genome-wide annotation of alternative splicing in human and design of genome-wide splicing microarrays to measure differential splicing in biological samples provide a powerful high-throughput tool to study the role of alternative splicing in human biology.
11
SPACE: an algorithm to predict and quantify alternatively spliced isoforms using microarrays. Genome Biol 2008; 9:R46. [PMID: 18312629 PMCID: PMC2374713 DOI: 10.1186/gb-2008-9-2-r46] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 05/09/2007] [Revised: 09/19/2007] [Accepted: 02/29/2008] [Indexed: 12/25/2022] Open
Abstract
Exon and exon+junction microarrays are promising tools for studying alternative splicing. Current analytical tools applied to these arrays lack two relevant features: the ability to predict unknown spliced forms and the ability to quantify the concentration of known and unknown isoforms. SPACE is an algorithm that has been developed to (1) estimate the number of different transcripts expressed under several conditions, (2) predict the precursor mRNA splicing structure and (3) quantify the transcript concentrations including unknown forms. The results presented here show its robustness and accuracy for real and simulated data.
12
A procedure for identifying homologous alternative splicing events. BMC Bioinformatics 2007; 8:260. [PMID: 17640387 PMCID: PMC1950890 DOI: 10.1186/1471-2105-8-260] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/22/2007] [Accepted: 07/19/2007] [Indexed: 01/11/2023] Open
Abstract
Background The study of the functional role of alternative splice isoforms of a gene is a very active area of research in biology. The difficulty of the experimental approach (in particular, in its high-throughput version) leaves ample room for the development of bioinformatics tools that can provide a useful first picture of the problem. Among the possible approaches, one of the simplest is to follow classical protein function annotation protocols and annotate target alternative splice events with the information available from conserved events in other species. However, the application of this protocol requires a procedure capable of recognising such events. Here we present a simple but accurate method developed for this purpose. Results We have developed a method for identifying homologous, or equivalent, alternative splicing events, based on the combined use of neural networks and sequence searches. The procedure comprises four steps: (i) BLAST search for homologues of the two isoforms defining the target alternative splicing event; (ii) construction of all possible candidate events; (iii) scoring of the latter with a series of neural networks; and (iv) filtering of the results. When tested in a set of 473 manually annotated pairs of homologous events, our method showed a good performance, with an accuracy of 0.99, a precision of 0.98 and a sensitivity of 0.93. When no candidates were available, the specificity of our method varied between 0.81 and 0.91. Conclusion The method described in this article allows the identification of homologous alternative splicing events, with a good success rate, indicating that such method could be used for the development of functional annotation of alternative splice isoforms.
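The abstract above reports accuracy, precision, sensitivity and specificity. As a reminder of how these four measures are derived from a confusion matrix, here is a minimal sketch. The counts in `example` are hypothetical placeholders chosen for illustration; they are not the paper's data and are not meant to reproduce its reported values. The class and function names are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of true/false positives and negatives for a binary classifier."""
    tp: int
    fp: int
    tn: int
    fn: int


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all cases that were classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    # Fraction of predicted positives that are true positives.
    return c.tp / (c.tp + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of actual positives that were recovered (recall).
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of actual negatives that were correctly rejected.
    return c.tn / (c.tn + c.fp)


# Hypothetical counts for illustration only (not from the cited study).
example = ConfusionCounts(tp=93, fp=2, tn=898, fn=7)
for name, fn in [("accuracy", accuracy), ("precision", precision),
                 ("sensitivity", sensitivity), ("specificity", specificity)]:
    print(f"{name}: {fn(example):.3f}")
```

Sensitivity and specificity are conditioned on the true class, whereas precision is conditioned on the prediction. This is why a method can show very high accuracy alongside a lower sensitivity when negatives dominate the evaluation set.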
13
Technologies for the Global Discovery and Analysis of Alternative Splicing. Advances in Experimental Medicine and Biology 2007; 623:64-84. [DOI: 10.1007/978-0-387-77374-2_5] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 02/11/2023]