51
Yu CY, Liu HJ, Hung LY, Kuo HC, Chuang TJ. Is an observed non-co-linear RNA product spliced in trans, in cis or just in vitro? Nucleic Acids Res 2014; 42:9410-23. [PMID: 25053845 PMCID: PMC4132752 DOI: 10.1093/nar/gku643] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 12/14/2022]
Abstract
Global transcriptome investigations often result in the detection of an enormous number of transcripts composed of non-co-linear sequence fragments. Such ‘aberrant’ transcript products may arise from post-transcriptional events or genetic rearrangements, or may otherwise be false positives (sequencing/alignment errors or in vitro artifacts). Moreover, post-transcriptionally non-co-linear (‘PtNcl’) transcripts can arise from trans-splicing or back-splicing in cis (to generate so-called ‘circular RNA’). Here, we collected previously predicted human non-co-linear RNA candidates, and designed a validation procedure integrating in silico filters with multiple experimental validation steps to examine their authenticity. We showed that >50% of the tested candidates were in vitro artifacts, even though some had been previously validated by RT-PCR. After excluding the possibility of genetic rearrangements, we distinguished between trans-spliced and circular RNAs, and confirmed that these two splicing forms can share the same non-co-linear junction. Importantly, the experimentally confirmed PtNcl RNA events and their corresponding PtNcl splicing types (i.e. trans-splicing, circular RNA, or both sharing the same junction) were all expressed in rhesus macaque, and some were even expressed in mouse. Our study thus describes an essential procedure for confirming PtNcl transcripts, and provides further insight into the evolutionary role of PtNcl RNA events, opening up this important, but understudied, class of post-transcriptional events for comprehensive characterization.
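The distinction drawn in this abstract between co-linear and non-co-linear junctions can be made concrete with a minimal Python sketch. It assumes 0-based, half-open exon coordinates and a hypothetical `Exon` record; it is an illustration, not the authors' validation procedure. As the abstract stresses, an intragenic non-co-linear junction cannot by itself separate circular RNA from trans-splicing, so the sketch reports only "non-co-linear" for that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon record; coordinates are 0-based, half-open."""
    gene: str
    chrom: str
    strand: str  # "+" or "-"
    start: int
    end: int


def classify_junction(donor: Exon, acceptor: Exon) -> str:
    """Classify a junction joining the 3' end of `donor` to the 5' end of `acceptor`."""
    if donor.gene != acceptor.gene:
        # Could be trans-splicing between genes or a genetic rearrangement.
        return "inter-gene"
    # Along the transcript, "+" strand genes read low-to-high coordinates, "-" high-to-low.
    if donor.strand == "+":
        colinear = donor.end <= acceptor.start
    else:
        colinear = donor.start >= acceptor.end
    if colinear:
        return "co-linear"
    # Acceptor lies upstream of (or is) the donor exon: back-splice (circRNA)
    # or intragenic trans-splicing; the junction alone cannot tell them apart.
    return "non-co-linear"


e1 = Exon("G", "chr1", "+", 100, 200)
e2 = Exon("G", "chr1", "+", 300, 400)
print(classify_junction(e1, e2))  # co-linear
print(classify_junction(e2, e1))  # non-co-linear
```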
Affiliation(s)
- Chun-Ying Yu
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 11529, Taiwan
- Hsiao-Jung Liu
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 11529, Taiwan
- Li-Yuan Hung
- Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
- Hung-Chih Kuo
- Institute of Cellular and Organismic Biology, Academia Sinica, Taipei 11529, Taiwan
- Trees-Juen Chuang
- Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
52
Sun H, Xing X, Li J, Zhou F, Chen Y, He Y, Li W, Wei G, Chang X, Jia J, Li Y, Xie L. Identification of gene fusions from human lung cancer mass spectrometry data. BMC Genomics 2013; 14 Suppl 8:S5. [PMID: 24564548 PMCID: PMC4042237 DOI: 10.1186/1471-2164-14-s8-s5] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Indexed: 01/08/2023]
Abstract
Background Tandem mass spectrometry (MS/MS) has been applied to identify proteins, as an ultimate approach to confirming the original genome annotation. To identify gene fusion proteins, a special database containing peptides that span gene fusion breakpoints is needed. Methods It is impractical to construct a database that includes all possible fusion peptides originating from potential breakpoints. Focusing on 6259 reported and predicted gene fusion pairs from ChimerDB 2.0 and the Cancer Gene Census, we created, for the first time, a database, CanProFu, that comprehensively annotates fusion peptides formed by exon-exon linkage between these paired genes. Results Applying this database to mass spectrometry datasets of 40 human non-small cell lung cancer (NSCLC) samples and 39 normal lung samples with stringent search criteria, we identified 19 unique fusion peptides characterizing gene fusion events. Of these, 11 gene fusion events were found only in NSCLC samples. In addition, 4 alternative splicing events were characterized in cancerous or normal lung samples. Conclusions The database and workflow in this work can be flexibly applied to other MS/MS-based human cancer experiments to detect gene fusions as potential disease biomarkers or drug targets.
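The idea of annotating peptides formed by exon-exon linkage between fusion partners can be illustrated with a short Python sketch that translates a fused coding sequence and extracts the peptide spanning the junction. The function names, the `flank` parameter, and the assumption that the 5' fragment begins at the coding frame are illustrative choices, not CanProFu's actual implementation.

```python
import itertools
from typing import Optional

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO)
}


def translate(seq: str) -> str:
    """Translate in frame 0; incomplete trailing codons are ignored."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def junction_peptide(five_prime: str, three_prime: str, flank: int = 10) -> Optional[str]:
    """Return up to `flank` residues on each side of the fusion junction, or None
    if translation stops before any 3' partner residue is reached."""
    protein = translate(five_prime + three_prime)
    j = len(five_prime) // 3  # first residue encoded (at least partly) by the 3' partner
    if "*" in protein[:j]:
        return None  # stop codon upstream of the junction
    stop = protein.find("*", j)
    end = min(len(protein) if stop == -1 else stop, j + flank)
    if end <= j:
        return None  # stop codon immediately at the junction
    return protein[max(0, j - flank):end]


print(junction_peptide("ATGGCC", "AAATTT", flank=1))  # AK
```

When the 5' fragment length is not a multiple of three, residue `j` is a chimeric codon, so the peptide includes it; a database built this way would index such junction peptides for the MS/MS search.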
53
Zhao M, Zhao Z. CNVannotator: a comprehensive annotation server for copy number variation in the human genome. PLoS One 2013; 8:e80170. [PMID: 24244640 PMCID: PMC3828214 DOI: 10.1371/journal.pone.0080170] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 08/27/2013] [Accepted: 10/09/2013] [Indexed: 12/02/2022]
Abstract
Copy number variation (CNV) is one of the most prevalent genetic variations in the genome, leading to an abnormal number of copies of moderate to large genomic regions. High-throughput technologies such as next-generation sequencing often identify thousands of CNVs involved in biological or pathological processes. Despite the growing demand to filter and classify CNVs by factors such as population frequency, biological features, and function, surprisingly, no online web server for CNV annotation has been made available to the research community. Here, we present CNVannotator, a web server that accepts an input set of human genomic positions in a user-friendly tabular format. CNVannotator can compute genomic overlaps between the input coordinates and various functional features, including 356,817 reported common CNVs, 181,261 disease CNVs, and 140,342 SNPs from genome-wide association studies. In addition, CNVannotator incorporates 2,211,468 genomic features, including ENCODE regulatory elements, cytobands, segmental duplications, genome fragile sites, pseudogenes, promoters, enhancers, CpG islands, and methylation sites. For cancer research community users, CNVannotator can apply various filters to retrieve a subgroup of CNVs located in hundreds of tumor suppressor genes and oncogenes. In total, 5,277,234 unique genomic coordinates with functional features are available to generate an output in a plain text format that is free to download. In summary, we provide a comprehensive web resource for human CNVs. The annotated results along with the server can be accessed at http://bioinfo.mc.vanderbilt.edu/CNVannotator/.
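Overlap annotation of the kind CNVannotator performs reduces to interval intersection. The following Python sketch assumes 0-based, half-open coordinates and a hypothetical `Feature` record; it indexes features per chromosome and uses binary search to skip features that start after the query ends. It illustrates the general technique and is not the server's code.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical annotation record; coordinates are 0-based, half-open."""
    chrom: str
    start: int
    end: int
    name: str


class FeatureIndex:
    def __init__(self, features: Iterable[Feature]) -> None:
        by_chrom: Dict[str, List[Feature]] = defaultdict(list)
        for f in features:
            by_chrom[f.chrom].append(f)
        # Sort each chromosome's features by start so bisect can be used.
        self._feats = {c: sorted(fs, key=lambda f: f.start) for c, fs in by_chrom.items()}
        self._starts = {c: [f.start for f in fs] for c, fs in self._feats.items()}

    def overlaps(self, chrom: str, start: int, end: int) -> List[Feature]:
        """Return features sharing at least one base with [start, end)."""
        feats = self._feats.get(chrom, [])
        starts = self._starts.get(chrom, [])
        # Only features that start before the query ends can overlap it.
        cutoff = bisect_left(starts, end)
        return [f for f in feats[:cutoff] if f.end > start]
```

The remaining scan over `feats[:cutoff]` is linear; a production tool would use an interval tree or a max-end augmentation to bound it, but the half-open overlap test is the same.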
Affiliation(s)
- Min Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
54
Wu CS, Yu CY, Chuang CY, Hsiao M, Kao CF, Kuo HC, Chuang TJ. Integrative transcriptome sequencing identifies trans-splicing events with important roles in human embryonic stem cell pluripotency. Genome Res 2013; 24:25-36. [PMID: 24131564 PMCID: PMC3875859 DOI: 10.1101/gr.159483.113] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.9] [Indexed: 01/09/2023]
Abstract
Trans-splicing is a post-transcriptional event that joins exons from separate pre-mRNAs. Detection of trans-splicing is usually severely hampered by experimental artifacts and genetic rearrangements. Here, we develop a new computational pipeline, TSscan, which integrates different types of high-throughput long-/short-read transcriptome sequencing of different human embryonic stem cell (hESC) lines to effectively minimize false positives while detecting trans-splicing. Combining TSscan screening with multiple experimental validation steps revealed that most chimeric RNA products were platform-dependent experimental artifacts of RNA sequencing. We successfully identified and confirmed four trans-spliced RNAs, including the first reported trans-spliced large intergenic noncoding RNA (“tsRMST”). We showed that these trans-spliced RNAs were all highly expressed in human pluripotent stem cells and differentially expressed during hESC differentiation. Our results further indicated that tsRMST can contribute to pluripotency maintenance of hESCs by suppressing lineage-specific gene expression through the recruitment of NANOG and the PRC2 complex factor, SUZ12. Taken together, our findings provide important insights into the role of trans-splicing in pluripotency maintenance of hESCs and help to facilitate future studies into trans-splicing, opening up this important but understudied class of post-transcriptional events for comprehensive characterization.
Affiliation(s)
- Chan-Shuo Wu
- Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
55
Wu J, Zhang W, Huang S, He Z, Cheng Y, Wang J, Lam TW, Peng Z, Yiu SM. SOAPfusion: a robust and effective computational fusion discovery tool for RNA-seq reads. Bioinformatics 2013; 29:2971-8. [DOI: 10.1093/bioinformatics/btt522] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 12/16/2022]
56
Shugay M, Ortiz de Mendíbil I, Vizmanos JL, Novo FJ. Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions. Bioinformatics 2013; 29:2539-46. [PMID: 23956304 DOI: 10.1093/bioinformatics/btt445] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Indexed: 01/10/2023]
Abstract
MOTIVATION Gene fusions resulting from chromosomal aberrations are an important cause of cancer. The complexity of genomic changes in certain cancer types has hampered the identification of gene fusions by molecular cytogenetic methods, especially in carcinomas. This is changing with the advent of next-generation sequencing, which is detecting a substantial number of new fusion transcripts in individual cancer genomes. However, this poses the challenge of identifying those fusions with greater oncogenic potential amid a background of 'passenger' fusion sequences. RESULTS In the present work, we have used some recently identified genomic hallmarks of oncogenic fusion genes to develop Oncofuse, a pipeline for the classification of fusion sequences. The pipeline predicts the oncogenic potential of novel fusion genes by calculating the probability that a fusion sequence behaves as a 'driver' of the oncogenic process, based on features present in known oncogenic fusions. Cross-validation and extensive validation tests on independent datasets suggest robust behavior with good precision and recall rates. We believe that Oncofuse could become a useful tool to guide experimental validation studies of novel fusion sequences found during next-generation sequencing analysis of cancer transcriptomes. AVAILABILITY AND IMPLEMENTATION Oncofuse is a naive Bayes Network Classifier trained and tested using the Weka machine learning package. The pipeline is executed by running a Java/Groovy script, available for download at www.unav.es/genetica/oncofuse.html.
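Oncofuse is described as a naive Bayes classifier built with Weka. As a hedged illustration of how such a classifier turns fusion features into a 'driver' probability, here is a from-scratch Bernoulli naive Bayes in Python. The two binary features and the toy training labels are hypothetical placeholders, not Oncofuse's feature set or training data.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[str, float], Dict[str, List[float]]]


def train_bernoulli_nb(X: Sequence[Sequence[int]], y: Sequence[str], alpha: float = 1.0) -> Model:
    """Estimate log-priors and per-feature P(feature = 1 | class) with Laplace smoothing."""
    counts = Counter(y)
    log_priors = {c: math.log(n / len(y)) for c, n in counts.items()}
    likelihoods = {}
    for c in counts:
        rows = [x for x, label in zip(X, y) if label == c]
        likelihoods[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
    return log_priors, likelihoods


def predict_proba(model: Model, x: Sequence[int]) -> Dict[str, float]:
    """Posterior class probabilities, computed in log space for numerical stability."""
    log_priors, likelihoods = model
    log_post = {}
    for c, lp in log_priors.items():
        for xj, p in zip(x, likelihoods[c]):
            lp += math.log(p if xj else 1.0 - p)
        log_post[c] = lp
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {c: math.exp(v - m) / z for c, v in log_post.items()}


# Hypothetical features: [retains_kinase_domain, loses_interaction_domain]
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
y = ["driver", "driver", "driver", "passenger", "passenger", "passenger"]
model = train_bernoulli_nb(X, y)
print(predict_proba(model, [1, 1]))
```

The "naive" assumption is that features are conditionally independent given the class, which is why the per-feature log-likelihoods can simply be summed.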
Affiliation(s)
- Mikhail Shugay
- Department of Genetics, University of Navarra. 31008 Pamplona, Spain
57
Goecks J, Eberhard C, Too T, Nekrutenko A, Taylor J. Web-based visual analysis for high-throughput genomics. BMC Genomics 2013; 14:397. [PMID: 23758618 PMCID: PMC3691752 DOI: 10.1186/1471-2164-14-397] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 01/21/2013] [Accepted: 05/31/2013] [Indexed: 01/12/2023]
Abstract
BACKGROUND Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as to communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, numerous challenges arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. RESULTS We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. CONCLUSIONS Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments.
Affiliation(s)
- Jeremy Goecks
- Department of Biology, Emory University, 1510 Clifton Road NE, Atlanta, GA 30322, USA
58
Frenkel-Morgenstern M, Valencia A. Novel domain combinations in proteins encoded by chimeric transcripts. Bioinformatics 2013; 28:i67-74. [PMID: 22689780 PMCID: PMC3371848 DOI: 10.1093/bioinformatics/bts216] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 12/18/2022]
Abstract
Motivation: Chimeric RNA transcripts are generated by different mechanisms, including pre-mRNA trans-splicing, chromosomal translocations and/or gene fusions. It was shown recently that at least some chimeric transcripts can be translated into functional chimeric proteins. Results: To gain a better understanding of the design principles underlying chimeric proteins, we have analyzed 7,424 chimeric RNAs from humans. We focused on the specific domains present in these proteins, comparing their permutations with those of known human proteins. Our method uses genomic alignments of the chimeras, identification of the gene–gene junction sites and prediction of the protein domains. We found that chimeras contain complete protein domains significantly more often than random data sets do. Specifically, we show that eight different types of domains are over-represented among all chimeras as well as in those chimeras confirmed by RNA-seq experiments. Moreover, we discovered that some chimeras potentially encode proteins with novel and unique domain combinations. Given the observed prevalence of entire protein domains in chimeras, we predict that certain putative chimeras that lack activation domains may actively compete with their parental proteins, thereby exerting dominant negative effects. More generally, the production of chimeric transcripts enables a combinatorial increase in the number of protein products available, which may disturb the function of parental genes and influence their protein–protein interaction network. Availability: Our scripts are available upon request. Contact: avalencia@cnio.es Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Milana Frenkel-Morgenstern
- Structural Biology and BioComputing Program, Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
59
Bruno AE, Miecznikowski JC, Qin M, Wang J, Liu S. FUSIM: a software tool for simulating fusion transcripts. BMC Bioinformatics 2013; 14:13. [PMID: 23323884 PMCID: PMC3637076 DOI: 10.1186/1471-2105-14-13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 08/29/2012] [Accepted: 01/11/2013] [Indexed: 12/16/2022]
Abstract
Background Gene fusions are the result of chromosomal aberrations and encode chimeric RNA (fusion transcripts) that play an important role in cancer genesis. Recent advances in high-throughput transcriptome sequencing have given rise to computational methods for new fusion discovery. The ability to simulate fusion transcripts is essential for testing and improving those tools. Results To meet this need, we developed FUSIM (FUsion SIMulator), a software tool for simulating fusion transcripts. It supports simulation of the events known to create fusion genes and their resulting chimeric proteins, including inter-chromosome translocations, trans-splicing, complex chromosomal rearrangements, and transcriptional read-through events. Conclusions FUSIM provides the ability to assemble a dataset of fusion transcripts useful for testing and benchmarking applications in fusion gene discovery.
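A minimal Python sketch of the simplest case a fusion simulator targets, joining a 5' portion of one gene to a 3' portion of another, is shown below. It assumes breakpoints fall on exon boundaries and that exons are supplied as plain strings in transcript order; both are simplifying assumptions, and `simulate_fusion` is a hypothetical name rather than part of FUSIM.

```python
import random
from typing import List, Tuple


def simulate_fusion(exons_5p: List[str], exons_3p: List[str],
                    rng: random.Random) -> Tuple[str, int]:
    """Join a random non-empty 5' run of one gene's exons to a random non-empty
    3' run of another's. Returns the fused sequence and the junction offset."""
    i = rng.randint(1, len(exons_5p))      # keep exons_5p[:i]
    j = rng.randint(0, len(exons_3p) - 1)  # keep exons_3p[j:]
    head = "".join(exons_5p[:i])
    tail = "".join(exons_3p[j:])
    return head + tail, len(head)


rng = random.Random(42)  # seeded so the simulation is reproducible
seq, junction = simulate_fusion(["ATGAAA", "CCCGGG"], ["TTTACG", "GGATAA"], rng)
print(seq[:junction], seq[junction:])
```

Returning the junction offset alongside the sequence is what makes simulated data useful for benchmarking: a detector's reported breakpoint can be compared against the known truth.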
Affiliation(s)
- Andrew E Bruno
- Department of Biostatistics, SUNY at Buffalo, Buffalo, NY 14214, USA.
60
Casado-Vela J, Lacal JC, Elortza F. Protein chimerism: Novel source of protein diversity in humans adds complexity to bottom-up proteomics. Proteomics 2012; 13:5-11. [DOI: 10.1002/pmic.201200371] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 08/16/2012] [Revised: 10/04/2012] [Accepted: 10/29/2012] [Indexed: 12/20/2022]
Affiliation(s)
- Juan Casado-Vela
- Centro Nacional de Biotecnología. Lab 115. Dpt. Biología Molecular y Celular; Spanish National Research Council (CSIC); 28049 Madrid Spain
- Juan Carlos Lacal
- Translational Oncology Unit; Instituto de Investigaciones Biomédicas ‘Alberto Sols’; Spanish National Research Council (CSIC-UAM); Madrid Spain
- Felix Elortza
- Proteomics Platform; CIC bioGUNE; CIBERehd, ProteoRed-ISCIII; Technology Park of Bizkaia; Derio Spain
61
Frenkel-Morgenstern M, Gorohovski A, Lacroix V, Rogers M, Ibanez K, Boullosa C, Andres Leon E, Ben-Hur A, Valencia A. ChiTaRS: a database of human, mouse and fruit fly chimeric transcripts and RNA-sequencing data. Nucleic Acids Res 2012; 41:D142-51. [PMID: 23143107 PMCID: PMC3531201 DOI: 10.1093/nar/gks1041] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Indexed: 11/23/2022]
Abstract
Chimeric RNAs that comprise two or more different transcripts have been identified in many cancers and among the Expressed Sequence Tags (ESTs) isolated from different organisms; they might represent functional proteins and produce different disease phenotypes. The ChiTaRS database of Chimeric Transcripts and RNA-Sequencing data (http://chitars.bioinfo.cnio.es/) collects more than 16 000 chimeric RNAs from humans, mice and fruit flies, 233 chimeras confirmed by RNA-seq reads and ∼2000 cancer breakpoints. The database indicates the expression and tissue specificity of these chimeras, as confirmed by RNA-seq data, and it includes mass spectrometry results for some human entries at their junctions. Moreover, the database has advanced features to analyze junction consistency and to rank chimeras based on the evidence of repeated junction sites. Finally, ‘Junction Search’ screens through the RNA-seq reads found at the chimeras’ junction sites to identify putative junctions in novel sequences entered by users. Thus, ChiTaRS is an extensive catalog of human, mouse and fruit fly chimeras that will extend our understanding of the evolution of chimeric transcripts in eukaryotes and can be advantageous in the analysis of human cancer breakpoints.
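The 'Junction Search' described above screens RNA-seq reads found at chimera junction sites. A simplified Python sketch of the underlying idea follows; it assumes exact string matching and a hypothetical minimum anchor length required on each side of the junction, whereas real tools use alignment that tolerates mismatches.

```python
from typing import Iterable, List, Tuple


def spanning_reads(left: str, right: str, reads: Iterable[str],
                   min_anchor: int = 8) -> List[Tuple[str, int]]:
    """Return (read, offset) for reads that match the junction sequence exactly
    and cover at least `min_anchor` bases on each side of the junction."""
    junction_seq = left + right
    pos = len(left)  # junction lies between junction_seq[pos - 1] and junction_seq[pos]
    hits = []
    for read in reads:
        start = junction_seq.find(read)
        while start != -1:
            end = start + len(read)
            if pos - start >= min_anchor and end - pos >= min_anchor:
                hits.append((read, start))
                break  # one qualifying placement per read is enough
            start = junction_seq.find(read, start + 1)
    return hits
```

Requiring an anchor on both sides is what distinguishes evidence for the chimeric junction from reads that simply map within one parental gene.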
Affiliation(s)
- Milana Frenkel-Morgenstern
- Structural Biology and BioComputing Program, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
62
Kalyana-Sundaram S, Shanmugam A, Chinnaiyan AM. Gene Fusion Markup Language: a prototype for exchanging gene fusion data. BMC Bioinformatics 2012; 13:269. [PMID: 23072312 PMCID: PMC3607969 DOI: 10.1186/1471-2105-13-269] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/21/2011] [Accepted: 10/11/2012] [Indexed: 12/26/2022]
Abstract
Background An avalanche of next-generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Results Here we propose a prototype, Gene Fusion Markup Language (GFML), as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves, it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. Conclusion The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.
Affiliation(s)
- Shanker Kalyana-Sundaram
- Michigan Center for Translational Pathology, Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
63
Frenkel-Morgenstern M, Lacroix V, Ezkurdia I, Levin Y, Gabashvili A, Prilusky J, Del Pozo A, Tress M, Johnson R, Guigo R, Valencia A. Chimeras taking shape: potential functions of proteins encoded by chimeric RNA transcripts. Genome Res 2012; 22:1231-42. [PMID: 22588898 PMCID: PMC3396365 DOI: 10.1101/gr.130062.111] [Citation(s) in RCA: 106] [Impact Index Per Article: 8.2] [Indexed: 12/15/2022]
Abstract
Chimeric RNAs comprise exons from two or more different genes and have the potential to encode novel proteins that alter cellular phenotypes. To date, numerous putative chimeric transcripts have been identified among the ESTs isolated from several organisms and using high throughput RNA sequencing. The few corresponding protein products that have been characterized mostly result from chromosomal translocations and are associated with cancer. Here, we systematically establish that some of the putative chimeric transcripts are genuinely expressed in human cells. Using high throughput RNA sequencing, mass spectrometry experimental data, and functional annotation, we studied 7424 putative human chimeric RNAs. We confirmed the expression of 175 chimeric RNAs in 16 human tissues, with an abundance varying from 0.06 to 17 RPKM (Reads Per Kilobase per Million mapped reads). We show that these chimeric RNAs are significantly more tissue-specific than non-chimeric transcripts. Moreover, we present evidence that chimeras tend to incorporate highly expressed genes. Despite the low expression level of most chimeric RNAs, we show that 12 novel chimeras are translated into proteins detectable in multiple shotgun mass spectrometry experiments. Furthermore, we confirm the expression of three novel chimeric proteins using targeted mass spectrometry. Finally, based on our functional annotation of exon organization and preserved domains, we discuss the potential features of chimeric proteins with illustrative examples and suggest that chimeras significantly exploit signal peptides and transmembrane domains, which can alter the cellular localization of cognate proteins. Taken together, these findings establish that some chimeric RNAs are translated into potentially functional proteins in humans.
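The expression range quoted above is given in RPKM, which normalizes a read count by feature length (in kilobases) and sequencing depth (in millions of mapped reads). The Python function below is the standard formula; the argument names are illustrative.

```python
def rpkm(feature_reads: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of feature per Million mapped reads."""
    return feature_reads / ((feature_length_bp / 1e3) * (total_mapped_reads / 1e6))


print(rpkm(1000, 2000, 10_000_000))  # 50.0
```

Rearranging gives reads = RPKM × length (kb) × depth (millions). For a hypothetical 1.5 kb chimera sequenced to 20 million mapped reads, the reported lower bound of 0.06 RPKM corresponds to only about 1.8 reads, which illustrates why low-abundance chimeras are hard to confirm.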
64
Spatial proximity and similarity of the epigenetic state of genome domains. PLoS One 2012; 7:e33947. [PMID: 22496774 PMCID: PMC3319547 DOI: 10.1371/journal.pone.0033947] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 10/14/2011] [Accepted: 02/19/2012] [Indexed: 11/19/2022]
Abstract
Recent studies demonstrate that the organization of chromatin within the nuclear space might play a crucial role in the regulation of gene expression. Ongoing progress in determining the 3D structure of nuclear chromatin allows one to study correlations between the spatial proximity of genome domains and their epigenetic state. We combined data on the three-dimensional architecture of the whole human genome with results of high-throughput studies of the chromatin functional state and observed that fragments of different chromosomes that are spatially close tend to have similar patterns of histone modifications, methylation state, DNase sensitivity, expression level, and chromatin states in general. Moreover, clustering of genome regions by spatial proximity produced compact clusters characterized by high levels of histone modifications and DNase sensitivity and low methylation levels, and loose clusters with the opposite characteristics. We also associated the spatial proximity data with previously detected chimeric transcripts and the results of RNA-seq experiments and observed that chimeric transcripts formed from fragments of two different chromosomes are more frequent among spatially proximal genome domains. A fair fraction of these chimeric transcripts seems to arise post-transcriptionally via trans-splicing.
65
Mittal VK, McDonald JF. R-SAP: a multi-threading computational pipeline for the characterization of high-throughput RNA-sequencing data. Nucleic Acids Res 2012; 40:e67. [PMID: 22287631 PMCID: PMC3351179 DOI: 10.1093/nar/gks047] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/26/2023]
Abstract
The rapid expansion in the quantity and quality of RNA-Seq data requires the development of sophisticated high-performance bioinformatics tools capable of rapidly transforming these data into meaningful information that is easily interpretable by biologists. Currently available analysis tools are often not easily installed by the general biologist, and most of them lack inherent parallel processing capabilities, widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision-making procedure to accurately characterize various classes of transcripts and achieves a near-linear decrease in data processing time as the number of threads increases. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.
Affiliation(s)
- Vinay K Mittal
- School of Biology, Georgia Institute of Technology, Atlanta, GA 30332, USA
66
Suvorova YM, Rudenko VM, Korotkov EV. Detection change points of triplet periodicity of gene. Gene 2011; 491:58-64. [PMID: 21982972 DOI: 10.1016/j.gene.2011.08.032] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/19/2011] [Revised: 08/10/2011] [Accepted: 08/25/2011] [Indexed: 10/17/2022]
Abstract
The triplet periodicity (TP) is a distinctive property of protein-coding sequences. Some complex genes have more than one TP type along their sequence; we say that these genes contain a triplet periodicity change point. The aim of this work was to find all genes that contain a TP change point and to compare the positions of these change points with known biological data. We developed a mathematical method to identify triplet periodicity changes along a sequence. We found 311,221 genes with a TP change point in the KEGG/Genes database (version 48), about 8% of the total database volume (4,013,150 entries). We showed that repetitive sequences are not the only cause of such events. We suppose that a TP change point may indicate a fusion of genes or domains. We performed BLAST analysis to find potential ancestral genes for the parts of genes with a TP change point. As a result, we found that in 131,323 cases, sequences with a TP change point have sufficient similarity for one or both parts. The relationship between TP change points and fusion events in genes is discussed. An implementation of the method is available from the authors upon request.
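To make the notion of a TP change point concrete, the Python sketch below computes a 3×4 matrix of base frequencies by codon position and picks the in-frame split that maximizes the L1 distance between the left and right matrices. This simple scoring rule is a stand-in chosen for illustration, not the authors' statistical method, and `min_len` is a hypothetical parameter.

```python
from typing import List


def tp_matrix(seq: str) -> List[List[float]]:
    """3x4 matrix: relative frequency of A, C, G, T at codon positions 0, 1, 2."""
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    counts = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq):
        if base in idx:
            counts[i % 3][idx[base]] += 1
    total = sum(map(sum, counts)) or 1  # avoid division by zero on empty input
    return [[c / total for c in row] for row in counts]


def tp_change_point(seq: str, min_len: int = 30) -> int:
    """Return the in-frame split (a multiple of 3) whose left and right TP matrices
    differ most in L1 distance, or -1 if the sequence is too short."""
    first = -(-min_len // 3) * 3  # smallest multiple of 3 that is >= min_len
    best_pos, best_score = -1, -1.0
    for pos in range(first, len(seq) - min_len + 1, 3):
        left, right = tp_matrix(seq[:pos]), tp_matrix(seq[pos:])
        score = sum(abs(a - b) for ra, rb in zip(left, right) for a, b in zip(ra, rb))
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos


seq = "ATG" * 30 + "GCA" * 30
print(tp_change_point(seq))  # 90
```

Restricting splits to multiples of 3 keeps both segments in the same reading frame, so codon position 0 means the same thing on each side of the candidate change point.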
Affiliation(s)
- Yulia M Suvorova
- Bioinformatics Laboratory, Centre of Bioengineering, Russian Academy of Sciences, 117312, Moscow, Prospect 60-tya Oktyabrya, 7/1, Russia.
67
Kong F, Zhu J, Wu J, Peng J, Wang Y, Wang Q, Fu S, Yuan LL, Li T. dbCRID: a database of chromosomal rearrangements in human diseases. Nucleic Acids Res 2010; 39:D895-900. [PMID: 21051346 PMCID: PMC3013658 DOI: 10.1093/nar/gkq1038] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.7] [Indexed: 01/03/2023]
Abstract
Chromosomal rearrangement (CR) events result from abnormal breaking and rejoining of the DNA molecules, or from crossing-over between repetitive DNA sequences, and they are involved in many tumor and non-tumor diseases. Investigations of disease-associated CR events can not only lead to important discoveries about DNA breakage and repair mechanisms, but also offer important clues about the pathologic causes and the diagnostic/therapeutic targets of these diseases. We have developed a database of Chromosomal Rearrangements In Diseases (dbCRID, http://dbCRID.biolead.org), a comprehensive database of human CR events and their associated diseases. For each reported CR event, dbCRID documents the type of the event, the associated disease or symptoms, and, when possible, detailed information about the CR event, including precise breakpoint positions, junction sequences, genes and gene regions disrupted, and experimental techniques applied to discover/analyze the CR event. With 2643 records of disease-associated CR events curated from 1172 original studies, dbCRID is a comprehensive and dynamic resource useful for studying DNA breakage and repair mechanisms, and for analyzing the genetic basis of human tumor and non-tumor diseases.
Affiliation(s)
- Fanlou Kong
- Biolead.org Research Group, LC Sciences, Houston, TX 77054, USA
68
Abstract
In this Perspective, we discuss a paper in this issue of Science Translational Medicine, in which Leary and colleagues present a new method based on massive, parallel, and near-complete sequencing of individual tumor genomes. Their findings support the notion that cancer genomes house a spectrum of genetic alterations, many of which are unique to the individual tumor. More validation and a reduction in cost are required for this approach to become common in clinics.
Affiliation(s)
- Ludmila Prokunina-Olsson
- Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892-4605, USA
69
Robison K. Application of second-generation sequencing to cancer genomics. Brief Bioinform 2010; 11:524-34. [DOI: 10.1093/bib/bbq013] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 12/13/2022]