Reference Citation Analysis

For: Picardi E, Mignone F, Pesole G. EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data. BMC Bioinformatics 2009;10 Suppl 6:S10. [PMID: 19534735 PMCID: PMC2697633 DOI: 10.1186/1471-2105-10-s6-s10] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 01/31/2023]

Cited by Other Article(s)

Rouhiainen A, Zhao X, Vanttola P, Qian K, Kulesskiy E, Kuja-Panula J, Gransalke K, Grönholm M, Unni E, Meistrich M, Tian L, Auvinen P, Rauvala H. HMGB4 is expressed by neuronal cells and affects the expression of genes involved in neural differentiation. Sci Rep 2016;6:32960. [PMID: 27608812 PMCID: PMC5036535 DOI: 10.1038/srep32960] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 08/14/2015] [Accepted: 08/18/2016] [Indexed: 12/21/2022]

Bevilacqua V, Pietroleonardo N, Giannino E, Stroppa F, Simone D, Pesole G, Picardi E. EasyCluster2: an improved tool for clustering and assembling long transcriptome reads. BMC Bioinformatics 2014;15 Suppl 15:S7. [PMID: 25474441 PMCID: PMC4271567 DOI: 10.1186/1471-2105-15-s15-s7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Expressed sequences (e.g. ESTs) are a strong source of evidence for improving gene structures and predicting reliable alternative splicing events. When a genome assembly is available, ESTs can be used to generate gene-oriented clusters with the well-established EasyCluster software. EST-like sequences can now be produced massively using Next Generation Sequencing (NGS) technologies. To handle genome-scale transcriptome data, we present EasyCluster2, a reimplementation of EasyCluster that speeds up the creation of gene-oriented clusters and facilitates downstream analyses such as the assembly of full-length transcripts and the detection of splicing isoforms.

RESULTS

EasyCluster2 has been developed to facilitate the genome-based clustering of EST-like sequences generated with the 454 NGS technology. Reads mapped onto the reference genome can be uploaded in the standard GFF3 file format. Alignments are first parsed to produce an initial collection of pseudo-clusters, grouping reads whose genomic coordinates overlap on the same strand. EasyCluster2 then refines the grouping by including in each cluster only reads that share at least one splice site, and optionally performs a Smith-Waterman alignment in the region surrounding splice sites to correct potential alignment errors. In addition, EasyCluster2 can include unspliced reads, which generally account for >50% of 454 datasets, and collapse overlapping clusters. Finally, EasyCluster2 can assemble full-length transcripts using a strategy based on directed acyclic graphs (DAGs), simplifying the identification of alternative splicing isoforms, aided by an implementation of the widely used AStalavista methodology. Accuracy and performance have been tested on both real and simulated datasets.
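
The two clustering stages described above (overlap-based pseudo-clusters, then refinement by shared splice sites) can be illustrated with a minimal Python sketch. It assumes reads have already been parsed from GFF3 into a hypothetical `Read` record holding chromosome, strand and sorted aligned blocks; `pseudo_clusters` and `refine` are illustrative names, not EasyCluster2's actual API. Overlap grouping is a sort-and-sweep per chromosome and strand; refinement uses union-find, assuming that reads linked by a chain of shared splice sites belong together. The Smith-Waterman correction, reassignment of unspliced reads, cluster collapsing and DAG-based assembly are all omitted.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical aligned read; exons are sorted (start, end) blocks, 1-based inclusive."""
    name: str
    chrom: str
    strand: str  # "+" or "-"
    exons: list

    @property
    def start(self):
        return self.exons[0][0]

    @property
    def end(self):
        return self.exons[-1][1]

    def splice_sites(self):
        """Intron boundary coordinates implied by consecutive aligned blocks."""
        sites = set()
        for (_, up_end), (down_start, _) in zip(self.exons, self.exons[1:]):
            sites.add(up_end)      # last aligned base before the intron
            sites.add(down_start)  # first aligned base after the intron
        return sites


def pseudo_clusters(reads):
    """Group reads whose genomic spans overlap on the same chromosome and strand."""
    by_locus = {}
    for r in reads:
        by_locus.setdefault((r.chrom, r.strand), []).append(r)
    clusters = []
    for key in sorted(by_locus):
        current, current_end = [], None
        for r in sorted(by_locus[key], key=lambda x: x.start):
            if current and r.start <= current_end:
                current.append(r)  # overlaps the running span: extend the cluster
                current_end = max(current_end, r.end)
            else:
                if current:
                    clusters.append(current)
                current, current_end = [r], r.end
        if current:
            clusters.append(current)
    return clusters


def refine(cluster):
    """Split a pseudo-cluster into groups of spliced reads linked by shared splice sites.

    Unspliced reads are returned separately; this sketch does not reassign them.
    """
    spliced = [r for r in cluster if len(r.exons) > 1]
    unspliced = [r for r in cluster if len(r.exons) == 1]
    parent = list(range(len(spliced)))  # union-find forest over spliced reads

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner = {}
    for i, r in enumerate(spliced):
        for site in r.splice_sites():
            if site in first_owner:
                parent[find(i)] = find(first_owner[site])  # shared site: merge groups
            else:
                first_owner[site] = i
    groups = {}
    for i, r in enumerate(spliced):
        groups.setdefault(find(i), []).append(r)
    return list(groups.values()), unspliced


reads = [
    Read("r1", "chr1", "+", [(100, 200), (300, 400)]),
    Read("r2", "chr1", "+", [(150, 200), (300, 350)]),  # shares both sites with r1
    Read("r3", "chr1", "+", [(120, 210), (500, 600)]),  # overlaps, but no shared site
    Read("r4", "chr1", "+", [(250, 280)]),              # unspliced
    Read("r5", "chr1", "-", [(100, 200), (300, 400)]),  # opposite strand
]
for pc in pseudo_clusters(reads):
    groups, singles = refine(pc)
    print([[r.name for r in g] for g in groups], [r.name for r in singles])
```

Running it prints `[['r1', 'r2'], ['r3']] ['r4']` followed by `[['r5']] []`: the four plus-strand reads form one pseudo-cluster, which splits into r1+r2 (shared splice sites) and r3, while r5 stays apart because it lies on the other strand.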

CONCLUSIONS

EasyCluster2 is a unique tool for clustering and assembling transcriptome reads produced with 454 technology, as well as ESTs and full-length transcripts. The clustering procedure is enhanced by the use of genome annotations and unspliced reads. Overall, EasyCluster2 can effectively detect splicing isoforms, since it can refine exon-exon junctions and explore alternative splicing without known reference transcripts. Results in GFF3 format can be browsed in the UCSC Genome Browser. EasyCluster2 is therefore a powerful tool for generating reliable clusters for gene expression studies, and it makes such analyses accessible also to researchers not skilled in bioinformatics.

Sturgeon XH, Gardiner KJ. RCDA: a highly sensitive and specific alternatively spliced transcript assembly tool featuring upstream consecutive exon structures. Genomics 2012;100:357-62. [PMID: 22971325 PMCID: PMC5470730 DOI: 10.1016/j.ygeno.2012.08.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/09/2012] [Revised: 08/14/2012] [Accepted: 08/14/2012] [Indexed: 01/21/2023]

Ng KH, Ho CK, Phon-Amnuaisuk S. A hybrid distance measure for clustering expressed sequence tags originating from the same gene family. PLoS One 2012;7:e47216. [PMID: 23071763 PMCID: PMC3469558 DOI: 10.1371/journal.pone.0047216] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/15/2012] [Accepted: 09/10/2012] [Indexed: 01/22/2023]

Wei D, Jiang Q, Wei Y, Wang S. A novel hierarchical clustering algorithm for gene sequences. BMC Bioinformatics 2012;13:174. [PMID: 22823405 PMCID: PMC3443659 DOI: 10.1186/1471-2105-13-174] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 11/05/2011] [Accepted: 06/30/2012] [Indexed: 11/10/2022]

Hazelhurst S, Lipták Z. KABOOM! A new suffix array based algorithm for clustering expression data. Bioinformatics 2011;27:3348-55. [PMID: 21984769 DOI: 10.1093/bioinformatics/btr560] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022]

Bao E, Jiang T, Kaloshian I, Girke T. SEED: efficient clustering of next-generation sequences. Bioinformatics 2011;27:2502-9. [PMID: 21810899 PMCID: PMC3167058 DOI: 10.1093/bioinformatics/btr447] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 11/15/2022]

Abstract

Motivation: Similarity clustering of next-generation sequences (NGS) is an important computational problem for studying the population sizes of DNA/RNA molecules and for reducing redundancy in NGS data. Currently, most sequence clustering algorithms are limited in speed and scalability and thus cannot handle data with tens of millions of reads.

Results: Here, we introduce SEED, an efficient algorithm for clustering very large NGS sets. It joins sequences into clusters whose members can differ by up to three mismatches and three overhanging residues from their virtual center. It is based on a modified spaced seed method, called block spaced seeds. Its clustering component operates on the hash tables by first identifying virtual center sequences and then finding all their neighboring sequences that meet the similarity parameters. SEED can cluster 100 million short read sequences in <4 h with linear time and memory performance. When SEED was used as a preprocessing tool on genome/transcriptome assembly data, it reduced the time and memory requirements of the Velvet/Oasis assembler for the datasets used in this study by 60–85% and 21–41%, respectively. In addition, the assemblies contained longer contigs than those from non-preprocessed data, as indicated by 12–27% larger N50 values. Compared with other clustering tools, SEED performed best at generating clusters of NGS data similar to the true clusters, with 2- to 10-fold better time performance. While most of SEED's uses fall into NGS data preprocessing, our tests also demonstrate its efficiency as a stand-alone tool for discovering clusters of small RNA sequences in NGS data from unsequenced organisms.
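
The clustering criterion described above (up to three mismatches and three overhanging residues from a center) can be illustrated with a deliberately naive Python sketch. It rests on two assumptions not taken from the SEED paper: "overhanging residues" is modeled as shifting a read by up to three positions relative to the center, with mismatches counted only over the overlapping bases; and the center is simply the most frequent unassigned sequence, whereas SEED's virtual center need not be an observed read. The all-against-all scan is quadratic, which is exactly the cost SEED avoids by indexing reads in hash tables keyed on block spaced seeds. `within_neighborhood` and `greedy_cluster` are hypothetical names, not SEED's interface.

```python
from collections import Counter


def within_neighborhood(center, read, max_mm=3, max_shift=3):
    """True if `read` matches `center` with at most `max_mm` mismatches over their
    overlap after shifting it by at most `max_shift` positions (assumed overhang model)."""
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            a, b = center[shift:], read   # read starts `shift` bases into the center
        else:
            a, b = center, read[-shift:]  # read starts `-shift` bases before the center
        overlap = min(len(a), len(b))
        if overlap == 0:
            continue
        mismatches = sum(1 for x, y in zip(a[:overlap], b[:overlap]) if x != y)
        if mismatches <= max_mm:
            return True
    return False


def greedy_cluster(reads, max_mm=3, max_shift=3):
    """Naive all-against-all clustering: the most frequent unassigned sequence becomes
    a center and absorbs every unassigned sequence within its neighborhood."""
    counts = Counter(reads)
    unassigned = set(counts)
    clusters = []
    for center, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if center not in unassigned:
            continue  # already absorbed by an earlier center
        members = [s for s in sorted(unassigned)
                   if within_neighborhood(center, s, max_mm, max_shift)]
        unassigned.difference_update(members)
        clusters.append((center, members))
    return clusters


reads = ["ACGTTGCAAG"] * 3 + ["ACGTTGCATG", "CGTTGCAAGC", "TTTTTTTTTT"]
for center, members in greedy_cluster(reads):
    print(center, members)
```

Here the one-mismatch variant and the read shifted by one base join the `ACGTTGCAAG` cluster, while `TTTTTTTTTT` has at least five mismatches at every allowed shift and forms its own cluster.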

Availability: The SEED software can be downloaded for free from this site: http://manuals.bioinformatics.ucr.edu/home/seed.

Contact: thomas.girke@ucr.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Rao DM, Moler JC, Ozden M, Zhang Y, Liang C, Karro JE. PEACE: Parallel Environment for Assembly and Clustering of Gene Expression. Nucleic Acids Res 2010;38:W737-42. [PMID: 20522511 PMCID: PMC2896108 DOI: 10.1093/nar/gkq470] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/04/2022]

D'Elia D, Gisel A, Eriksson NE, Kossida S, Mattila K, Klucar L, Bongcam-Rudloff E. The 20th anniversary of EMBnet: 20 years of bioinformatics for the Life Sciences community. BMC Bioinformatics 2009;10 Suppl 6:S1. [PMID: 19534734 PMCID: PMC2697632 DOI: 10.1186/1471-2105-10-s6-s1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/15/2023]