Ma J, Zhao X, Qi E, Han R, Yu T, Li G. Highly efficient clustering of long-read transcriptomic data with GeLuster. Bioinformatics 2024;40:btae059. [PMID: 38310330 PMCID: PMC10881092 DOI: 10.1093/bioinformatics/btae059]
[Received: 10/10/2023] [Revised: 01/08/2024] [Accepted: 01/30/2024] [Indexed: 02/05/2024]
Abstract
MOTIVATION
Advances in long-read RNA sequencing technologies promise a bright future for transcriptome analysis, in which clustering long reads according to their gene family of origin is of great importance. However, existing de novo clustering algorithms require substantial computing resources.
RESULTS
We developed GeLuster, a new algorithm for clustering long RNA-seq reads. In our tests on one simulated dataset and nine real datasets, GeLuster exhibited superior performance. On the tested Nanopore datasets, it ran 2.9-17.5 times as fast as the second-fastest method with less than one-seventh of the memory consumption, while achieving higher clustering accuracy. On the PacBio data, GeLuster performed similarly. It sets the stage for large-scale transcriptome studies in the future.
AVAILABILITY AND IMPLEMENTATION
GeLuster is freely available at https://github.com/yutingsdu/GeLuster.