Ma H, Zhong C, Chen D, He H, Yang F. cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network. BMC Bioinformatics 2023; 24:119. [PMID: 36977976 PMCID: PMC10045035 DOI: 10.1186/s12859-023-05243-x]
[Received: 12/14/2022] [Accepted: 03/21/2023] [Indexed: 03/30/2023]
Abstract
BACKGROUND
Detecting genomic structural variants is an important and challenging problem in genome analysis. Existing long-read-based detection methods still leave room for improvement when detecting multiple types of structural variants.
RESULTS
In this paper, we propose cnnLSV, a method that improves detection quality by eliminating false positives from the results merged from the callsets of existing methods. We design an encoding strategy for four types of structural variants that represents the long-read alignment information around each variant as an image, feed the images into a convolutional neural network to train a filter model, and apply the trained model to remove false positives and thereby improve detection performance. During model training, we also eliminate mislabeled training samples by using principal component analysis and the unsupervised clustering algorithm k-means. Experimental results on both simulated and real datasets show that our proposed method outperforms existing methods overall in detecting insertions, deletions, inversions, and duplications. The cnnLSV program is available at https://github.com/mhuidong/cnnLSV.
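The idea of turning alignment information around a candidate variant into an image can be illustrated with a small sketch. This is not the encoding used by cnnLSV, which the abstract does not specify; the function name `encode_region`, the read representation, and the pixel values are all hypothetical choices made only for illustration. Each row is one read, each column is one reference position, and gaps in a read (as a deletion would produce) get a distinct pixel value.

```python
import numpy as np

def encode_region(reads, start, end, max_reads=32):
    """Encode alignments overlapping [start, end) as a 2-D uint8 image.

    Hypothetical toy encoding, not the cnnLSV scheme.
    reads: list of (read_start, read_end, gaps), where gaps is a list of
    (gap_start, gap_end) reference intervals that the read skips.
    Pixel values: 0 = no coverage, 255 = aligned base, 128 = gap in the read.
    """
    width = end - start
    image = np.zeros((max_reads, width), dtype=np.uint8)
    for row, (r_start, r_end, gaps) in enumerate(reads[:max_reads]):
        # Clip the read to the window around the candidate variant.
        lo, hi = max(r_start, start), min(r_end, end)
        if lo >= hi:
            continue  # read does not overlap the window
        image[row, lo - start:hi - start] = 255
        for g_start, g_end in gaps:
            # Clip each gap to the covered part of the window.
            g_lo, g_hi = max(g_start, lo), min(g_end, hi)
            if g_lo < g_hi:
                image[row, g_lo - start:g_hi - start] = 128
    return image

# Two toy reads around a candidate deletion at reference positions 120-150.
reads = [(90, 210, [(120, 150)]), (100, 140, [])]
img = encode_region(reads, 100, 200, max_reads=4)
```

An array of this shape could then be stacked with others into a batch and passed to an image classifier; a real encoding would need to handle insertions, inversions, and duplications as well, each of which leaves a different alignment signature.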
CONCLUSIONS
The proposed cnnLSV detects structural variants by using long-read alignment information and a convolutional neural network, achieving higher overall performance, and effectively eliminates incorrectly labeled samples in the model training stage by using principal component analysis and k-means.
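One way principal component analysis and k-means can be combined to flag mislabeled training samples is sketched below. The abstract does not describe the exact procedure, so this is an assumption: samples are projected onto a few principal components, clustered with k-means, and any sample whose label disagrees with the majority label of its cluster is dropped. The helper names (`pca_project`, `kmeans`, `keep_mask`) are hypothetical, and both algorithms are written in plain NumPy to keep the example self-contained.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def _nearest(X, centers):
    """Index of the closest center for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k=2, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index per sample."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        assign = _nearest(X, centers)
        new_centers = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _nearest(X, centers)

def keep_mask(X, labels, n_components=2, k=2):
    """True for samples whose label matches their cluster's majority label.

    Assumed filtering rule for illustration, not taken from the paper.
    labels must be a NumPy array of non-negative integers.
    """
    assign = kmeans(pca_project(X, n_components), k)
    keep = np.zeros(len(labels), dtype=bool)
    for j in range(k):
        members = assign == j
        if not members.any():
            continue
        majority = np.bincount(labels[members]).argmax()
        keep |= members & (labels == majority)
    return keep

# Toy data: two well-separated groups with one flipped label in each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 5)), rng.normal(5.0, 0.1, (20, 5))])
labels = np.array([0] * 20 + [1] * 20)
labels[3], labels[25] = 1, 0
keep = keep_mask(X, labels)
```

In practice `X` would hold flattened image encodings rather than random vectors, and the number of components and clusters would need to be tuned to the data.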