Lisiecka A, Dojer N. Linearization of genome sequence graphs revisited. iScience 2021;24:102755. [PMID: 34278263 PMCID: PMC8264155 DOI: 10.1016/j.isci.2021.102755]
[Received: 03/23/2021] [Revised: 05/21/2021] [Accepted: 06/15/2021] [Indexed: 11/28/2022]
Abstract
The need to include the genetic variation within a population in a reference genome led to the concept of a genome sequence graph. The nodes of such a graph are labeled with DNA sequences occurring in the represented genomes. Because DNA is double-stranded, each node may be oriented in one of two possible ways, which marks one end of its label sequence as the in-side and the other as the out-side. Edges join pairs of sides and reflect adjacencies between node sequences in the genomes constituting the graph. Linearization of a sequence graph aims to orient and order the graph nodes in a way that makes the graph more efficient to visualize and to analyze further, e.g. to access and traverse. We propose a new linearization algorithm, called ALIBI (Algorithm for Linearization by Incremental graph BuIlding). The evaluation shows that ALIBI is computationally very efficient and generates high-quality results.
We propose ALIBI – a new algorithm for linearization of genome sequence graphs
ALIBI yields fewer feedback arcs and reversing joins than competing methods
ALIBI shows high efficiency and scales well to large graphs
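The two quality measures named in the highlights can be made concrete with a small evaluator that scores a given linearization. This is a minimal sketch, not ALIBI itself, and it rests on stated assumptions: a join is counted as reversing when, under the chosen orientation, it connects two sides of the same type (two in-sides or two out-sides); a non-reversing join is counted as a feedback arc when it leads from an out-side to the in-side of a node that does not come later in the order. The names Linearization and evaluate, and the "L"/"R" encoding of the two ends of a node label, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical encoding: a side is (node_id, end), where end is "L" (start of
# the label as written) or "R" (end of the label as written).
Side = Tuple[str, str]
Join = Tuple[Side, Side]


@dataclass
class Linearization:
    order: List[str]          # node ids, left to right
    forward: Dict[str, bool]  # True: label kept as written; False: reverse-complemented

    def side_type(self, side: Side) -> str:
        node, end = side
        # A forward node reads left to right, so its "L" end is the in-side and
        # its "R" end is the out-side; a reversed node swaps the two.
        is_in = (end == "L") == self.forward[node]
        return "in" if is_in else "out"


def evaluate(lin: Linearization, joins: List[Join]) -> Tuple[int, int]:
    """Return (feedback_arcs, reversing_joins) under the assumed definitions."""
    pos = {node: i for i, node in enumerate(lin.order)}
    feedback = reversing = 0
    for a, b in joins:
        ta, tb = lin.side_type(a), lin.side_type(b)
        if ta == tb:
            # Two in-sides or two out-sides: the join cannot be read left to right.
            reversing += 1
            continue
        # Direct the join from the out-side to the in-side.
        src, dst = (a, b) if ta == "out" else (b, a)
        # Assumed: a join pointing backwards (or to the same node) is a feedback arc.
        if pos[src[0]] >= pos[dst[0]]:
            feedback += 1
    return feedback, reversing


if __name__ == "__main__":
    # A bubble: A -> B -> C plus a direct A -> C join.
    joins = [(("A", "R"), ("B", "L")), (("B", "R"), ("C", "L")), (("A", "R"), ("C", "L"))]
    fwd = Linearization(["A", "B", "C"], {"A": True, "B": True, "C": True})
    print(evaluate(fwd, joins))  # (0, 0)
    flipped = Linearization(["A", "B", "C"], {"A": True, "B": True, "C": False})
    print(evaluate(flipped, joins))  # (0, 2)
```

With every node kept forward in the order A, B, C, no join is reversing or points backwards. Reverse-complementing C turns its "L" end into an out-side, so both joins that enter C now connect two out-sides and are counted as reversing. A linearization algorithm would search over orders and orientations to keep both counts low; this sketch only scores one candidate.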