1
|
Stevenson J, Terauds V, Sumner J. Rearrangement Events on Circular Genomes. Bull Math Biol 2023; 85:107. [PMID: 37749280 PMCID: PMC10520144 DOI: 10.1007/s11538-023-01209-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Accepted: 08/31/2023] [Indexed: 09/27/2023]
Abstract
Early literature on genome rearrangement modelling views the problem of computing evolutionary distances as an inherently combinatorial one. In particular, attention is given to estimating distances using the minimum number of events required to transform one genome into another. In hindsight, this approach is analogous to early methods for inferring phylogenetic trees from DNA sequences such as maximum parsimony-both are motivated by the principle that the true distance minimises evolutionary change, and both are effective if this principle is a true reflection of reality. Recent literature considers genome rearrangement under statistical models, continuing this parallel with DNA-based methods, with the goal of using model-based methods (for example maximum likelihood techniques) to compute distance estimates that incorporate the large number of rearrangement paths that can transform one genome into another. Crucially, this approach requires one to decide upon a set of feasible rearrangement events and, in this paper, we focus on characterising well-motivated models for signed, uni-chromosomal circular genomes, where the number of regions remains fixed. Since rearrangements are often mathematically described using permutations, we isolate the sets of permutations representing rearrangements that are biologically reasonable in this context, for example inversions and transpositions. We provide precise mathematical expressions for these rearrangements, and then describe them in terms of the set of cuts made in the genome when they are applied. We directly compare cuts to breakpoints, and use this concept to count the distinct rearrangement actions which apply a given number of cuts. Finally, we provide some examples of rearrangement models, and include a discussion of some questions that arise when defining plausible models.
Collapse
Affiliation(s)
| | - Venta Terauds
- University of Tasmania, Hobart, Australia
- University of South Australia, Adelaide, Australia
| | | |
Collapse
|
2
|
Clark C, Jonušas J, Mitchell JD, Francis A. An algebraic model for inversion and deletion in bacterial genome rearrangement. J Math Biol 2023; 87:34. [PMID: 37517046 PMCID: PMC10387463 DOI: 10.1007/s00285-023-01965-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2022] [Revised: 07/03/2023] [Accepted: 07/08/2023] [Indexed: 08/01/2023]
Abstract
Inversions, also sometimes called reversals, are a major contributor to variation among bacterial genomes, with studies suggesting that those involving small numbers of regions are more likely than larger inversions. Deletions may arise in bacterial genomes through the same biological mechanism as inversions, and hence a model that incorporates both is desirable. However, while inversion distances between genomes have been well studied, there has yet to be a model which accounts for the combination of both deletions and inversions. To account for both of these operations, we introduce an algebraic model that utilises partial permutations. This leads to an algorithm for calculating the minimum distance to the most recent common ancestor of two bacterial genomes evolving by inversions (of adjacent regions) and deletions. The algebraic model makes the existing short inversion models more complete and realistic by including deletions, and also introduces new algebraic tools into evolutionary distance problems.
Collapse
Affiliation(s)
- Chad Clark
- Centre for Research in Mathematics and Data Science, Western Sydney University, Penrith, NSW, Australia.
| | - Julius Jonušas
- Mathematical Institute, School of Mathematics and Statistics, University of St Andrews, St Andrews, UK
| | - James D Mitchell
- Mathematical Institute, School of Mathematics and Statistics, University of St Andrews, St Andrews, UK
| | - Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Penrith, NSW, Australia
| |
Collapse
|
3
|
Terauds V, Sumner J. A new algebraic approach to genome rearrangement models. J Math Biol 2022; 84:49. [PMID: 35508785 PMCID: PMC9068684 DOI: 10.1007/s00285-022-01744-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 01/24/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
We present a unified framework for modelling genomes and their rearrangements in a genome algebra, as elements that simultaneously incorporate all physical symmetries. Building on previous work utilising the group algebra of the symmetric group, we explicitly construct the genome algebra for the case of unsigned circular genomes with dihedral symmetry and show that the maximum likelihood estimate (MLE) of genome rearrangement distance can be validly and more efficiently performed in this setting. We then construct the genome algebra for a more general case, that is, for genomes that may be represented by elements of an arbitrary group and symmetry group, and show that the MLE computations can be performed entirely within this framework. There is no prescribed model in this framework; that is, it allows any choice of rearrangements that preserve the set of regions, along with arbitrary weights. Further, since the likelihood function is built from path probabilities-a generalisation of path counts-the framework may be utilised for any distance measure that is based on path probabilities.
Collapse
Affiliation(s)
- Venta Terauds
- Discipline of Mathematics, School of Natural Sciences, University of Tasmania, Private Bag 37, Sandy Bay, TAS 7001 Australia
| | - Jeremy Sumner
- Discipline of Mathematics, School of Natural Sciences, University of Tasmania, Private Bag 37, Sandy Bay, TAS 7001 Australia
| |
Collapse
|
4
|
Terauds V, Stevenson J, Sumner J. A symmetry-inclusive algebraic approach to genome rearrangement. J Bioinform Comput Biol 2021; 19:2140015. [PMID: 34806949 DOI: 10.1142/s0219720021400151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The "position paradigm", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign to them relative probabilities/costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently-introduced framework of genome algebras, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst the implementation of this framework to calculate distances for genomes with "practical" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.
Collapse
Affiliation(s)
- Venta Terauds
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| | - Joshua Stevenson
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| | - Jeremy Sumner
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| |
Collapse
|
5
|
Bhatia S, Egri-Nagy A, Serdoz S, Praeger CE, Gebhardt V, Francis A. A Path-Deformation Framework for Determining Weighted Genome Rearrangement Distance. Front Genet 2020; 11:1035. [PMID: 33193592 PMCID: PMC7542183 DOI: 10.3389/fgene.2020.01035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2020] [Accepted: 08/11/2020] [Indexed: 11/16/2022] Open
Abstract
Measuring the distance between two bacterial genomes under the inversion process is usually done by assuming all inversions to occur with equal probability. Recently, an approach to calculating inversion distance using group theory was introduced, and is effective for the model in which only very short inversions occur. In this paper, we show how to use the group-theoretic framework to establish minimal distance for any weighting on the set of inversions, generalizing previous approaches. To do this we use the theory of rewriting systems for groups, and exploit the Knuth–Bendix algorithm, the first time this theory has been introduced into genome rearrangement problems. The central idea of the approach is to use existing group theoretic methods to find an initial path between two genomes in genome space (for instance using only short inversions), and then to deform this path to optimality using a confluent system of rewriting rules generated by the Knuth–Bendix algorithm.
Collapse
Affiliation(s)
- Sangeeta Bhatia
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| | - Attila Egri-Nagy
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| | - Stuart Serdoz
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| | - Cheryl E Praeger
- School of Physics, Mathematics, and Computing, University of Western Australia, Perth, WA, Australia
| | - Volker Gebhardt
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| | - Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| |
Collapse
|
6
|
A mean first passage time genome rearrangement distance. J Math Biol 2020; 80:1971-1992. [PMID: 32253463 DOI: 10.1007/s00285-020-01487-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Revised: 03/11/2020] [Indexed: 10/24/2022]
Abstract
This paper introduces a new way to define a genome rearrangement distance, using the concept of mean first passage time from probability theory. Crucially, this distance provides a genuine metric on genome space. We develop the theory and introduce a link to a graph-based zeta function. The approach is very general and can be applied to a wide variety of group-theoretic models of genome evolution.
Collapse
|
7
|
Position and Content Paradigms in Genome Rearrangements: The Wild and Crazy World of Permutations in Genomics. Bull Math Biol 2018; 80:3227-3246. [PMID: 30288640 DOI: 10.1007/s11538-018-0514-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2016] [Accepted: 09/24/2018] [Indexed: 10/28/2022]
Abstract
Modellers of large-scale genome rearrangement events, in which segments of DNA are inverted, moved, swapped, or even inserted or deleted, have found a natural syntax in the language of permutations. Despite this, there has been a wide range of modelling choices, assumptions and interpretations that make navigating the literature a significant challenge. Indeed, even authors of papers that use permutations to model genome rearrangement can struggle to interpret each others' work, because of subtle differences in basic assumptions that are often deeply ingrained (and consequently sometimes not even mentioned). In this paper, we describe the different ways in which permutations have been used to model genomes and genome rearrangement events, presenting some features and limitations of each approach, and show how the various models are related. This paper will help researchers navigate the landscape of permutation-based genome rearrangement models and make it easier for authors to present clear and consistent models.
Collapse
|
8
|
Maximum Likelihood Estimates of Rearrangement Distance: Implementing a Representation-Theoretic Approach. Bull Math Biol 2018; 81:535-567. [PMID: 30264286 DOI: 10.1007/s11538-018-0511-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2017] [Accepted: 09/18/2018] [Indexed: 10/28/2022]
Abstract
The calculation of evolutionary distance via models of genome rearrangement has an inherent combinatorial complexity. Various algorithms and estimators have been used to address this; however, many of these set quite specific conditions for the underlying model. A recently proposed technique, applying representation theory to calculate evolutionary distance between circular genomes as a maximum likelihood estimate, reduces the computational load by converting the combinatorial problem into a numerical one. We show that the technique may be applied to models with any choice of rearrangements and relative probabilities thereof; we then investigate the symmetry of circular genome rearrangement models in general. We discuss the practical implementation of the technique and, without introducing any bona fide numerical approximations, give the results of some initial calculations for genomes with up to 11 regions.
Collapse
|
9
|
Arruda TDS, Dias U, Dias Z. A GRASP-Based Heuristic for the Sorting by Length-Weighted Inversions Problem. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:352-363. [PMID: 26336142 DOI: 10.1109/tcbb.2015.2474400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Genome Rearrangements are large-scale mutational events that affect genomes during the evolutionary process. Therefore, these mutations differ from punctual mutations. They can move genes from one place to the other, change the orientation of some genes, or even change the number of chromosomes. In this work, we deal with inversion events which occur when a segment of DNA sequence in the genome is reversed. In our model, each inversion costs the number of elements in the reversed segment. We present a new algorithm for this problem based on the metaheuristic called Greedy Randomized Adaptive Search Procedure (GRASP) that has been routinely used to find solutions for combinatorial optimization problems. In essence, we implemented an iterative process in which each iteration receives a feasible solution whose neighborhood is investigated. Our analysis shows that we outperform any other approach by significant margin. We also use our algorithm to build phylogenetic trees for a subset of species in the Yersinia genus and we compared our trees to other results in the literature.
Collapse
|
10
|
|
11
|
Serdoz S, Egri-Nagy A, Sumner J, Holland BR, Jarvis PD, Tanaka MM, Francis AR. Maximum likelihood estimates of pairwise rearrangement distances. J Theor Biol 2017; 423:31-40. [DOI: 10.1016/j.jtbi.2017.04.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2017] [Revised: 04/12/2017] [Accepted: 04/13/2017] [Indexed: 11/16/2022]
|
12
|
Galvao GR, Baudet C, Dias Z. Sorting Circular Permutations by Super Short Reversals. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:620-633. [PMID: 26761858 DOI: 10.1109/tcbb.2016.2515594] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
We consider the problem of sorting a circular permutation by super short reversals (i.e., reversals of length at most 2), a problem that finds application in comparative genomics. Polynomial-time solutions to the unsigned version of this problem are known, but the signed version remained open. In this paper, we present the first polynomial-time solution to the signed version of this problem. Moreover, we perform experiments for inferring phylogenies of two different groups of bacterial species and compare our results with the phylogenies presented in previous works. Finally, to facilitate phylogenetic studies based on the methods studied in this paper, we present a web tool for rearrangement-based phylogenetic inference using short operations, such as super short reversals.
Collapse
|
13
|
Patel S. Drivers of bacterial genomes plasticity and roles they play in pathogen virulence, persistence and drug resistance. INFECTION GENETICS AND EVOLUTION 2016; 45:151-164. [DOI: 10.1016/j.meegid.2016.08.030] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/28/2016] [Revised: 08/26/2016] [Accepted: 08/27/2016] [Indexed: 12/11/2022]
|
14
|
Galvão GR, Lee O, Dias Z. Sorting signed permutations by short operations. Algorithms Mol Biol 2015; 10:12. [PMID: 25838839 PMCID: PMC4383228 DOI: 10.1186/s13015-015-0040-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2014] [Accepted: 02/19/2015] [Indexed: 11/26/2022] Open
Abstract
Background During evolution, global mutations may alter the order and the orientation of the genes in a genome. Such mutations are referred to as rearrangement events, or simply operations. In unichromosomal genomes, the most common operations are reversals, which are responsible for reversing the order and orientation of a sequence of genes, and transpositions, which are responsible for switching the location of two contiguous portions of a genome. The problem of computing the minimum sequence of operations that transforms one genome into another – which is equivalent to the problem of sorting a permutation into the identity permutation – is a well-studied problem that finds application in comparative genomics. There are a number of works concerning this problem in the literature, but they generally do not take into account the length of the operations (i.e. the number of genes affected by the operations). Since it has been observed that short operations are prevalent in the evolution of some species, algorithms that efficiently solve this problem in the special case of short operations are of interest. Results In this paper, we investigate the problem of sorting a signed permutation by short operations. More precisely, we study four flavors of this problem: (i) the problem of sorting a signed permutation by reversals of length at most 2; (ii) the problem of sorting a signed permutation by reversals of length at most 3; (iii) the problem of sorting a signed permutation by reversals and transpositions of length at most 2; and (iv) the problem of sorting a signed permutation by reversals and transpositions of length at most 3. We present polynomial-time solutions for problems (i) and (iii), a 5-approximation for problem (ii), and a 3-approximation for problem (iv). Moreover, we show that the expected approximation ratio of the 5-approximation algorithm is not greater than 3 for random signed permutations with more than 12 elements. Finally, we present experimental results that show that the approximation ratios of the approximation algorithms cannot be smaller than 3. In particular, this means that the approximation ratio of the 3-approximation algorithm is tight.
Collapse
|
15
|
Bhatia S, Egri-Nagy A, Francis AR. Algebraic double cut and join : A group-theoretic approach to the operator on multichromosomal genomes. J Math Biol 2014; 71:1149-78. [PMID: 25502846 DOI: 10.1007/s00285-014-0852-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2014] [Revised: 11/25/2014] [Indexed: 11/27/2022]
Abstract
Establishing a distance between genomes is a significant problem in computational genomics, because its solution can be used to establish evolutionary relationships including phylogeny. The "double cut and join" (DCJ) model of chromosomal rearrangement proposed by Yancopoulos et al. (Bioinformatics 21:3340-3346, 2005) has received attention as it can model inversions, translocations, fusion and fission on a multichromosomal genome that may contain both linear and circular chromosomes. In this paper, we realize the DCJ operator as a group action on the space of multichromosomal genomes. We study this group action, deriving some properties of the group and finding group-theoretic analogues for the key results in the DCJ theory.
Collapse
Affiliation(s)
- Sangeeta Bhatia
- Centre for Research in Mathematics, University of Western Sydney, Penrith, NSW, 2751, Australia.
| | - Attila Egri-Nagy
- Centre for Research in Mathematics, University of Western Sydney, Penrith, NSW, 2751, Australia.
| | - Andrew R Francis
- Centre for Research in Mathematics, University of Western Sydney, Penrith, NSW, 2751, Australia.
| |
Collapse
|
16
|
Francis AR. An algebraic view of bacterial genome evolution. J Math Biol 2013; 69:1693-718. [DOI: 10.1007/s00285-013-0747-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2013] [Revised: 11/23/2013] [Indexed: 10/25/2022]
|