1
|
Stevenson J, Terauds V, Sumner J. Rearrangement Events on Circular Genomes. Bull Math Biol 2023; 85:107. [PMID: 37749280 PMCID: PMC10520144 DOI: 10.1007/s11538-023-01209-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Accepted: 08/31/2023] [Indexed: 09/27/2023]
Abstract
Early literature on genome rearrangement modelling views the problem of computing evolutionary distances as an inherently combinatorial one. In particular, attention is given to estimating distances using the minimum number of events required to transform one genome into another. In hindsight, this approach is analogous to early methods for inferring phylogenetic trees from DNA sequences such as maximum parsimony-both are motivated by the principle that the true distance minimises evolutionary change, and both are effective if this principle is a true reflection of reality. Recent literature considers genome rearrangement under statistical models, continuing this parallel with DNA-based methods, with the goal of using model-based methods (for example maximum likelihood techniques) to compute distance estimates that incorporate the large number of rearrangement paths that can transform one genome into another. Crucially, this approach requires one to decide upon a set of feasible rearrangement events and, in this paper, we focus on characterising well-motivated models for signed, uni-chromosomal circular genomes, where the number of regions remains fixed. Since rearrangements are often mathematically described using permutations, we isolate the sets of permutations representing rearrangements that are biologically reasonable in this context, for example inversions and transpositions. We provide precise mathematical expressions for these rearrangements, and then describe them in terms of the set of cuts made in the genome when they are applied. We directly compare cuts to breakpoints, and use this concept to count the distinct rearrangement actions which apply a given number of cuts. Finally, we provide some examples of rearrangement models, and include a discussion of some questions that arise when defining plausible models.
Collapse
Affiliation(s)
| | - Venta Terauds
- University of Tasmania, Hobart, Australia
- University of South Australia, Adelaide, Australia
| | | |
Collapse
|
2
|
Clark C, Jonušas J, Mitchell JD, Francis A. An algebraic model for inversion and deletion in bacterial genome rearrangement. J Math Biol 2023; 87:34. [PMID: 37517046 PMCID: PMC10387463 DOI: 10.1007/s00285-023-01965-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2022] [Revised: 07/03/2023] [Accepted: 07/08/2023] [Indexed: 08/01/2023]
Abstract
Inversions, also sometimes called reversals, are a major contributor to variation among bacterial genomes, with studies suggesting that those involving small numbers of regions are more likely than larger inversions. Deletions may arise in bacterial genomes through the same biological mechanism as inversions, and hence a model that incorporates both is desirable. However, while inversion distances between genomes have been well studied, there has yet to be a model which accounts for the combination of both deletions and inversions. To account for both of these operations, we introduce an algebraic model that utilises partial permutations. This leads to an algorithm for calculating the minimum distance to the most recent common ancestor of two bacterial genomes evolving by inversions (of adjacent regions) and deletions. The algebraic model makes the existing short inversion models more complete and realistic by including deletions, and also introduces new algebraic tools into evolutionary distance problems.
Collapse
Affiliation(s)
- Chad Clark
- Centre for Research in Mathematics and Data Science, Western Sydney University, Penrith, NSW, Australia.
| | - Julius Jonušas
- Mathematical Institute, School of Mathematics and Statistics, University of St Andrews, St Andrews, UK
| | - James D Mitchell
- Mathematical Institute, School of Mathematics and Statistics, University of St Andrews, St Andrews, UK
| | - Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Penrith, NSW, Australia
| |
Collapse
|
3
|
A new algebraic approach to genome rearrangement models. J Math Biol 2022; 84:49. [PMID: 35508785 PMCID: PMC9068684 DOI: 10.1007/s00285-022-01744-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 01/24/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
We present a unified framework for modelling genomes and their rearrangements in a genome algebra, as elements that simultaneously incorporate all physical symmetries. Building on previous work utilising the group algebra of the symmetric group, we explicitly construct the genome algebra for the case of unsigned circular genomes with dihedral symmetry and show that the maximum likelihood estimate (MLE) of genome rearrangement distance can be validly and more efficiently performed in this setting. We then construct the genome algebra for a more general case, that is, for genomes that may be represented by elements of an arbitrary group and symmetry group, and show that the MLE computations can be performed entirely within this framework. There is no prescribed model in this framework; that is, it allows any choice of rearrangements that preserve the set of regions, along with arbitrary weights. Further, since the likelihood function is built from path probabilities—a generalisation of path counts—the framework may be utilised for any distance measure that is based on path probabilities.
Collapse
|
4
|
Terauds V, Stevenson J, Sumner J. A symmetry-inclusive algebraic approach to genome rearrangement. J Bioinform Comput Biol 2021; 19:2140015. [PMID: 34806949 DOI: 10.1142/s0219720021400151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The "position paradigm", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign to them relative probabilities/costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently-introduced framework of genome algebras, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst the implementation of this framework to calculate distances for genomes with "practical" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.
Collapse
Affiliation(s)
- Venta Terauds
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| | - Joshua Stevenson
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| | - Jeremy Sumner
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| |
Collapse
|
5
|
Bhatia S, Egri-Nagy A, Serdoz S, Praeger CE, Gebhardt V, Francis A. A Path-Deformation Framework for Determining Weighted Genome Rearrangement Distance. Front Genet 2020; 11:1035. [PMID: 33193592 PMCID: PMC7542183 DOI: 10.3389/fgene.2020.01035] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2020] [Accepted: 08/11/2020] [Indexed: 11/16/2022] Open
Abstract
Measuring the distance between two bacterial genomes under the inversion process is usually done by assuming all inversions to occur with equal probability. Recently, an approach to calculating inversion distance using group theory was introduced, and is effective for the model in which only very short inversions occur. In this paper, we show how to use the group-theoretic framework to establish minimal distance for any weighting on the set of inversions, generalizing previous approaches. To do this we use the theory of rewriting systems for groups, and exploit the Knuth–Bendix algorithm, the first time this theory has been introduced into genome rearrangement problems. The central idea of the approach is to use existing group theoretic methods to find an initial path between two genomes in genome space (for instance using only short inversions), and then to deform this path to optimality using a confluent system of rewriting rules generated by the Knuth–Bendix algorithm.
Collapse
Affiliation(s)
- Sangeeta Bhatia
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| | - Attila Egri-Nagy
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| | - Stuart Serdoz
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| | - Cheryl E Praeger
- School of Physics, Mathematics, and Computing, University of Western Australia, Perth, WA, Australia
| | - Volker Gebhardt
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| | - Andrew Francis
- Centre for Research in Mathematics and Data Science, Western Sydney University, Sydney, NSW, Australia
| |
Collapse
|
6
|
A mean first passage time genome rearrangement distance. J Math Biol 2020; 80:1971-1992. [PMID: 32253463 DOI: 10.1007/s00285-020-01487-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Revised: 03/11/2020] [Indexed: 10/24/2022]
Abstract
This paper introduces a new way to define a genome rearrangement distance, using the concept of mean first passage time from probability theory. Crucially, this distance provides a genuine metric on genome space. We develop the theory and introduce a link to a graph-based zeta function. The approach is very general and can be applied to a wide variety of group-theoretic models of genome evolution.
Collapse
|
7
|
Maximum Likelihood Estimates of Rearrangement Distance: Implementing a Representation-Theoretic Approach. Bull Math Biol 2018; 81:535-567. [PMID: 30264286 DOI: 10.1007/s11538-018-0511-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2017] [Accepted: 09/18/2018] [Indexed: 10/28/2022]
Abstract
The calculation of evolutionary distance via models of genome rearrangement has an inherent combinatorial complexity. Various algorithms and estimators have been used to address this; however, many of these set quite specific conditions for the underlying model. A recently proposed technique, applying representation theory to calculate evolutionary distance between circular genomes as a maximum likelihood estimate, reduces the computational load by converting the combinatorial problem into a numerical one. We show that the technique may be applied to models with any choice of rearrangements and relative probabilities thereof; we then investigate the symmetry of circular genome rearrangement models in general. We discuss the practical implementation of the technique and, without introducing any bona fide numerical approximations, give the results of some initial calculations for genomes with up to 11 regions.
Collapse
|
8
|
|
9
|
Serdoz S, Egri-Nagy A, Sumner J, Holland BR, Jarvis PD, Tanaka MM, Francis AR. Maximum likelihood estimates of pairwise rearrangement distances. J Theor Biol 2017; 423:31-40. [DOI: 10.1016/j.jtbi.2017.04.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2017] [Revised: 04/12/2017] [Accepted: 04/13/2017] [Indexed: 11/16/2022]
|
10
|
Dimensional Reduction for the General Markov Model on Phylogenetic Trees. Bull Math Biol 2017; 79:619-634. [PMID: 28188429 DOI: 10.1007/s11538-017-0249-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2016] [Accepted: 01/19/2017] [Indexed: 10/20/2022]
Abstract
We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.
Collapse
|
11
|
Bhatia S, Egri-Nagy A, Francis AR. Algebraic double cut and join : A group-theoretic approach to the operator on multichromosomal genomes. J Math Biol 2014; 71:1149-78. [PMID: 25502846 DOI: 10.1007/s00285-014-0852-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2014] [Revised: 11/25/2014] [Indexed: 11/27/2022]
Abstract
Establishing a distance between genomes is a significant problem in computational genomics, because its solution can be used to establish evolutionary relationships including phylogeny. The "double cut and join" (DCJ) model of chromosomal rearrangement proposed by Yancopoulos et al. (Bioinformatics 21:3340-3346, 2005) has received attention as it can model inversions, translocations, fusion and fission on a multichromosomal genome that may contain both linear and circular chromosomes. In this paper, we realize the DCJ operator as a group action on the space of multichromosomal genomes. We study this group action, deriving some properties of the group and finding group-theoretic analogues for the key results in the DCJ theory.
Collapse
Affiliation(s)
- Sangeeta Bhatia
- Centre for Research in Mathematics, University of Western Sydney, Penrith, NSW, 2751, Australia.
| | - Attila Egri-Nagy
- Centre for Research in Mathematics, University of Western Sydney, Penrith, NSW, 2751, Australia.
| | - Andrew R Francis
- Centre for Research in Mathematics, University of Western Sydney, Penrith, NSW, 2751, Australia.
| |
Collapse
|