1
|
Wang Y. Algorithms for the Uniqueness of the Longest Common Subsequence. J Bioinform Comput Biol 2023; 21:2350027. [PMID: 38212873 DOI: 10.1142/s0219720023500270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2024]
Abstract
Given several number sequences, determining the longest common subsequence is a classical problem in computer science. This problem has applications in bioinformatics, especially determining transposable genes. Nevertheless, related works only consider how to find one longest common subsequence. In this paper, we consider how to determine the uniqueness of the longest common subsequence. If there are multiple longest common subsequences, we also determine which number appears in all/some/none of the longest common subsequences. We focus on four scenarios: (1) linear sequences without duplicated numbers; (2) circular sequences without duplicated numbers; (3) linear sequences with duplicated numbers; (4) circular sequences with duplicated numbers. We develop corresponding algorithms and apply them to gene sequencing data.
Collapse
Affiliation(s)
- Yue Wang
- Department of Computational Medicine, University of California, Los Angeles, California, USA
- Irving Institute for Cancer Dynamics and Department of Statistics, Columbia University, New York, New York, USA
| |
Collapse
|
2
|
Stevenson J, Terauds V, Sumner J. Rearrangement Events on Circular Genomes. Bull Math Biol 2023; 85:107. [PMID: 37749280 PMCID: PMC10520144 DOI: 10.1007/s11538-023-01209-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2023] [Accepted: 08/31/2023] [Indexed: 09/27/2023]
Abstract
Early literature on genome rearrangement modelling views the problem of computing evolutionary distances as an inherently combinatorial one. In particular, attention is given to estimating distances using the minimum number of events required to transform one genome into another. In hindsight, this approach is analogous to early methods for inferring phylogenetic trees from DNA sequences such as maximum parsimony-both are motivated by the principle that the true distance minimises evolutionary change, and both are effective if this principle is a true reflection of reality. Recent literature considers genome rearrangement under statistical models, continuing this parallel with DNA-based methods, with the goal of using model-based methods (for example maximum likelihood techniques) to compute distance estimates that incorporate the large number of rearrangement paths that can transform one genome into another. Crucially, this approach requires one to decide upon a set of feasible rearrangement events and, in this paper, we focus on characterising well-motivated models for signed, uni-chromosomal circular genomes, where the number of regions remains fixed. Since rearrangements are often mathematically described using permutations, we isolate the sets of permutations representing rearrangements that are biologically reasonable in this context, for example inversions and transpositions. We provide precise mathematical expressions for these rearrangements, and then describe them in terms of the set of cuts made in the genome when they are applied. We directly compare cuts to breakpoints, and use this concept to count the distinct rearrangement actions which apply a given number of cuts. Finally, we provide some examples of rearrangement models, and include a discussion of some questions that arise when defining plausible models.
Collapse
Affiliation(s)
| | - Venta Terauds
- University of Tasmania, Hobart, Australia
- University of South Australia, Adelaide, Australia
| | | |
Collapse
|
3
|
Terauds V, Sumner J. A new algebraic approach to genome rearrangement models. J Math Biol 2022; 84:49. [PMID: 35508785 PMCID: PMC9068684 DOI: 10.1007/s00285-022-01744-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Revised: 01/24/2022] [Accepted: 03/31/2022] [Indexed: 11/24/2022]
Abstract
We present a unified framework for modelling genomes and their rearrangements in a genome algebra, as elements that simultaneously incorporate all physical symmetries. Building on previous work utilising the group algebra of the symmetric group, we explicitly construct the genome algebra for the case of unsigned circular genomes with dihedral symmetry and show that the maximum likelihood estimate (MLE) of genome rearrangement distance can be validly and more efficiently performed in this setting. We then construct the genome algebra for a more general case, that is, for genomes that may be represented by elements of an arbitrary group and symmetry group, and show that the MLE computations can be performed entirely within this framework. There is no prescribed model in this framework; that is, it allows any choice of rearrangements that preserve the set of regions, along with arbitrary weights. Further, since the likelihood function is built from path probabilities-a generalisation of path counts-the framework may be utilised for any distance measure that is based on path probabilities.
Collapse
Affiliation(s)
- Venta Terauds
- Discipline of Mathematics, School of Natural Sciences, University of Tasmania, Private Bag 37, Sandy Bay, TAS 7001 Australia
| | - Jeremy Sumner
- Discipline of Mathematics, School of Natural Sciences, University of Tasmania, Private Bag 37, Sandy Bay, TAS 7001 Australia
| |
Collapse
|
4
|
Terauds V, Stevenson J, Sumner J. A symmetry-inclusive algebraic approach to genome rearrangement. J Bioinform Comput Biol 2021; 19:2140015. [PMID: 34806949 DOI: 10.1142/s0219720021400151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The "position paradigm", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign to them relative probabilities/costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently-introduced framework of genome algebras, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst the implementation of this framework to calculate distances for genomes with "practical" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.
Collapse
Affiliation(s)
- Venta Terauds
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| | - Joshua Stevenson
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| | - Jeremy Sumner
- Discipline of Mathematics, University of Tasmania, Private Bag 37, Sandy Bay, Tasmania 7001, Australia
| |
Collapse
|
5
|
Rudanella paleaurantiibacter sp. nov., Isolated from Activated Sludge. Curr Microbiol 2020; 77:2016-2022. [PMID: 32372104 DOI: 10.1007/s00284-020-02005-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2019] [Accepted: 04/24/2020] [Indexed: 10/24/2022]
Abstract
A Gram-stain-negative, orange-colored bacterium, designated HX-22-17T, was isolated from activated sludge of an agricultural chemical plant in Maanshan, Anhui province, China (118° 52' E 31° 68' N). The strain was strictly aerobic, non-endospore forming, non-motile, and ellipse. Growth of the strain was observed at 16-42 °C (optimum between 25 and 30 °C) at pH 6.0-9.0 (optimum at pH 7.0) and with 0-6.0% (w/v) NaCl (optimum at 1.0%). 16S rRNA gene sequence analysis showed that the strain was most closely related to Rudanella lutea KACC 12603T (99.5% similarity). The predominant cellular fatty acids were summed feature 3 (C16:1ω7c and/or C16:1ω6c), iso-C15:0, and anteiso-C15:0. The major polar lipids included posphatidylethanolamine (PE), aminolipid (AL), and phospholipids (PL). The genomic DNA G+C content of the strain was 54.1 mol%. The ANI and dDDH values obtained between the genomes of HX-22-17T and R. lutea KACC 12603T were 89.3% and 39.3%, respectively. The phenotypic, chemotaxonomic, and genotypic data clearly showed that strain HX-22-17T represents a novel species of the genus Rudanella, for which the name R. paleaurantiibacter sp. nov. is proposed. The type strain is HX-22-17T (=KCTC 72656T = CCTCC AB 2019347T).
Collapse
|
6
|
Sevillya G, Doerr D, Lerner Y, Stoye J, Steel M, Snir S. Horizontal Gene Transfer Phylogenetics: A Random Walk Approach. Mol Biol Evol 2020; 37:1470-1479. [PMID: 31845962 DOI: 10.1093/molbev/msz302] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The dramatic decrease in time and cost for generating genetic sequence data has opened up vast opportunities in molecular systematics, one of which is the ability to decipher the evolutionary history of strains of a species. Under this fine systematic resolution, the standard markers are too crude to provide a phylogenetic signal. Nevertheless, among prokaryotes, genome dynamics in the form of horizontal gene transfer (HGT) between organisms and gene loss seem to provide far richer information by affecting both gene order and gene content. The "synteny index" (SI) between a pair of genomes combines these latter two factors, allowing comparison of genomes with unequal gene content, together with order considerations of their common genes. Although this approach is useful for classifying close relatives, no rigorous statistical modeling for it has been suggested. Such modeling is valuable, as it allows observed measures to be transformed into estimates of time periods during evolution, yielding the "additivity" of the measure. To the best of our knowledge, there is no other additivity proof for other gene order/content measures under HGT. Here, we provide a first statistical model and analysis for the SI measure. We model the "gene neighborhood" as a "birth-death-immigration" process affected by the HGT activity over the genome, and analytically relate the HGT rate and time to the expected SI. This model is asymptotic and thus provides accurate results, assuming infinite size genomes. Therefore, we also developed a heuristic model following an "exponential decay" function, accounting for biologically realistic values, which performed well in simulations. Applying this model to 1,133 prokaryotes partitioned to 39 clusters by the rank of genus yields that the average number of genome dynamics events per gene in the phylogenetic depth of genus is around half with significant variability between genera. This result extends and confirms similar results obtained for individual genera in different manners.
Collapse
Affiliation(s)
- Gur Sevillya
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
| | - Daniel Doerr
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Yael Lerner
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
| | - Jens Stoye
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Mike Steel
- School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
| | - Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
| |
Collapse
|
7
|
A mean first passage time genome rearrangement distance. J Math Biol 2020; 80:1971-1992. [PMID: 32253463 DOI: 10.1007/s00285-020-01487-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Revised: 03/11/2020] [Indexed: 10/24/2022]
Abstract
This paper introduces a new way to define a genome rearrangement distance, using the concept of mean first passage time from probability theory. Crucially, this distance provides a genuine metric on genome space. We develop the theory and introduce a link to a graph-based zeta function. The approach is very general and can be applied to a wide variety of group-theoretic models of genome evolution.
Collapse
|