1
|
Hartmann T, Middendorf M, Bernt M. Genome Rearrangement Analysis : Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2024; 2802:215-245. [PMID: 38819562 DOI: 10.1007/978-1-0716-3838-5_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2024]
Abstract
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
Collapse
Affiliation(s)
- Tom Hartmann
- Swarm Intelligence and Complex Systems Group, Institute of Computer Science, University Leipzig, Leipzig, Germany
| | - Martin Middendorf
- Swarm Intelligence and Complex Systems Group, Institute of Computer Science, University Leipzig, Leipzig, Germany.
| | | |
Collapse
|
2
|
Hao F, Zhang M, Leong HW. A 2-Approximation Scheme for Sorting Signed Permutations by Reversals, Transpositions, Transreversals, and Block-Interchanges. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1702-1711. [PMID: 28678711 DOI: 10.1109/tcbb.2017.2719681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
We consider the problem of sorting signed permutations by reversals, transpositions, transreversals, and block-interchanges and give a 2-approximation scheme, called the GSB (Genome Sorting by Bridges) scheme. Our result extends 2-approximation algorithm of He and Chen [12] that allowed only reversals and block-interchanges, and also the 1.5 approximation algorithm of Hartman and Sharan [11] that allowed only transreversals and transpositions. We prove this result by introducing three bridge structures in the breakpoint graph, namely, the L-bridge, T-bridge, and X-bridge and show that they model "proper" reversals, transpositions, transreversals, and block-interchanges, respectively. We show that we can always find at least one of these three bridges in any breakpoint graph, thus giving an upper bound on the number of operations needed. We prove a lower bound on the distance and use it to show that GSB has a 2-approximation ratio. An ${\text{O(n}}^{3})$O(n3) algorithm called GSB-I that is based on the GSB approximation scheme presented in this paper has recently been published by Yu, Hao, and Leong in [17] . We note that our 2-approximation scheme admits many possible implementations by varying the order we search for proper rearrangement operations.
Collapse
|
3
|
Chen KT, Lu CL. CSAR-web: a web server of contig scaffolding using algebraic rearrangements. Nucleic Acids Res 2019; 46:W55-W59. [PMID: 29733393 PMCID: PMC6030906 DOI: 10.1093/nar/gky337] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2018] [Accepted: 04/19/2018] [Indexed: 01/23/2023] Open
Abstract
CSAR-web is a web-based tool that allows the users to efficiently and accurately scaffold (i.e. order and orient) the contigs of a target draft genome based on a complete or incomplete reference genome from a related organism. It takes as input a target genome in multi-FASTA format and a reference genome in FASTA or multi-FASTA format, depending on whether the reference genome is complete or incomplete, respectively. In addition, it requires the users to choose either ‘NUCmer on nucleotides’ or ‘PROmer on translated amino acids’ for CSAR-web to identify conserved genomic markers (i.e. matched sequence regions) between the target and reference genomes, which are used by the rearrangement-based scaffolding algorithm in CSAR-web to order and orient the contigs of the target genome based on the reference genome. In the output page, CSAR-web displays its scaffolding result in a graphical mode (i.e. scalable dotplot) allowing the users to visually validate the correctness of scaffolded contigs and in a tabular mode allowing the users to view the details of scaffolds. CSAR-web is available online at http://genome.cs.nthu.edu.tw/CSAR-web.
Collapse
Affiliation(s)
- Kun-Tze Chen
- Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan
| | - Chin Lung Lu
- Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan
| |
Collapse
|
4
|
Wang X, Wang L. A Simple Linear Space Algorithm for Computing Nonoverlapping Inversion and Transposition Distance in Quadratic Average Time. J Comput Biol 2018; 25:563-575. [DOI: 10.1089/cmb.2017.0257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Affiliation(s)
| | - Lei Wang
- Facebook, Inc., Menlo Park, California
| |
Collapse
|
5
|
Ta TT, Lin CY, Lu CL. Two-string consensus problem under non-overlapping inversion and transposition distance. INFORM PROCESS LETT 2018. [DOI: 10.1016/j.ipl.2017.10.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
6
|
Genome Rearrangement Analysis: Cut and Join Genome Rearrangements and Gene Cluster Preserving Approaches. Methods Mol Biol 2018; 1704:261-289. [PMID: 29277869 DOI: 10.1007/978-1-4939-7463-4_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]
Abstract
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
Collapse
|
7
|
An efficient algorithm for computing non-overlapping inversion and transposition distance. INFORM PROCESS LETT 2016. [DOI: 10.1016/j.ipl.2016.07.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
8
|
Yu S, Hao F, Leong HW. An O([Formula: see text]) algorithm for sorting signed genomes by reversals, transpositions, transreversals and block-interchanges. J Bioinform Comput Biol 2015; 14:1640002. [PMID: 26707923 DOI: 10.1142/s0219720016400023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
We consider the problem of sorting signed permutations by reversals, transpositions, transreversals, and block-interchanges. The problem arises in the study of species evolution via large-scale genome rearrangement operations. Recently, Hao et al. gave a 2-approximation scheme called genome sorting by bridges (GSB) for solving this problem. Their result extended and unified the results of (i) He and Chen - a 2-approximation algorithm allowing reversals, transpositions, and block-interchanges (by also allowing transversals) and (ii) Hartman and Sharan - a 1.5-approximation algorithm allowing reversals, transpositions, and transversals (by also allowing block-interchanges). The GSB result is based on introduction of three bridge structures in the breakpoint graph, the L-bridge, T-bridge, and X-bridge that models goodreversal, transposition/transreversal, and block-interchange, respectively. However, the paper by Hao et al. focused on proving the 2-approximation GSB scheme and only mention a straightforward [Formula: see text] algorithm. In this paper, we give an [Formula: see text] algorithm for implementing the GSB scheme. The key idea behind our faster GSB algorithm is to represent cycles in the breakpoint graph by their canonical sequences, which greatly simplifies the search for these bridge structures. We also give some comparison results (running time and computed distances) against the original GSB implementation.
Collapse
Affiliation(s)
- Shuzhi Yu
- * Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore
| | - Fanchang Hao
- † School of Information and Key Laboratory of Evidence-Identifying in Universities of Shandong, Shandong University of Political Science and Law, Jinan, Shandong 250014, P. R. China
| | - Hon Wai Leong
- * Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore
| |
Collapse
|
9
|
Sielaff M, Schmidt H, Struck TH, Rosenkranz D, Mark Welch DB, Hankeln T, Herlyn H. Phylogeny of Syndermata (syn. Rotifera): Mitochondrial gene order verifies epizoic Seisonidea as sister to endoparasitic Acanthocephala within monophyletic Hemirotifera. Mol Phylogenet Evol 2015; 96:79-92. [PMID: 26702959 DOI: 10.1016/j.ympev.2015.11.017] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2015] [Revised: 11/19/2015] [Accepted: 11/24/2015] [Indexed: 10/22/2022]
Abstract
A monophyletic origin of endoparasitic thorny-headed worms (Acanthocephala) and wheel-animals (Rotifera) is widely accepted. However, the phylogeny inside the clade, be it called Syndermata or Rotifera, has lacked validation by mitochondrial (mt) data. Herein, we present the first mt genome of the key taxon Seison and report conflicting results of phylogenetic analyses: while mt sequence-based topologies showed monophyletic Lemniscea (Bdelloidea+Acanthocephala), gene order analyses supported monophyly of Pararotatoria (Seisonidea+Acanthocephala) and Hemirotifera (Bdelloidea+Pararotatoria). Sequence-based analyses obviously suffered from substitution saturation, compositional bias, and branch length heterogeneity; however, we observed no compromising effects in gene order analyses. Moreover, gene order-based topologies were robust to changes in coding (genes vs. gene pairs, two-state vs. multistate, aligned vs. non-aligned), tree reconstruction methods, and the treatment of the two monogonont mt genomes. Thus, mt gene order verifies seisonids as sister to acanthocephalans within monophyletic Hemirotifera, while deviating results of sequence-based analyses reflect artificial signal. This conclusion implies that the complex life cycle of extant acanthocephalans evolved from a free-living state, as retained by most monogononts and bdelloids, via an epizoic state with a simple life cycle, as shown by seisonids. Hence, Acanthocephala represent a rare example where ancestral transitional stages have counterparts amongst the closest relatives.
Collapse
Affiliation(s)
- Malte Sielaff
- Institute of Molecular Genetics, Johannes Gutenberg-University Mainz, J.J. Becher-Weg 30a, D-55099 Mainz, Germany
| | - Hanno Schmidt
- Institute of Molecular Genetics, Johannes Gutenberg-University Mainz, J.J. Becher-Weg 30a, D-55099 Mainz, Germany
| | - Torsten H Struck
- National Centre for Biosystematics, Natural History Museum, University of Oslo, P.O. Box 1172, Blindern, NO-0318 Oslo, Norway
| | - David Rosenkranz
- Institute of Anthropology, Johannes Gutenberg-University Mainz, Anselm-Franz-von-Bentzel-Weg 7, D-55099 Mainz, Germany
| | - David B Mark Welch
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA, United States
| | - Thomas Hankeln
- Institute of Molecular Genetics, Johannes Gutenberg-University Mainz, J.J. Becher-Weg 30a, D-55099 Mainz, Germany
| | - Holger Herlyn
- Institute of Anthropology, Johannes Gutenberg-University Mainz, Anselm-Franz-von-Bentzel-Weg 7, D-55099 Mainz, Germany.
| |
Collapse
|
10
|
Lu CL. An efficient algorithm for the contig ordering problem under algebraic rearrangement distance. J Comput Biol 2015; 22:975-87. [PMID: 26247343 DOI: 10.1089/cmb.2015.0073] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open
Abstract
Assembling a genome from short reads currently obtained by next-generation sequencing techniques often results in a collection of contigs, whose relative position and orientation along the genome being sequenced are unknown. Given two sets of contigs, the contig ordering problem is to order and orient the contigs in each set such that the genome rearrangement distance between the resulting sets of ordered and oriented contigs is minimized. In this article, we utilize the permutation groups in algebra to propose a near-linear time algorithm for solving the contig ordering problem under algebraic rearrangement distance, where the algebraic rearrangement distance between two sets of ordered and oriented contigs is the minimum weight of applicable rearrangement operations required to transform one set into the other.
Collapse
Affiliation(s)
- Chin Lung Lu
- Department of Computer Science, National Tsing Hua University , Hsinchu, Taiwan
| |
Collapse
|
11
|
Lu CL, Chen KT, Huang SY, Chiu HT. CAR: contig assembly of prokaryotic draft genomes using rearrangements. BMC Bioinformatics 2014; 15:381. [PMID: 25431302 PMCID: PMC4253983 DOI: 10.1186/s12859-014-0381-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2014] [Accepted: 11/05/2014] [Indexed: 11/23/2022] Open
Abstract
Background Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed. Results In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named as CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR can output a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we have tested CAR on a real dataset composed of several prokaryotic genomes and also compared its performance with several other reference-based contig assembly tools. Consequently, our experimental results have shown that CAR indeed performs better than all these other reference-based contig assembly tools in terms of sensitivity, precision and genome coverage. Conclusions CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/
and its stand-alone program can also be downloaded from the same website. Electronic supplementary material The online version of this article (doi:10.1186/s12859-014-0381-3) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Chin Lung Lu
- Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan.
| | - Kun-Tze Chen
- Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan.
| | - Shih-Yuan Huang
- Department of Computer Science, National Tsing Hua University, Hsinchu, 300, Taiwan.
| | - Hsien-Tai Chiu
- Department of Chemistry, National Cheng Kung University, Tainan City, 701, Taiwan.
| |
Collapse
|
12
|
Heydari M, Marashi SA, Tusserkani R, Sadeghi M. Reconstruction of phylogenetic trees of prokaryotes using maximal common intervals. Biosystems 2014; 124:86-94. [PMID: 25195150 DOI: 10.1016/j.biosystems.2014.09.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2012] [Revised: 08/13/2014] [Accepted: 09/01/2014] [Indexed: 11/15/2022]
Abstract
One of the fundamental problems in bioinformatics is phylogenetic tree reconstruction, which can be used for classifying living organisms into different taxonomic clades. The classical approach to this problem is based on a marker such as 16S ribosomal RNA. Since evolutionary events like genomic rearrangements are not included in reconstructions of phylogenetic trees based on single genes, much effort has been made to find other characteristics for phylogenetic reconstruction in recent years. With the increasing availability of completely sequenced genomes, gene order can be considered as a new solution for this problem. In the present work, we applied maximal common intervals (MCIs) in two or more genomes to infer their distance and to reconstruct their evolutionary relationship. Additionally, measures based on uncommon segments (UCS's), i.e., those genomic segments which are not detected as part of any of the MCIs, are also used for phylogenetic tree reconstruction. We applied these two types of measures for reconstructing the phylogenetic tree of 63 prokaryotes with known COG (clusters of orthologous groups) families. Similarity between the MCI-based (resp. UCS-based) reconstructed phylogenetic trees and the phylogenetic tree obtained from NCBI taxonomy browser is as high as 93.1% (resp. 94.9%). We show that in the case of this diverse dataset of prokaryotes, tree reconstruction based on MCI and UCS outperforms most of the currently available methods based on gene orders, including breakpoint distance and DCJ. We additionally tested our new measures on a dataset of 13 closely-related bacteria from the genus Prochlorococcus. In this case, distances like rearrangement distance, breakpoint distance and DCJ proved to be useful, while our new measures are still appropriate for phylogenetic reconstruction.
Collapse
Affiliation(s)
- Mahdi Heydari
- Department of Algorithms and Computation, College of Engineering, University of Tehran, Tehran, Iran
| | - Sayed-Amir Marashi
- Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran; School of Biological Science, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran.
| | - Ruzbeh Tusserkani
- School of Mathematics, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
| | - Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
| |
Collapse
|
13
|
Feijão P, Meidanis J. Extending the algebraic formalism for genome rearrangements to include linear chromosomes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:819-831. [PMID: 24334378 DOI: 10.1109/tcbb.2012.161] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Algebraic rearrangement theory, as introduced by Meidanis and Dias, focuses on representing the order in which genes appear in chromosomes, and applies to circular chromosomes only. By shifting our attention to genome adjacencies, we introduce the adjacency algebraic theory, extending the original algebraic theory to linear chromosomes in a very natural way, also allowing the original algebraic distance formula to be used to the general multichromosomal case, with both linear and circular chromosomes. The resulting distance, which we call algebraic distance here, is very similar to, but not quite the same as, double-cut-and-join distance. We present linear time algorithms to compute it and to sort genomes. We show how to compute the rearrangement distance from the adjacency graph, for an easier comparison with other rearrangement distances. A thorough discussion on the relationship between the chromosomal and adjacency representation is also given, and we show how all classic rearrangement operations can be modeled using the algebraic theory.
Collapse
Affiliation(s)
| | - João Meidanis
- University of Campinas, Campinas and Scylla Bioinformatics, Brazil
| |
Collapse
|
14
|
Li CL, Chen KT, Lu CL. Assembling contigs in draft genomes using reversals and block-interchanges. BMC Bioinformatics 2013; 14 Suppl 5:S9. [PMID: 23734866 PMCID: PMC3622642 DOI: 10.1186/1471-2105-14-s5-s9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
The techniques of next generation sequencing allow an increasing number of draft genomes to be produced rapidly in a decreasing cost. However, these draft genomes usually are just partially sequenced as collections of unassembled contigs, which cannot be used directly by currently existing algorithms for studying their genome rearrangements and phylogeny reconstruction. In this work, we study the one-sided block (or contig) ordering problem with weighted reversal and block-interchange distance. Given a partially assembled genome π and a completely assembled genome σ, the problem is to find an optimal ordering to assemble (i.e., order and orient) the contigs of π such that the rearrangement distance measured by reversals and block-interchanges (also called generalized transpositions) with the weight ratio 1:2 between the assembled contigs of π and σ is minimized. In addition to genome rearrangements and phylogeny reconstruction, the one-sided block ordering problem particularly has a useful application in genome resequencing, because its algorithms can be used to assemble the contigs of a draft genome π based on a reference genome σ. By using permutation groups, we design an efficient algorithm to solve this one-sided block ordering problem in Oδn time, where n is the number of genes or markers and δ is the number of used reversals and block-interchanges. We also show that the assembly of the partially assembled genome can be done in On time and its weighted rearrangement distance from the completely assembled genome can be calculated in advance in On time. Finally, we have implemented our algorithm into a program and used some simulated datasets to compare its accuracy performance to a currently existing similar tool, called SIS that was implemented by a heuristic algorithm that considers only reversals, on assembling the contigs in draft genomes based on their reference genomes. Our experimental results have shown that the accuracy performance of our program is better than that of SIS, when the number of reversals and transpositions involved in the rearrangement events between the complete genomes of π and σ is increased. In particular, if there are more transpositions involved in the rearrangement events, then the gap of accuracy performance between our program and SIS is increasing.
Collapse
Affiliation(s)
- Chi-Long Li
- Department of Computer Science, National Tsing Hua University, Hsinchu 30013, Taiwan.
| | | | | |
Collapse
|
15
|
Huang KH, Chen KT, Lu CL. Sorting permutations by cut-circularize-linearize-and-paste operations. BMC Genomics 2011; 12 Suppl 3:S26. [PMID: 22369173 PMCID: PMC3333185 DOI: 10.1186/1471-2164-12-s3-s26] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022] Open
Abstract
Background Genome rearrangements are studied on the basis of genome-wide analysis of gene orders and important in the evolution of species. In the last two decades, a variety of rearrangement operations, such as reversals, transpositions, block-interchanges, translocations, fusions and fissions, have been proposed to evaluate the differences between gene orders in two or more genomes. Usually, the computational studies of genome rearrangements are formulated as problems of sorting permutations by rearrangement operations. Result In this article, we study a sorting problem by cut-circularize-linearize-and-paste (CCLP) operations, which aims to find a minimum number of CCLP operations to sort a signed permutation representing a chromosome. The CCLP is a genome rearrangement operation that cuts a segment out of a chromosome, circularizes the segment into a temporary circle, linearizes the temporary circle as a linear segment, and possibly inverts the linearized segment and pastes it into the remaining chromosome. The CCLP operation can model many well-known rearrangements, such as reversals, transpositions and block-interchanges, and others not reported in the biological literature. In addition, it really occurs in the immune response of higher animals. To distinguish those CCLP operations from the reversal, we call them as non-reversal CCLP operations. In this study, we use permutation groups in algebra to design an O(δn) time algorithm for solving the weighted sorting problem by CCLP operations when the weight ratio between reversals and non-reversal CCLP operations is 1:2, where n is the number of genes in the given chromosome and δ is the number of needed CCLP operations. Conclusion The algorithm we propose in this study is very simple so that it can be easily implemented with 1-dimensional arrays and useful in the studies of phylogenetic tree reconstruction and human immune response to tumors.
Collapse
Affiliation(s)
- Keng-Hsuan Huang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 30010, Taiwan
| | | | | |
Collapse
|
16
|
Huang YL, Huang CC, Tang CY, Lu CL. SoRT2: a tool for sorting genomes and reconstructing phylogenetic trees by reversals, generalized transpositions and translocations. Nucleic Acids Res 2010; 38:W221-7. [PMID: 20538651 PMCID: PMC2896082 DOI: 10.1093/nar/gkq520] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2010] [Revised: 05/20/2010] [Accepted: 05/24/2010] [Indexed: 12/03/2022] Open
Abstract
SoRT(2) is a web server that allows the user to perform genome rearrangement analysis involving reversals, generalized transpositions and translocations (including fusions and fissions), and infer phylogenetic trees of genomes being considered based on their pairwise genome rearrangement distances. It takes as input two or more linear/circular multi-chromosomal gene (or synteny block) orders in FASTA-like format. When the input is two genomes, SoRT(2) will quickly calculate their rearrangement distance, as well as a corresponding optimal scenario by highlighting the genes involved in each rearrangement operation. In the case of multiple genomes, SoRT(2) will also construct phylogenetic trees of these genomes based on a matrix of their pairwise rearrangement distances using distance-based approaches, such as neighbor-joining (NJ), unweighted pair group method with arithmetic mean (UPGMA) and Fitch-Margoliash (FM) methods. In addition, if the function of computing jackknife support values is selected, SoRT(2) will further perform the jackknife analysis to evaluate statistical reliability of the constructed NJ, UPGMA and FM trees. SoRT(2) is available online at http://bioalgorithm.life.nctu.edu.tw/SORT2/.
Collapse
Affiliation(s)
- Yen-Lin Huang
- Department of Computer Science, National Tsing Hua University, Institute of Bioinformatics and Systems Biology and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C
| | - Chen-Cheng Huang
- Department of Computer Science, National Tsing Hua University, Institute of Bioinformatics and Systems Biology and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C
| | - Chuan Yi Tang
- Department of Computer Science, National Tsing Hua University, Institute of Bioinformatics and Systems Biology and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C
| | - Chin Lung Lu
- Department of Computer Science, National Tsing Hua University, Institute of Bioinformatics and Systems Biology and Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C
| |
Collapse
|
17
|
Cheng CH, Yang CH, Chiu HT, Lu CL. Reconstructing genome trees of prokaryotes using overlapping genes. BMC Bioinformatics 2010; 11:102. [PMID: 20181237 PMCID: PMC2845580 DOI: 10.1186/1471-2105-11-102] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2009] [Accepted: 02/24/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Overlapping genes (OGs) are defined as adjacent genes whose coding sequences overlap partially or entirely. In fact, they are ubiquitous in microbial genomes and more conserved between species than non-overlapping genes. Based on this property, we have previously implemented a web server, named OGtree, that allows the user to reconstruct genome trees of some prokaryotes according to their pairwise OG distances. By analogy to the analyses of gene content and gene order, the OG distance between two genomes we defined was based on a measure of combining OG content (i.e., the normalized number of shared orthologous OG pairs) and OG order (i.e., the normalized OG breakpoint distance) in their whole genomes. A shortcoming of using the concept of breakpoints to define the OG distance is its inability to analyze the OG distance of multi-chromosomal genomes. In addition, the amount of overlapping coding sequences between some distantly related prokaryotic genomes may be limited so that it is hard to find enough OGs to properly evaluate their pairwise OG distances. RESULTS In this study, we therefore define a new OG order distance that is based on more biologically accurate rearrangements (e.g., reversals, transpositions and translocations) rather than breakpoints and that is applicable to both uni-chromosomal and multi-chromosomal genomes. In addition, we expand the term "gene" to include both its coding sequence and regulatory regions so that two adjacent genes whose coding sequences or regulatory regions overlap with each other are considered as a pair of overlapping genes. This is because overlapping of regulatory regions of distinct genes suggests that the regulation of expression for these genes should be more or less interrelated. Based on these modifications, we have reimplemented our OGtree as a new web server, named OGtree2, and have also evaluated its accuracy of genome tree reconstruction on a testing dataset consisting of 21 Proteobacteria genomes. Our experimental results have finally shown that our current OGtree2 indeed outperforms its previous version OGtree, as well as another similar server, called BPhyOG, significantly in the quality of genome tree reconstruction, because the phylogenetic tree obtained by OGtree2 is greatly congruent with the reference tree that coincides with the taxonomy accepted by biologists for these Proteobacteria. CONCLUSIONS In this study, we have introduced a new web server OGtree2 at http://bioalgorithm.life.nctu.edu.tw/OGtree2.0/ that can serve as a useful tool for reconstructing more precise and robust genome trees of prokaryotes according to their overlapping genes.
Collapse
Affiliation(s)
- Chih-Hsien Cheng
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
| | - Chung-Han Yang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
| | - Hsien-Tai Chiu
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
| | - Chin Lung Lu
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
| |
Collapse
|