1
The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Res 2024:gkae410. [PMID: 38769056 DOI: 10.1093/nar/gkae410] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/18/2024] [Accepted: 05/02/2024] [Indexed: 05/22/2024]
Abstract
Galaxy (https://galaxyproject.org) is deployed globally, predominantly through free-to-use services, supporting user-driven research that broadens in scope each year. Users are attracted to public Galaxy services by platform stability, tool and reference dataset diversity, training, support and integration, which together enable complex, reproducible, shareable data analysis. Applying the principles of user experience design (UXD) has driven improvements in accessibility, tool discoverability through Galaxy Labs/subdomains, and a redesigned Galaxy ToolShed. Galaxy tool capabilities are progressing in two strategic directions: integrating access to general-purpose graphics processing units (GPGPUs) for cutting-edge methods, and supporting licensed tools. Engagement with global research consortia is being increased by developing more workflows in Galaxy and by resourcing the public Galaxy services to run them. The Galaxy Training Network (GTN) portfolio has grown in both size and accessibility, through learning paths and direct integration with Galaxy tools that feature in training courses. Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface. Environmental impact assessment is also helping engage users and developers, reminding them of their role in sustainability, by displaying the estimated CO2 emissions generated by each Galaxy job.
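The abstract mentions displaying an estimated CO2 footprint per job. The sketch below shows one common way such an estimate can be computed, assuming a Green Algorithms style formula (power draw of cores and memory, multiplied by runtime and the data-centre power usage effectiveness, then by grid carbon intensity). The function name and all default constants are placeholder assumptions, not Galaxy's actual implementation or values.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    runtime_hours: float  # wall-clock runtime of the job
    cores: int            # number of CPU cores allocated
    memory_gb: float      # memory allocated, in GB


def estimate_co2_grams(
    job: JobUsage,
    power_per_core_w: float = 12.0,             # placeholder per-core power draw (W)
    power_per_gb_w: float = 0.3725,             # placeholder memory power draw (W per GB)
    pue: float = 1.67,                          # placeholder power usage effectiveness
    carbon_intensity_g_per_kwh: float = 475.0,  # placeholder grid carbon intensity
) -> float:
    """Return an estimated CO2e footprint, in grams, for a single job (hypothetical model)."""
    power_w = job.cores * power_per_core_w + job.memory_gb * power_per_gb_w
    energy_kwh = job.runtime_hours * power_w * pue / 1000.0  # W*h -> kWh, scaled by PUE
    return energy_kwh * carbon_intensity_g_per_kwh


# Example: a 2-hour job on 4 cores with 8 GB of memory.
print(round(estimate_co2_grams(JobUsage(2.0, 4, 8.0)), 1))
```

Keeping every constant as a keyword argument makes it easy to swap in site-specific measurements, since per-core power and carbon intensity vary widely between deployments.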
2
OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods 2024; 21:365-367. [PMID: 38366242 DOI: 10.1038/s41592-024-02197-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2024]
3
deepFPlearn+: enhancing toxicity prediction across the chemical universe using graph neural networks. Bioinformatics 2023; 39:btad713. [PMID: 38011648 PMCID: PMC10724847 DOI: 10.1093/bioinformatics/btad713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 11/06/2023] [Accepted: 11/26/2023] [Indexed: 11/29/2023]
Abstract
SUMMARY Sophisticated approaches for the in silico prediction of toxicity are required to support the risk assessment of chemicals. The number of chemicals on the global chemical market and the speed of chemical innovation stand in stark contrast to the capacity for regulating chemical use. We recently showed that our ready-to-use application deepFPlearn is a suitable approach for this task. Here, we present its extension deepFPlearn+, incorporating (i) a graph neural network (GNN) to feed our AI with a more sophisticated representation of molecular structure and (ii) alternative train-test splitting strategies based on scaffold structures and the molecular weights of chemicals. We show that the GNNs substantially outperform the previous model and that our models can generalize to unseen data even with a more robust and challenging test set. Therefore, we highly recommend applying deepFPlearn+ to the chemical inventory, or to any chemical subset of interest in monitoring studies, to prioritize chemicals for experimental testing. AVAILABILITY AND IMPLEMENTATION The software is compatible with Python 3.6 or higher, and the source code can be found on our GitHub repository: https://github.com/yigbt/deepFPlearn. The data underlying this article are available in Zenodo at https://zenodo.org/record/8146252. Detailed installation guides via Docker, Singularity, and Conda are provided within the repository for operability across all operating systems.
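One of the splitting strategies mentioned above uses molecular weight. A minimal sketch of such a split, assuming precomputed (compound ID, molecular weight) pairs and assuming the heaviest fraction is held out so that test compounds lie outside the training weight range; the function name and the exact rule are illustrative assumptions, not necessarily deepFPlearn+'s procedure.

```python
from typing import List, Sequence, Tuple


def molecular_weight_split(
    records: Sequence[Tuple[str, float]],
    test_fraction: float = 0.2,
) -> Tuple[List[str], List[str]]:
    """Split (compound_id, molecular_weight) pairs; the heaviest compounds form the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    ordered = sorted(records, key=lambda rec: rec[1])  # lightest first
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    train = [cid for cid, _ in ordered[:cut]]
    test = [cid for cid, _ in ordered[cut:]]
    return train, test


train_ids, test_ids = molecular_weight_split(
    [("a", 180.2), ("b", 46.1), ("c", 300.5), ("d", 78.1), ("e", 120.0)], test_fraction=0.4
)
print(train_ids, test_ids)
```

Unlike a random split, this forces the model to extrapolate to heavier molecules, which is one way a test set can be made more challenging.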
4
Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs. Front Genet 2023; 14:1250907. [PMID: 37636259 PMCID: PMC10448254 DOI: 10.3389/fgene.2023.1250907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 07/24/2023] [Indexed: 08/29/2023]
Abstract
A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. The accurate annotation of mitochondrial DNA is a prerequisite for any comparative mitogenomic analysis. To keep pace with the growth of available mitochondrial sequence data, highly efficient automatic computational methods are therefore needed. Automatic annotation methods are typically based on databases that contain information about already annotated (and often pre-curated) mitogenomes of different species. However, existing approaches have several shortcomings: (1) they do not scale well with the size of the database; (2) they do not allow for a fast (and easy) update of the database; and (3) they can only be applied to a relatively small taxonomic subset of all species. Here, we present a novel approach that has none of these shortcomings. The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method uses a clustering routine based on the mapping information of the provided sequence to this graph. The method is implemented in a software package called DeGeCI (De Bruijn graph Gene Cluster Identification). For a large set of mitogenomes for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity. In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while matching MITOS2 in result quality and providing a fully automated means to update the underlying database. Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems.
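To illustrate the idea of an annotated de Bruijn graph, the sketch below stores each k-mer (an edge between two (k-1)-mer nodes) together with the gene labels of the reference regions that fully contain it, then reports runs of query k-mers sharing a label as candidate gene intervals. This is a greatly simplified, hypothetical stand-in: the function names are illustrative, and DeGeCI's actual clustering routine is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Gene = Tuple[int, int, str]  # (start, end, name), 0-based and end-exclusive


def build_annotated_dbg(references: List[Tuple[str, List[Gene]]], k: int) -> Dict[str, Set[str]]:
    """Map every k-mer (edge between two (k-1)-mer nodes) to the gene labels covering it."""
    kmer_labels: Dict[str, Set[str]] = defaultdict(set)
    for seq, genes in references:
        for i in range(len(seq) - k + 1):
            labels = kmer_labels[seq[i:i + k]]  # creates the edge even if unannotated
            for start, end, name in genes:
                if start <= i and i + k <= end:  # k-mer lies fully inside the gene
                    labels.add(name)
    return dict(kmer_labels)


def predict_genes(query: str, kmer_labels: Dict[str, Set[str]], k: int) -> List[Gene]:
    """Report maximal runs of consecutive query k-mers that carry the same gene label."""
    hits: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        for name in kmer_labels.get(query[i:i + k], ()):
            hits[name].append(i)
    predictions: List[Gene] = []
    for name, positions in hits.items():
        run_start = prev = positions[0]
        for pos in positions[1:]:
            if pos != prev + 1:  # gap: close the current run
                predictions.append((run_start, prev + k, name))
                run_start = pos
            prev = pos
        predictions.append((run_start, prev + k, name))
    return sorted(predictions)


labels = build_annotated_dbg([("ACGTACGGTTCA", [(0, 6, "geneA")])], k=3)
print(predict_genes("TTACGTA", labels, k=3))
```

Because the reference database is just a k-mer index, adding a newly annotated mitogenome only requires inserting its k-mers, which mirrors the easy-update property claimed in the abstract.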
5
A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include the availability of software spanning diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and the associated training resources to empower researcher-driven analyses. EXPERT OPINION The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
6
Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs. BMC Bioinformatics 2023; 24:235. [PMID: 37277700 DOI: 10.1186/s12859-023-05371-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 05/30/2023] [Indexed: 06/07/2023]
Abstract
BACKGROUND Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed with little effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. Especially in mitochondrial genomes, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task. RESULTS This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possible high substitution rates. The method is implemented in the software package DeBBI. DeBBI can analyze transposition- and inversion-based breakpoints independently and uses a parallel program design, allowing it to make use of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While (some) multiple sequence alignment tools can also be used for the task at hand, we demonstrate that gene breaks between short, poorly conserved tRNA genes in particular can be detected more frequently with the proposed approach. CONCLUSION The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm only requires a small number of graph traversal steps.
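The abstract notes that breakpoints are easy to compute once exact gene orders are known. The sketch below shows that easy case using the classical adjacency-based breakpoint definition on signed gene orders (genes as non-zero integers, a negative sign marking the opposite strand). It is an illustration of the baseline problem only; it is not DeBBI's de Bruijn graph method, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def _neighbour_pairs(order: List[int], circular: bool) -> List[Tuple[int, int]]:
    """Consecutive gene pairs, including the wrap-around pair for circular genomes."""
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))
    return pairs


def breakpoints(order_a: List[int], order_b: List[int], circular: bool = True) -> List[Tuple[int, int]]:
    """Return the adjacencies of order_a that are not conserved in order_b."""
    conserved: Set[Tuple[int, int]] = set()
    for a, b in _neighbour_pairs(order_b, circular):
        conserved.add((a, b))
        conserved.add((-b, -a))  # the same adjacency read from the other strand
    return [pair for pair in _neighbour_pairs(order_a, circular) if pair not in conserved]


# An inversion of genes 2 and 3 creates two breakpoints.
print(breakpoints([1, 2, 3, 4, 5], [1, -3, -2, 4, 5]))
```

The hard part addressed by DeBBI is that real mitogenomes rarely come with such clean gene orders, so the adjacencies must instead be inferred from noisy nucleotide sequences.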
7
The Planemo toolkit for developing, deploying, and executing scientific data analyses in Galaxy and beyond. Genome Res 2023; 33:261-268. [PMID: 36828587 PMCID: PMC10069471 DOI: 10.1101/gr.276963.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 01/11/2023] [Indexed: 02/26/2023]
Abstract
There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.
8
AI for predicting chemical-effect associations at the chemical universe level-deepFPlearn. Brief Bioinform 2022; 23:6645490. [PMID: 35849097 PMCID: PMC9487703 DOI: 10.1093/bib/bbac257] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/05/2022] [Revised: 05/17/2022] [Accepted: 06/02/2022] [Indexed: 11/20/2022]
Abstract
Many chemicals are present in our environment, and all living species are exposed to them. However, numerous chemicals pose risks, such as the development of severe diseases, if they occur at the wrong time in the wrong place. For the majority of chemicals, these risks are not known. Chemical risk assessment and subsequent regulation of use require efficient and systematic strategies. Lab-based methods, even high-throughput ones, are too slow to keep up with the pace of chemical innovation. Existing computational approaches are designed for specific chemical classes or sub-problems and are not usable on a large scale. Further, the application range of these approaches is limited by the small amount of available labeled training data. We present the ready-to-use and stand-alone program deepFPlearn, which predicts the association between chemical structures and effects on the gene/pathway level using a combined deep learning approach. deepFPlearn uses a deep autoencoder for feature reduction before training a deep feed-forward neural network to predict the target association. We achieved good prediction quality and showed that our feature compression preserves relevant chemical structural information. Using a vast chemical inventory (unlabeled data) as input for the autoencoder did not reduce our prediction quality but allowed capturing a much more comprehensive range of chemical structures. We predict meaningful, experimentally verified associations of chemicals and effects on unseen data. deepFPlearn classifies hundreds of thousands of chemicals in seconds. We provide deepFPlearn as an open-source and flexible tool that can be easily retrained and customized to different application settings at https://github.com/yigbt/deepFPlearn.
9
The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res 2022; 50:W345-W351. [PMID: 35446428 PMCID: PMC9252830 DOI: 10.1093/nar/gkac247] [Citation(s) in RCA: 250] [Impact Index Per Article: 125.0] [Received: 02/03/2022] [Revised: 03/17/2022] [Accepted: 03/30/2022] [Indexed: 01/19/2023]
Abstract
Galaxy is a mature, browser accessible workbench for scientific computing. It enables scientists to share, analyze and visualize their own data, with minimal technical impediments. A thriving global community continues to use, maintain and contribute to the project, with support from multiple national infrastructure providers that enable freely accessible analysis and training services. The Galaxy Training Network supports free, self-directed, virtual training with >230 integrated tutorials. Project engagement metrics have continued to grow over the last 2 years, including source code contributions, publications, software packages wrapped as tools, registered users and their daily analysis jobs, and new independent specialized servers. Key Galaxy technical developments include an improved user interface for launching large-scale analyses with many files, interactive tools for exploratory data analysis, and a complete suite of machine learning tools. Important scientific developments enabled by Galaxy include Vertebrate Genome Project (VGP) assembly workflows and global SARS-CoV-2 collaborations.
10
Democratizing data-independent acquisition proteomics analysis on public cloud infrastructures via the Galaxy framework. Gigascience 2022; 11:6528772. [PMID: 35166338 PMCID: PMC8848309 DOI: 10.1093/gigascience/giac005] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/27/2021] [Revised: 11/26/2021] [Accepted: 01/12/2022] [Indexed: 12/30/2022]
Abstract
BACKGROUND Data-independent acquisition (DIA) has become an important approach in global, mass spectrometric proteomic studies because it provides in-depth insights into the molecular variety of biological systems. However, DIA data analysis remains challenging owing to the high complexity and the large data and sample sizes, which require specialized software and vast computing infrastructures. Most available open-source DIA software requires basic programming skills and covers only a fraction of a complete DIA data analysis. In consequence, DIA data analysis often requires the use of multiple software tools and compatibility among them, severely limiting usability and reproducibility. FINDINGS To overcome this hurdle, we have integrated a suite of open-source DIA tools in the Galaxy framework for reproducible and version-controlled data processing. The DIA suite includes OpenSwath, PyProphet, diapysef, and swath2stats. We have compiled functional Galaxy pipelines for DIA processing, which provide a web-based graphical user interface to these pre-installed and pre-configured tools for their use on freely accessible, powerful computational resources of the Galaxy framework. This approach also enables seamless sharing of workflows with full configuration, in addition to sharing raw data and results. We demonstrate the usability of an all-in-one DIA pipeline in Galaxy by the analysis of a spike-in case study dataset. Additionally, extensive training material is provided to further increase access for the proteomics community. CONCLUSION The integration of an open-source DIA analysis suite in the web-based and user-friendly Galaxy framework, in combination with extensive training material, empowers a broad community of researchers to perform reproducible and transparent DIA data analysis.
11
Expanding the Galaxy's reference data. Bioinform Adv 2022; 2:vbac030. [PMID: 35669346 PMCID: PMC9155181 DOI: 10.1093/bioadv/vbac030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 04/01/2022] [Accepted: 04/26/2022] [Indexed: 01/27/2023]
Abstract
Summary Properly and effectively managing reference datasets is an important task for many bioinformatics analyses. Refgenie is a reference asset management system that allows users to easily organize, retrieve and share such datasets. Here, we describe the integration of refgenie into the Galaxy platform. Server administrators are able to configure Galaxy to make use of reference datasets made available on a refgenie instance. In addition, a Galaxy Data Manager tool has been developed to provide a graphical interface to refgenie's remote reference retrieval functionality. A large collection of reference datasets has also been made available using the CVMFS (CernVM File System) repository from GalaxyProject.org, with mirrors across the USA, Canada, Europe and Australia, enabling easy use outside of Galaxy. Availability and implementation The ability of Galaxy to use refgenie assets was added to the core Galaxy framework in version 22.01, which is available from https://github.com/galaxyproject/galaxy under the Academic Free License version 3.0. The refgenie Data Manager tool can be installed via the Galaxy ToolShed, with source code managed at https://github.com/BlankenbergLab/galaxy-tools-blankenberg/tree/main/data_managers/data_manager_refgenie_pull and released using an MIT license. Access to existing data is also available through CVMFS, with instructions at https://galaxyproject.org/admin/reference-data-repo/. No new data were generated or analyzed in support of this research.
12
Improved annotation of protein-coding genes boundaries in metazoan mitochondrial genomes. Nucleic Acids Res 2020; 47:10543-10552. [PMID: 31584075 PMCID: PMC6847864 DOI: 10.1093/nar/gkz833] [Citation(s) in RCA: 233] [Impact Index Per Article: 58.3] [Received: 10/06/2018] [Revised: 08/30/2019] [Accepted: 09/29/2019] [Indexed: 11/13/2022]
Abstract
With the rapid increase of sequenced metazoan mitochondrial genomes, detailed manual annotation is becoming more and more infeasible. While it is easy to identify the approximate location of protein-coding genes within mitogenomes, the peculiar processing of mitochondrial transcripts makes the determination of precise gene boundaries a surprisingly difficult problem. We have analyzed the properties of annotated start and stop codon positions in detail and used the inferred patterns to devise a new method for predicting gene boundaries in de novo annotations. Our method benefits from empirically observed prevalences of start/stop codons and gene lengths, and considers the dependence of these features on variations of genetic codes. Although not perfect, our new approach yields a drastic improvement in the accuracy of gene boundaries and upgrades the mitochondrial genome annotation server MITOS to an even more sophisticated tool for fully automatic annotation of metazoan mitochondrial genomes.
13
Abstract
The diverse array of codon reassignments demonstrates that the genetic code is not universal in nature. Exploring the mechanisms underlying codon reassignment is critical for understanding the evolution of the genetic code used in translation. Hemichordata, comprising the worm-like Enteropneusta and the colonial filter-feeding Pterobranchia, is the sister taxon of echinoderms and more distantly related to chordates. However, only a few hemichordate mitochondrial genomes have been sequenced, hindering our understanding of mitochondrial genome evolution within Deuterostomia. In this study, we sequenced four mitochondrial genomes and two transcriptomes, including representatives of both major hemichordate lineages, and analyzed them together with publicly available data. Contrary to the current understanding of the mitochondrial genetic code in hemichordates, our comparative analyses suggest that UAA encodes Tyr instead of acting as a stop codon in the pterobranch lineage Cephalodiscidae. We also predict that AAA encodes Lys in pterobranch and enteropneust mitochondrial genomes, contradicting the previous assumption that hemichordates share the same genetic code as echinoderms, in which AAA encodes Asn. Thus, we propose a new mitochondrial genetic code for Cephalodiscus and a revised code for enteropneusts. Moreover, our phylogenetic analyses are largely consistent with previous phylogenomic studies. The only exception is the phylogenetic position of the enteropneust Stereobalanus, which is placed as sister to all other described enteropneusts. With broader taxonomic sampling, we provide evidence that the evolution of mitochondrial gene order and genetic codes in Hemichordata is more dynamic than previously thought, and these findings provide insights into mitochondrial genome evolution within this clade.
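The effect of the proposed reassignments can be seen by translating the same mRNA under two codon tables. The sketch below uses deliberately partial, hypothetical tables covering only the codons in the example: an echinoderm-like table (AAA = Asn, UAA = stop, as in the previously assumed code) and a Cephalodiscus-like table with the two reassignments proposed in the abstract (AAA = Lys, UAA = Tyr). A complete mitochondrial code has 64 entries.

```python
from typing import Dict

# Partial codon tables (RNA alphabet) covering only the codons used in the example below.
ECHINODERM_LIKE: Dict[str, str] = {
    "AUG": "M", "GCU": "A", "AAG": "K", "UAU": "Y", "UAC": "Y",
    "AAA": "N",  # Asn, as previously assumed for hemichordates
    "UAA": "*", "UAG": "*",  # stop codons
}
# Reassignments proposed in the abstract for Cephalodiscus: AAA -> Lys, UAA -> Tyr.
CEPHALODISCUS_LIKE: Dict[str, str] = {**ECHINODERM_LIKE, "AAA": "K", "UAA": "Y"}


def translate(rna: str, table: Dict[str, str]) -> str:
    """Translate codon by codon until the first stop codon; trailing partial codons are ignored."""
    protein = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = table[rna[i:i + 3]]  # KeyError for codons missing from the partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = "AUGAAAUAAGCUUAG"
print(translate(mrna, ECHINODERM_LIKE), translate(mrna, CEPHALODISCUS_LIKE))
```

Under the older assumption the product is truncated at the UAA codon, which illustrates why an incorrect genetic code can lead to misannotated gene ends.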
14
An Exact Algorithm for Sorting by Weighted Preserving Genome Rearrangements. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:52-62. [PMID: 29994030 DOI: 10.1109/tcbb.2018.2831661] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/08/2023]
Abstract
The preserving Genome Sorting Problem (pGSP) asks for a shortest sequence of rearrangement operations that transforms a given gene order into another given gene order by using rearrangement operations that preserve common intervals, i.e., groups of genes that form an interval in both given gene orders. The wpGSP is the weighted version of the problem, where each type of rearrangement operation has a weight and a minimum-weight sequence of rearrangement operations is sought. An exact algorithm, called CREx2, is presented, which solves the wpGSP for arbitrary gene orders and the following types of rearrangement operations: inversions, transpositions, inverse transpositions, and tandem duplication random loss operations. CREx2 has a (worst-case) exponential runtime but a linear runtime for problem instances where the common intervals are organized in a linear structure. The efficiency of CREx2 and its usefulness for phylogenetic analysis are shown empirically for gene orders of fungal mitochondrial genomes.
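Common intervals, as defined above, can be enumerated naively for two unsigned gene orders that are permutations of the same genes: a block of consecutive genes in the first order is a common interval exactly when the positions of those genes in the second order span a range of the same length. The quadratic sketch below illustrates the definition only; it is not the algorithm used inside CREx2, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def common_intervals(order_a: Sequence[str], order_b: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return all gene groups (size >= 2) that are contiguous in both unsigned gene orders."""
    if len(set(order_a)) != len(order_a) or sorted(order_a) != sorted(order_b):
        raise ValueError("orders must be permutations of the same distinct genes")
    pos_b = {gene: idx for idx, gene in enumerate(order_b)}
    result = []
    for i in range(len(order_a)):
        lo = hi = pos_b[order_a[i]]
        for j in range(i + 1, len(order_a)):
            p = pos_b[order_a[j]]
            lo, hi = min(lo, p), max(hi, p)
            if hi - lo == j - i:  # order_a[i..j] occupies a contiguous block of order_b
                result.append(tuple(order_a[i:j + 1]))
    return result


print(common_intervals(["a", "b", "c", "d"], ["b", "a", "d", "c"]))
```

Tracking only the minimum and maximum position works because the genes are distinct: j - i + 1 distinct positions fit in a range of width j - i only if they fill it completely.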
15
Genome Rearrangement with ILP. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:1585-1593. [PMID: 28574364 DOI: 10.1109/tcbb.2017.2708121] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
The weighted Genome Sorting Problem (wGSP) is to find a minimum-weight sequence of rearrangement operations that transforms a given gene order into another given gene order using rearrangement operations that are associated with a predefined weight. This paper presents a polynomial-sized Integer Linear Program, called GeRe-ILP, for solving the wGSP for the following three types of rearrangement operations: inversion, transposition, and inverse transposition. GeRe-ILP uses variables and constraints for gene orders of length . How different weighting schemes influence the reconstructed scenarios is studied experimentally on simulated data. The influence of the length of the gene orders and of the size of the reconstructed scenarios on the runtime of GeRe-ILP is studied as well.
16
EqualTDRL: illustrating equivalent tandem duplication random loss rearrangements. BMC Bioinformatics 2018; 19:192. [PMID: 29843612 PMCID: PMC5975268 DOI: 10.1186/s12859-018-2170-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/22/2018] [Accepted: 04/30/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND To study the differences between two unichromosomal circular genomes, e.g., mitochondrial genomes, under the tandem duplication random loss (TDRL) rearrangement, it is important to consider the whole set of potential TDRL rearrangement events that could have taken place. The reason is that for two given circular gene orders, different TDRL rearrangements can transform one gene order into the other. Hence, a TDRL event cannot always be reconstructed solely from the circular gene orders before and after it. RESULTS We present the program EqualTDRL, which computes and illustrates the complete set of TDRLs for pairs of circular gene orders that differ by only one TDRL. EqualTDRL considers the circularity of the given genomes and certain restrictions on the TDRL rearrangements. Examples of the latter are sequences of genes that must be conserved during a TDRL, or pairs of genes that frame intergenic regions which might represent remnants of duplicated genes. Additionally, EqualTDRL can determine the set of TDRLs that are minimal with respect to the number of duplicated genes. CONCLUSION EqualTDRL helps scientists study the complete set of TDRLs that could possibly have taken place in the evolution of mitochondrial genomes. EqualTDRL is implemented in C++ using the ggplot2 package of the open-source programming language R and is freely available from http://pacosy.informatik.uni-leipzig.de/equaltdrl .
17
Combinatorics of Tandem Duplication Random Loss Mutations on Circular Genomes. IEEE/ACM Trans Comput Biol Bioinform 2018; 15:83-95. [PMID: 28114075 DOI: 10.1109/tcbb.2016.2613522] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
The tandem duplication random loss operation (TDRL) is an important genome rearrangement operation in metazoan mitochondrial genomes. A TDRL consists of a duplication of a contiguous set of genes in tandem followed by a random loss of one copy of each duplicated gene. This paper presents an analysis of the combinatorics of TDRLs on circular genomes, e.g., the mitochondrial genome. In particular, results on TDRLs for circular genomes and their linear representatives are established. Moreover, the distance between gene orders with respect to linear TDRLs and circular TDRLs is studied. An analysis of the available animal mitochondrial gene orders shows the practical relevance of the theoretical results.
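The TDRL operation described above can be made concrete on a linear gene order: duplicate a contiguous segment in tandem, then, for every duplicated gene, keep exactly one of its two copies. In the sketch below, a hypothetical boolean mask records which copy survives. For a circular genome, one could first rotate the order so that the segment does not wrap around; that step is omitted here.

```python
from typing import List, Sequence


def apply_tdrl(order: Sequence[str], start: int, end: int, keep_first: Sequence[bool]) -> List[str]:
    """Apply a tandem duplication random loss to the segment order[start:end].

    keep_first[i] is True if gene order[start + i] survives in the first tandem copy,
    and False if it survives in the second tandem copy instead.
    """
    segment = list(order[start:end])
    if len(keep_first) != len(segment):
        raise ValueError("keep_first needs exactly one entry per duplicated gene")
    first_copy = [gene for gene, keep in zip(segment, keep_first) if keep]
    second_copy = [gene for gene, keep in zip(segment, keep_first) if not keep]
    return list(order[:start]) + first_copy + second_copy + list(order[end:])


# Duplicate genes 2, 3, 4 and lose the first copy of gene 3.
print(apply_tdrl(["1", "2", "3", "4", "5"], 1, 4, [True, False, True]))
```

The result always splits the segment into two subsequences that keep their relative order, which is also why several different TDRLs can map one gene order onto the same result.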
18
Accurate annotation of protein-coding genes in mitochondrial genomes. Mol Phylogenet Evol 2016; 106:209-216. [PMID: 27693569 DOI: 10.1016/j.ympev.2016.09.024] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/30/2016] [Revised: 08/29/2016] [Accepted: 09/25/2016] [Indexed: 10/20/2022]
Abstract
Mitochondrial genome sequences are available in large numbers, and new sequences are being published at an increasing pace. Fast, automatic, consistent, and high-quality annotations are a prerequisite for downstream analyses. Therefore, we present an automated pipeline for fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on enhanced phylogeny-aware hidden Markov models (HMMs). The pipeline builds taxon-specific enhanced multiple sequence alignments (MSAs) of already annotated sequences and corresponding HMMs using an approximation of the phylogeny. The MSAs are enhanced by fixing unannotated frameshifts, purging wrong sequences, and removing non-conserved columns from both ends. A comparison with reference annotations highlights the high quality of the results. The frameshift correction method predicts a large number of frameshifts, many of which were previously unknown. A detailed analysis of the frameshifts in nad3 of the Archosauria-Testudines group has been conducted.
19
Cophylogenetic Reconciliation with ILP. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:1227-1235. [PMID: 26671795 DOI: 10.1109/tcbb.2015.2430336] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
In this paper, we present an integer linear programming (ILP) approach, called CoRe-ILP, for finding an optimal time-consistent cophylogenetic host-parasite reconciliation under the cophylogenetic event model with the events cospeciation, duplication, sorting, host switch, and failure to diverge. Instead of assuming event costs, a simplified model is used that primarily maximizes cospeciations and secondarily minimizes host switching events. Duplications, sortings, and failure-to-diverge events are not explicitly scored. Unlike existing event-based reconciliation methods, CoRe-ILP can use (approximate) phylogenetic branch lengths for filtering possible ancestral host-parasite interactions. Experimentally, it is shown that CoRe-ILP can successfully use branch length information and performs well on biological and simulated data sets. The results of CoRe-ILP are compared with those of the reconciliation tools Jane 4, Treemap 3b, NOTUNG 2.8 Beta, and Ranger-DTL. CoRe-ILP is implemented using IBM ILOG CPLEX Optimizer 12.6 and is freely available from http://pacosy.informatik.uni-leipzig.de/core-ilp.
20
Towards a comprehensive picture of alloacceptor tRNA remolding in metazoan mitochondrial genomes. Nucleic Acids Res 2015; 43:8044-56. [PMID: 26227972 PMCID: PMC4783518 DOI: 10.1093/nar/gkv746] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 02/11/2015] [Accepted: 07/11/2015] [Indexed: 12/03/2022]
Abstract
Remolding of tRNAs is a well-documented process in mitochondrial genomes that changes the identity of a tRNA. It involves the duplication of a tRNA gene, a mutation that changes the anticodon, and the loss of the ancestral tRNA gene. The net effect is a functional tRNA that is more closely related to tRNAs of a different alloacceptor family than to tRNAs with the same anticodon in related species. Beyond being of interest for understanding mitochondrial tRNA function and evolution, tRNA remolding events can lead to artifacts in the annotation of mitogenomes and thus in studies of mitogenomic evolution. It is therefore important to identify and catalog these events. Here we describe novel methods to detect tRNA remolding in large-scale data sets and apply them to survey tRNA remolding throughout animal evolution. We identify several novel remolding events in addition to those previously reported in the literature. A detailed analysis of these remoldings shows that many of them derive from ancestral events.
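The defining signature of a remolded tRNA, namely closer similarity to another alloacceptor family than to same-anticodon tRNAs of related species, suggests a simple nearest-neighbour check. The sketch assumes pre-aligned sequences of equal length and plain percent identity; the function names and toy sequences are hypothetical, and the published detection methods are more elaborate.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Columns that are gaps in both sequences carry no information and are skipped.
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def remolding_candidate(query_seq: str, query_family: str, references: list[tuple[str, str]]):
    """Family of the most similar reference tRNA if it differs from the query's own family.

    `references` holds (family, aligned sequence) pairs from related species.
    Returns None when the best match belongs to the query's own family.
    """
    best_family, _ = max(
        ((family, identity(query_seq, seq)) for family, seq in references),
        key=lambda item: item[1],
    )
    return None if best_family == query_family else best_family


refs = [("Leu", "GCAUUAGG"), ("Leu", "GCAUUAGC"), ("Ser", "UUCGGAAU")]
print(remolding_candidate("UUCGGAAC", "Leu", refs))  # closest match is a Ser tRNA
```

A query annotated as tRNA-Leu by its anticodon but most similar to tRNA-Ser sequences is flagged as a remolding candidate.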
21
Local similarity search to find gene indicators in mitochondrial genomes. Biology (Basel) 2014; 3:220-42. [PMID: 24833343 PMCID: PMC4009762 DOI: 10.3390/biology3010220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2013] [Revised: 02/15/2014] [Accepted: 02/18/2014] [Indexed: 06/03/2023]
Abstract
Given a set of nucleotide sequences, we consider the problem of identifying conserved substrings that occur in homologous genes in a large number of sequences. The problem is solved by identifying certain nodes in a suffix tree containing all substrings of the given nucleotide sequences. Because the targeted data set is large, our approach employs a truncated version of suffix trees. Two methods for this task are introduced: (1) the annotation-guided marker detection method, which uses gene annotations that may contain a moderate number of errors; (2) the probability-based marker detection method, which determines sequences that appear significantly more often than expected. The approach is successfully applied to the mitochondrial nucleotide sequences and corresponding annotations available in RefSeq for 2989 metazoan species. We demonstrate that the approach finds appropriate substrings.
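An uncompressed suffix trie has a number of nodes that grows quadratically with sequence length; truncating every suffix to its first `depth` characters bounds the work by roughly the total sequence length times `depth`. The sketch below records at each node which input sequences pass through it and reports substrings of a given length shared by at least `min_seqs` sequences. This count threshold is a simplified stand-in that implements neither the annotation-guided nor the probability-based criterion; all names and parameters are hypothetical.

```python
def build_truncated_suffix_trie(sequences: list[str], depth: int) -> dict:
    """Suffix trie restricted to the first `depth` characters of every suffix.

    Each node is a dict with 'children' (char -> node) and 'seqs'
    (ids of the input sequences whose substrings pass through the node).
    """
    root = {"children": {}, "seqs": set()}
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for ch in seq[start:start + depth]:
                node = node["children"].setdefault(ch, {"children": {}, "seqs": set()})
                node["seqs"].add(seq_id)
    return root


def frequent_substrings(root: dict, length: int, min_seqs: int) -> dict[str, int]:
    """Substrings of exactly `length` characters that occur in at least `min_seqs` sequences."""
    found = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) == length:
            if len(node["seqs"]) >= min_seqs:
                found[prefix] = len(node["seqs"])
            continue
        for ch, child in node["children"].items():
            stack.append((child, prefix + ch))
    return found


seqs = ["ACGTAC", "TTACGT", "GACGTT"]
trie = build_truncated_suffix_trie(seqs, depth=4)
print(frequent_substrings(trie, length=4, min_seqs=3))  # {'ACGT': 3}
```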
22
Bioinformatics methods for the comparative analysis of metazoan mitochondrial genome sequences. Mol Phylogenet Evol 2013; 69:320-7. [DOI: 10.1016/j.ympev.2012.09.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 05/10/2012] [Revised: 08/31/2012] [Accepted: 09/17/2012] [Indexed: 01/25/2023]
23
Mitogenomics at the base of Metazoa. Mol Phylogenet Evol 2013; 69:339-51. [PMID: 23891951 DOI: 10.1016/j.ympev.2013.07.016] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 01/03/2013] [Revised: 05/29/2013] [Accepted: 07/09/2013] [Indexed: 11/25/2022]
Abstract
Unraveling the base of metazoan evolution is of crucial importance for rooting the metazoan Tree of Life. This subject has attracted substantial attention for more than a century and recently fueled a burst of modern phylogenetic studies. Conflicting scenarios from different studies and incongruent results from nuclear versus mitochondrial markers challenge current molecular phylogenetic approaches. Here we analyze the presently most comprehensive data sets of mitochondrial genomes from non-bilaterian animals to illuminate the phylogenetic relationships among early branching metazoan phyla. The results of our analyses illustrate the value of mitogenomics and support previously known topologies between animal phyla but also identify several problematic taxa, which are sensitive to long branch artifacts or missing data.
24
A comprehensive analysis of bilaterian mitochondrial genomes and phylogeny. Mol Phylogenet Evol 2013; 69:352-64. [PMID: 23684911 DOI: 10.1016/j.ympev.2013.05.002] [Citation(s) in RCA: 153] [Impact Index Per Article: 13.9] [Received: 12/28/2012] [Revised: 04/27/2013] [Accepted: 05/03/2013] [Indexed: 12/16/2022]
Abstract
About 2800 mitochondrial genomes of Metazoa are present in NCBI RefSeq today, two thirds of them belonging to vertebrates. Metazoan phylogeny was recently challenged by large-scale EST approaches (phylogenomics), which stabilized classical nodes while simultaneously supporting new sister-group hypotheses. The use of mitochondrial data in deep phylogeny analyses has often been criticized because of high nucleotide substitution rates, large differences in amino acid substitution rates between taxa, and biases in nucleotide frequencies. Nevertheless, mitochondrial genome data might still be promising, as they allow for larger taxon sampling while providing a smaller amount of sequence information. We present the most comprehensive analysis of bilaterian relationships based on mitochondrial genome data. The analyzed data set comprises more than 650 mitochondrial genomes chosen to represent a broad sample of both phylogenetic and sequence diversity. The results are based on high-quality amino acid alignments obtained from a complete reannotation of the mitogenomic sequences from the NCBI RefSeq database. However, the results fail to support many otherwise undisputed high-ranking taxa, such as Mollusca, Hexapoda, and Arthropoda, and suffer from extremely long branches of Nematoda, Platyhelminthes, and some other taxa. To identify the sources of misleading phylogenetic signals, we discuss several problems associated with mitochondrial genome data sets, e.g., the nucleotide and amino acid landscapes and a strong correlation of gene rearrangements with long branches.
25
Genetic aspects of mitochondrial genome evolution. Mol Phylogenet Evol 2012; 69:328-38. [PMID: 23142697 DOI: 10.1016/j.ympev.2012.10.020] [Citation(s) in RCA: 155] [Impact Index Per Article: 12.9] [Received: 05/05/2012] [Revised: 10/20/2012] [Accepted: 10/22/2012] [Indexed: 11/30/2022]
Abstract
Many years of extensive studies of metazoan mitochondrial genomes have established differences in gene arrangements and genetic codes as valuable phylogenetic markers. Understanding the underlying mechanisms of replication and transcription, and the role of the control regions, which cause, e.g., different gene orders, is important for assessing the phylogenetic signal of such events. This review summarises and discusses, for the Metazoa, the general aspects of mitochondrial transcription and replication with respect to control regions, as well as several proposed models of gene rearrangement. As whole-genome sequencing projects accumulate, more and more observations of mitochondrial gene transfer to the nucleus are reported. Thus, the occurrence and phylogenetic aspects of nuclear mitochondrial-like sequences (NUMTS) are a further topic of this review.
26
MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol 2012; 69:313-9. [PMID: 22982435 DOI: 10.1016/j.ympev.2012.08.023] [Citation(s) in RCA: 3071] [Impact Index Per Article: 255.9] [Received: 02/29/2012] [Revised: 07/11/2012] [Accepted: 08/27/2012] [Indexed: 11/16/2022]
Abstract
About 2000 completely sequenced mitochondrial genomes are available from the NCBI RefSeq database together with manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation information, which has accumulated over two decades, has been obtained with a diverse set of computational tools and annotation strategies. Despite all efforts of manual curation, it is still plagued by misassignments of reading directions, erroneous gene names, and missing as well as false positive annotations, in particular for the RNA genes. Taken together, this causes substantial problems for fully automatic pipelines that aim to use these data comprehensively for studies of animal phylogenetics and the molecular evolution of mitogenomes. The MITOS pipeline is designed to compute a consistent de novo annotation of the mitogenomic sequences. We show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality. At the same time, we avoid biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. The MITOS pipeline is accessible online at http://mitos.bioinf.uni-leipzig.de.
27
Mitochondrial genome evolution in species belonging to the Phialocephala fortinii s.l. - Acephala applanata species complex. BMC Genomics 2012; 13:166. [PMID: 22559219 PMCID: PMC3434094 DOI: 10.1186/1471-2164-13-166] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 05/19/2011] [Accepted: 05/04/2012] [Indexed: 01/01/2023]
Abstract
Background Mitochondrial (mt) markers are successfully applied in evolutionary biology and systematics because mt genomes often evolve faster than nuclear genomes. In addition, they allow robust phylogenetic analysis based on conserved proteins of the oxidative phosphorylation system. In the present study, we sequenced and annotated the complete mt genome of P. subalpina, a member of the Phialocephala fortinii s.l. – Acephala applanata species complex (PAC). PAC belongs to the Helotiales, one of the most diverse groups of ascomycetes, with more than 2,000 species. The gene order was compared to deduce mt genome evolution in the Pezizomycotina. Genetic variation in coding and intergenic regions of the mtDNA was studied for PAC to assess the usefulness of mtDNA for species diagnosis. Results The mt genome of P. subalpina is 43,742 bp long and codes for 14 mt genes associated with oxidative phosphorylation. In addition, a GIY-YIG endonuclease, the ribosomal protein S3 (Rps3), and a putative N-acetyl-transferase were recognized. A complete set of tRNA genes as well as the large and small rRNA genes, but no introns, were found. All protein-coding genes were confirmed by EST sequences. The gene order in P. subalpina deviated from that in Sclerotinia sclerotiorum, the only other helotialean species with a fully sequenced and annotated mt genome. Gene order analysis within Pezizomycotina suggests that the evolution of gene orders is mostly driven by transpositions. Furthermore, sequence diversity in coding and non-coding mtDNA regions in seven additional PAC species was pronounced and allowed for unequivocal species diagnosis in PAC. Conclusions The combination of non-interrupted ORFs and EST sequences resulted in a high-quality annotation of the mt genome of P. subalpina, which can be used as a reference for the annotation of other mt genomes in the Helotiales. In addition, our analyses show that mtDNA loci will be the marker of choice for future analysis of PAC communities.
28
Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements. Nucleic Acids Res 2011; 40:2833-45. [PMID: 22139921 PMCID: PMC3326299 DOI: 10.1093/nar/gkr1131] [Citation(s) in RCA: 166] [Impact Index Per Article: 12.8] [Indexed: 11/14/2022]
Abstract
Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit 'bizarre' secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, makes it possible to distinguish between remaining functional genes and degrading 'pseudogenes', even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of mitochondrial tRNA structures as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and widespread conserved overlaps of tRNAs in opposite reading directions. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders.
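Conserved overlaps of tRNAs in opposite reading directions can be detected from annotation coordinates alone. A minimal sketch, assuming 1-based inclusive coordinates on a linear sequence (overlaps that wrap around the origin of the circular mitogenome are ignored); the `Feature` type and the example coordinates are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Feature(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def opposite_strand_overlaps(features: list[Feature]) -> list[tuple[str, str, int]]:
    """Pairs of features on opposite strands whose coordinates overlap, with the overlap length."""
    result = []
    for a, b in combinations(features, 2):
        overlap = min(a.end, b.end) - max(a.start, b.start) + 1
        if overlap > 0 and a.strand != b.strand:
            result.append((a.name, b.name, overlap))
    return result


features = [
    Feature("trnY", 100, 165, "+"),
    Feature("trnC", 160, 225, "-"),
    Feature("trnW", 300, 365, "+"),
]
print(opposite_strand_overlaps(features))  # [('trnY', 'trnC', 6)]
```

Comparing such overlaps across species would then show which of them are conserved.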
29
Abstract
Background Changes in the order of mitochondrial genes are a good source of information for phylogenetic investigations. Phylogenetic hypotheses are often supported by parsimonious mitochondrial gene order rearrangement scenarios. CREx is a heuristic for computing short pairwise rearrangement scenarios for metazoan mitochondrial gene orders. Unlike other methods, CREx considers four types of rearrangement operations: inversions, transpositions, inverse transpositions, and tandem duplication random loss operations. Results An extensive analysis of the CREx reconstructions for artificial data is presented, and it is shown how the quality of the reconstructed rearrangement scenarios depends on the type of rearrangement model and additional parameter values. Moreover, a fast method is proposed to apply CREx to a large number of gene orders, find likely rearrangement scenarios, and store them in a graph structure called RI-Graph. This method is applied to analyse all known metazoan mitochondrial gene orders. It is shown that the obtained RI-Graph contains many rearrangement scenarios that are described in the literature. Conclusions The prospects and limitations of CREx have been analysed empirically, and a comparison with the literature on gene order evolution highlights its benefits. The newly developed method to apply CREx to a large number of gene orders succeeds in computing an RI-Graph that contains many rearrangement scenarios for metazoan gene orders that have also been described in the literature. This shows that the new method is very helpful for a fast analysis of a large number of gene orders, which is relevant given the strongly increasing number of known gene orders.
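The four operation types considered by CREx can be written down on signed permutations, where the sign encodes the reading direction of a gene. The functions below follow the usual textbook definitions; their names and 0-based index conventions are assumptions for illustration, not CREx's internal interface.

```python
def inversion(order: list[int], i: int, j: int) -> list[int]:
    """Reverse the block at positions i..j (inclusive) and flip each gene's orientation."""
    return order[:i] + [-g for g in reversed(order[i:j + 1])] + order[j + 1:]


def transposition(order: list[int], i: int, j: int, k: int) -> list[int]:
    """Cut block i..j and reinsert it, unchanged, at index k of the remaining list."""
    block = order[i:j + 1]
    rest = order[:i] + order[j + 1:]
    return rest[:k] + block + rest[k:]


def inverse_transposition(order: list[int], i: int, j: int, k: int) -> list[int]:
    """Like a transposition, but the moved block is also inverted."""
    block = [-g for g in reversed(order[i:j + 1])]
    rest = order[:i] + order[j + 1:]
    return rest[:k] + block + rest[k:]


def tdrl(order: list[int], i: int, j: int, keep_first: set[int]) -> list[int]:
    """Tandem duplication random loss of block i..j.

    The block is duplicated in tandem and every gene is then lost from exactly one
    copy; `keep_first` holds the (unsigned) genes retained in the first copy.
    Relative order and orientation are preserved within each copy.
    """
    segment = order[i:j + 1]
    first = [g for g in segment if abs(g) in keep_first]
    second = [g for g in segment if abs(g) not in keep_first]
    return order[:i] + first + second + order[j + 1:]


genes = [1, 2, 3, 4, 5]
print(inversion(genes, 1, 3))                  # [1, -4, -3, -2, 5]
print(transposition(genes, 1, 2, 3))           # [1, 4, 5, 2, 3]
print(inverse_transposition(genes, 1, 2, 3))   # [1, 4, 5, -3, -2]
print(tdrl(genes, 0, 4, {1, 3, 5}))            # [1, 3, 5, 2, 4]
```

Note that a single TDRL can produce an order that would require several transpositions, which is one reason to include it as a separate operation type.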
30
31
Is it, or is not? The conceptualisation of gentrification and displacement and its political implications in the case of Berlin‐Prenzlauer Berg. ACTA ACUST UNITED AC 2010. [DOI: 10.1080/13604810902982268] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Indexed: 10/20/2022]
32
Solving the preserving reversal median problem. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:332-47. [PMID: 18670038 DOI: 10.1109/tcbb.2008.39] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/26/2023]
Abstract
Genomic rearrangement operations can be very useful for inferring the phylogenetic relationships of gene orders representing species. We study the problem of finding potential ancestral gene orders for the gene orders of given taxa such that the corresponding rearrangement scenario has a minimal number of reversals and each reversal preserves the common intervals of the given input gene orders. Common intervals identify sets of genes that occur consecutively in all input gene orders. The problem of finding such an ancestral gene order is called the preserving reversal median problem (pRMP). A tree-based data structure representing the common intervals of all input gene orders is used in our exact algorithm TCIP for solving the pRMP. It is known that the minimum number of reversals needed to transform one gene order into another can be computed in polynomial time, whereas the corresponding problem with the restriction that common intervals must not be destroyed is already NP-hard. It is shown theoretically that TCIP can solve a large class of pRMP instances in polynomial time. Empirically, we show the good performance of TCIP on biological and artificial data.
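The constraint that each reversal must preserve the common intervals can be checked directly: apply a candidate reversal and test whether the genes of every interval are still adjacent. The brute-force sketch below enumerates all O(n²) reversals of a signed gene order; it is not TCIP, and the function names are hypothetical.

```python
def is_interval(order: list[int], genes: set[int]) -> bool:
    """True if the unsigned `genes` occupy consecutive positions in `order`."""
    if not genes:
        return True
    positions = sorted(i for i, g in enumerate(order) if abs(g) in genes)
    return len(positions) == len(genes) and positions[-1] - positions[0] == len(genes) - 1


def reverse(order: list[int], i: int, j: int) -> list[int]:
    """Signed reversal of positions i..j (inclusive)."""
    return order[:i] + [-g for g in reversed(order[i:j + 1])] + order[j + 1:]


def preserving_reversals(order: list[int], common_intervals: list[set[int]]) -> list[tuple[int, int]]:
    """All reversals (i, j) after which every given interval is still consecutive."""
    n = len(order)
    return [
        (i, j)
        for i in range(n)
        for j in range(i, n)
        if all(is_interval(reverse(order, i, j), iv) for iv in common_intervals)
    ]


print(preserving_reversals([1, 2, 3, 4], [{2, 3}]))
# [(0, 0), (0, 2), (0, 3), (1, 1), (1, 2), (1, 3), (2, 2), (3, 3)]
```

Only the two reversals that cut through the interval {2, 3}, namely (0, 1) and (2, 3), are excluded: a reversal preserves an interval exactly when it contains it, lies inside it, or is disjoint from it.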
33
Evolution of mitochondrial gene orders in echinoderms. Mol Phylogenet Evol 2007; 47:855-64. [PMID: 18280182 DOI: 10.1016/j.ympev.2007.11.034] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.5] [Received: 09/12/2007] [Revised: 11/27/2007] [Accepted: 11/29/2007] [Indexed: 11/28/2022]
Abstract
A comprehensive analysis of the mitochondrial gene orders of all previously published complete echinoderm mitochondrial genomes, together with two novel ones from Antedon mediterranea (Crinoidea) and Ophiura albida (Ophiuroidea), shows that all major types of rearrangement operations are necessary to explain the evolution of mitochondrial genomes. In addition to protein-coding genes, we include all tRNA genes as well as the control region in our analysis. Surprisingly, 7 of the 16 genomes published in the GenBank database contain misannotations, mostly unannotated tRNAs and/or mistakes in the orientation of tRNAs, which we have corrected here. Although the gene orders of the mt genomes appear very different, only 8 events are necessary to explain the evolutionary history of echinoderms, with the exception of the ophiuroids. Only two of these rearrangements are inversions, while we identify three tandem-duplication-random-loss events and three transpositions.
34
Abstract
SUMMARY We present the web-based program CREx for heuristically determining pairwise rearrangement events in unichromosomal genomes. CREx considers transpositions, reverse transpositions, reversals and tandem-duplication-random-loss (TDRL) events. It supports the user in finding parsimonious rearrangement scenarios given a phylogenetic hypothesis. CREx is based on common intervals, which reflect genes that appear consecutively in several of the input gene orders. AVAILABILITY CREx is freely available at http://pacosy.informatik.uni-leipzig.de/crex
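For small inputs, common intervals can be enumerated by brute force: every common interval is a contiguous window of the first gene order, so it suffices to test each window against the other orders. The sketch ignores gene orientation, requires consecutiveness in all input orders, and excludes single genes and the full gene set. It is meant only to illustrate the concept, not to be efficient, and the function names are hypothetical.

```python
def _consecutive(order: list[int], genes: set[int]) -> bool:
    """True if the (unsigned) `genes` occupy consecutive positions in `order`."""
    positions = [i for i, g in enumerate(order) if g in genes]
    return len(positions) == len(genes) and positions[-1] - positions[0] == len(genes) - 1


def common_intervals(orders: list[list[int]]) -> list[frozenset[int]]:
    """Non-trivial common intervals of several signed gene orders (signs ignored)."""
    first = [abs(g) for g in orders[0]]
    others = [[abs(g) for g in o] for o in orders[1:]]
    n = len(first)
    result = []
    for i in range(n):
        for j in range(i + 1, n):
            genes = set(first[i:j + 1])
            if len(genes) == n:  # the full gene set is a trivial interval
                continue
            if all(_consecutive(o, genes) for o in others):
                result.append(frozenset(genes))
    return result


for iv in common_intervals([[1, 2, 3, 4, 5], [-3, -2, 1, 4, 5]]):
    print(sorted(iv))
# [1, 2] / [1, 2, 3] / [1, 2, 3, 4] / [2, 3] / [4, 5]
```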
35
Abstract
MOTIVATION Algorithms for phylogenetic tree reconstruction based on gene order data typically repeatedly solve instances of the reversal median problem (RMP), which is to find, for three given gene orders, a fourth gene order (called a median) with a minimal sum of reversal distances to them. All existing algorithms of this type consider only one median for each RMP instance, even when a large number of medians exist. A careful selection of one of the medians might lead to better phylogenetic trees. RESULTS We propose a heuristic algorithm, amGRP, for solving the multiple genome rearrangement problem (MGRP) by repeatedly solving instances of the RMP while taking all medians into account. Algorithm amGRP uses a branch-and-bound method that branches over medians from a selected subset of all medians for each RMP instance. Different heuristics for selecting the subsets have been investigated. To show that the medians for the RMP vary strongly with respect to different properties that are likely to be relevant for phylogenetic tree reconstruction, the set of all medians has been investigated for artificial data sets and mitochondrial DNA (mtDNA) gene orders. Phylogenetic trees have been computed with amGRP and with two standard approaches for solving the MGRP (GRAPPA-DCM and MGR) for a large set of randomly generated gene orders and two sets of mtDNA gene order data for different animal taxa. The results show that amGRP outperforms both other methods with respect to solution quality and computation time on the test data. AVAILABILITY The source code of amGRP, additional results and the test instances used in this paper are freely available from the authors.
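For very small genomes, the set of all medians of an RMP instance can be computed exhaustively: a breadth-first search over signed permutations gives exact reversal distances, and every permutation with the minimal distance sum is a median. The sketch is feasible only for a handful of genes (there are 2^n * n! signed permutations of n genes) and is not the branch-and-bound amGRP algorithm; the function names are hypothetical.

```python
from collections import deque


def reversal_neighbors(order: tuple[int, ...]):
    """All signed permutations reachable from `order` by one reversal."""
    n = len(order)
    for i in range(n):
        for j in range(i, n):
            yield order[:i] + tuple(-g for g in reversed(order[i:j + 1])) + order[j + 1:]


def reversal_distances(source: tuple[int, ...]) -> dict[tuple[int, ...], int]:
    """Exact reversal distance from `source` to every signed permutation (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in reversal_neighbors(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def all_reversal_medians(a, b, c):
    """Minimal distance sum and every signed permutation attaining it."""
    da, db, dc = (reversal_distances(tuple(x)) for x in (a, b, c))
    scores = {p: da[p] + db[p] + dc[p] for p in da}
    best = min(scores.values())
    return best, sorted(p for p, s in scores.items() if s == best)


print(all_reversal_medians((1, 2, 3), (1, -2, 3), (1, 2, -3)))  # (2, [(1, 2, 3)])
```

Because the function returns every optimal permutation rather than the first one found, it can be used on toy instances to inspect how many medians exist and how they differ.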
36
Genome rearrangement based on reversals that preserve conserved intervals. IEEE/ACM Trans Comput Biol Bioinform 2006; 3:275-88. [PMID: 17048465 DOI: 10.1109/tcbb.2006.38] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/12/2023]
Abstract
The order of genes in the genomes of species can change during evolution and can provide information about their phylogenetic relationships. An interesting method for inferring phylogenetic relationships from gene orders is to use different types of rearrangement operations and to find possible rearrangement scenarios using these operations. One of the most common rearrangement operations is the reversal, which reverses the order of a contiguous segment of genes. In this paper, we study the problem of finding the ancestral gene order for three species represented by their gene orders. The rearrangement scenario should use a minimal number of reversals and no other rearrangement operations. This problem is called the Median problem and is known to be NP-complete. We describe a heuristic algorithm for finding solutions to the Median problem that searches for rearrangement scenarios with the additional property that gene groups should not be destroyed by reversal operations. The concept of conserved intervals for signed permutations is used to describe such gene groups. We show experimentally, for different types of test problems, that the proposed algorithm produces very good results compared to other algorithms for the Median problem. We also integrate our reversal selection procedure into the well-known MGR and GRAPPA algorithms and show that they achieve a significant speedup while obtaining solutions of the same quality as the original algorithms on the test problems.