Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Doerr D, Thévenin A, Stoye J. Gene family assignment-free comparative genomics. BMC Bioinformatics 2012;13 Suppl 19:S3. [PMID: 23281826 PMCID: PMC3526435 DOI: 10.1186/1471-2105-13-s19-s3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open

For:	Doerr D, Thévenin A, Stoye J. Gene family assignment-free comparative genomics. BMC Bioinformatics 2012;13 Suppl 19:S3. [PMID: 23281826 PMCID: PMC3526435 DOI: 10.1186/1471-2105-13-s19-s3] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open

Number

Cited by Other Article(s)

Braga MDV, Doerr D, Rubert DP, Stoye J. Family-Free Genome Comparison. Methods Mol Biol 2024;2802:57-72. [PMID: 38819556 DOI: 10.1007/978-1-0716-3838-5_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2024]

Rubert DP, Doerr D, Braga MDV. The potential of family-free rearrangements towards gene orthology inference. J Bioinform Comput Biol 2021;19:2140014. [PMID: 34775922 DOI: 10.1142/s021972002140014x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Rubert DP, Martinez FV, Braga MDV. Natural family-free genomic distance. Algorithms Mol Biol 2021;16:4. [PMID: 33971908 PMCID: PMC8111734 DOI: 10.1186/s13015-021-00183-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2021] [Accepted: 04/13/2021] [Indexed: 11/13/2022] Open

Abstract

BACKGROUND

A classical problem in comparative genomics is to compute the rearrangement distance, that is the minimum number of large-scale rearrangements required to transform a given genome into another given genome. The traditional approaches in this area are family-based, i.e., require the classification of DNA fragments of both genomes into families. Furthermore, the most elementary family-based models, which are able to compute distances in polynomial time, restrict the families to occur at most once in each genome. In contrast, the distance computation in models that allow multifamilies (i.e., families with multiple occurrences) is NP-hard. Very recently, Bohnenkämper et al. (J Comput Biol 28:410-431, 2021) proposed an ILP formulation for computing the genomic distance of genomes with multifamilies, allowing structural rearrangements, represented by the generic double cut and join (DCJ) operation, and content-modifying insertions and deletions of DNA segments. This ILP is very efficient, but must maximize a matching of the genes in each multifamily, in order to prevent the free lunch artifact that would otherwise let empty or almost empty matchings give smaller distances.

RESULTS

In this paper, we adopt the alternative family-free setting that, instead of family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. We adapted the ILP mentioned above and developed a model in which pairwise similarities are used to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model then results in a natural family-free genomic distance, that takes into consideration all given genes, without prior classification into families, and has a search space composed of matchings of any size. In spite of its bigger search space, our ILP seems to be boosted by a reduction of the number of co-optimal solutions due to the weights. Indeed, it converged faster than the original one by Bohnenkämper et al. for instances with the same number of multiple connections. We can handle not only bacterial genomes, but also fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.

Collapse

Gene Phylogenies and Orthologous Groups. Methods Mol Biol 2018. [PMID: 29277861 DOI: 10.1007/978-1-4939-7463-4_1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2024]

Rubert DP, Hoshino EA, Braga MDV, Stoye J, Martinez FV. Computing the family-free DCJ similarity. BMC Bioinformatics 2018;19:152. [PMID: 29745861 PMCID: PMC5998916 DOI: 10.1186/s12859-018-2130-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Doerr D, Feijão P, Stoye J. Family-Free Genome Comparison. Methods Mol Biol 2018;1704:331-342. [PMID: 29277872 DOI: 10.1007/978-1-4939-7463-4_12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]

Doerr D, Kowada LAB, Araujo E, Deshpande S, Dantas S, Moret BME, Stoye J. New Genome Similarity Measures based on Conserved Gene Adjacencies. J Comput Biol 2017;24:616-634. [PMID: 28590847 DOI: 10.1089/cmb.2017.0065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Abstract

Many important questions in molecular biology, evolution, and biomedicine can be addressed by comparative genomic approaches. One of the basic tasks when comparing genomes is the definition of measures of similarity (or dissimilarity) between two genomes, for example, to elucidate the phylogenetic relationships between species. The power of different genome comparison methods varies with the underlying formal model of a genome. The simplest models impose the strong restriction that each genome under study must contain the same genes, each in exactly one copy. More realistic models allow several copies of a gene in a genome. One speaks of gene families, and comparative genomic methods that allow this kind of input are called gene family-based. The most powerful-but also most complex-models avoid this preprocessing of the input data and instead integrate the family assignment within the comparative analysis. Such methods are called gene family-free. In this article, we study an intermediate approach between family-based and family-free genomic similarity measures. Introducing this simpler model, called gene connections, we focus on the combinatorial aspects of gene family-free genome comparison. While in most cases, the computational costs to the general family-free case are the same, we also find an instance where the gene connections model has lower complexity. Within the gene connections model, we define three variants of genomic similarity measures that have different expression powers. We give polynomial-time algorithms for two of them, while we show NP-hardness for the third, most powerful one. We also generalize the measures and algorithms to make them more robust against recent local disruptions in gene order. Our theoretical findings are supported by experimental results, proving the applicability and performance of our newly defined similarity measures.

Collapse

Doerr D, Balaban M, Feijão P, Chauve C. The gene family-free median of three. Algorithms Mol Biol 2017;12:14. [PMID: 28559921 PMCID: PMC5446766 DOI: 10.1186/s13015-017-0106-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2017] [Accepted: 05/18/2017] [Indexed: 11/20/2022] Open

Abstract

Background

The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes.

Methods

We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k \le 3$$\end{document}k≤3 and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice.

Results

The developed methods compute accurate positional orthologs for genomes comparable in size of bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR.

Conclusions

We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.

Electronic supplementary material

The online version of this article (doi:10.1186/s13015-017-0106-z) contains supplementary material, which is available to authorized users.

Collapse

Genome-wide identification and characterization of Fox genes in the silkworm, Bombyx mori. Funct Integr Genomics 2015;15:511-22. [DOI: 10.1007/s10142-015-0440-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2014] [Revised: 03/25/2015] [Accepted: 04/06/2015] [Indexed: 12/28/2022]

Martinez FV, Feijão P, Braga MDV, Stoye J. On the family-free DCJ distance and similarity. Algorithms Mol Biol 2015;10:13. [PMID: 25859276 PMCID: PMC4391664 DOI: 10.1186/s13015-015-0041-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2015] [Accepted: 03/13/2015] [Indexed: 11/10/2022] Open

Lechner M, Hernandez-Rosales M, Doerr D, Wieseke N, Thévenin A, Stoye J, Hartmann RK, Prohaska SJ, Stadler PF. Orthology detection combining clustering and synteny for very large datasets. PLoS One 2014;9:e105015. [PMID: 25137074 PMCID: PMC4138177 DOI: 10.1371/journal.pone.0105015] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2014] [Accepted: 07/14/2014] [Indexed: 11/18/2022] Open

Affiliation(s)

Marcus Lechner Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marburg, Germany * E-mail:
Maribel Hernandez-Rosales Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade de Brasília, Brasília, Brasil
Daniel Doerr Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Nicolas Wieseke Faculty of Mathematics and Computer Science University of Leipzig, Leipzig, Germany
Annelyse Thévenin Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Jens Stoye Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Roland K. Hartmann Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marburg, Germany
Sonja J. Prohaska Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany
Peter F. Stadler Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark The Santa Fe Institute, Santa Fe, New Mexico, United States of America RNomics Group, Fraunhofer Institut for Cell Therapy and Immunology, Leipzig, Germany

Collapse

Chauve C, El-Mabrouk N, Guéguen L, Semeria M, Tannier E. Duplication, Rearrangement and Reconciliation: A Follow-Up 13 Years Later. MODELS AND ALGORITHMS FOR GENOME EVOLUTION 2013. [DOI: 10.1007/978-1-4471-5298-9_4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]