1
|
Ouedraogo WYDD, Ouangraoua A. Orthology and Paralogy Relationships at Transcript Level. J Comput Biol 2024; 31:277-293. [PMID: 38621191 DOI: 10.1089/cmb.2023.0400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/17/2024] Open
Abstract
Eukaryotic genes undergo a mechanism called alternative processing, resulting in transcriptome diversity by allowing the production of multiple distinct transcripts from a gene. More than half of human genes are affected, and the resulting transcripts are highly conserved among orthologous genes of distinct species. In this work, we present the definition of orthology and paralogy between transcripts of homologous genes, together with an algorithm to compute clusters of conserved orthologous and paralogous transcripts. Gene-level homology relationships are utilized to define various types of homology relationships between transcripts originating from the same ancestral transcript. A Reciprocal Best Hits approach is employed to infer clusters of isoorthologous and recent paralogous transcripts. We applied this method to transcripts from simulated gene families as well as real gene families from the Ensembl-Compara database. The results are consistent with those from previous studies that compared orthologous gene transcripts. Furthermore, our findings provide evidence that searching for conserved transcripts between homologous genes, beyond the scope of orthologous genes, is likely to yield valuable information.
Collapse
Affiliation(s)
| | - Aida Ouangraoua
- Department of Computer Science, Université de Sherbrooke, Sherbrooke, Quebec, Canada
| |
Collapse
|
2
|
Özçelik E, Kuru N, Adebali O. Phylostat: a web-based tool to analyze paralogous clade divergence in phylogenetic trees. Turk J Biol 2022; 45:667-673. [PMID: 35068947 PMCID: PMC8733950 DOI: 10.3906/biy-2105-18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2021] [Accepted: 09/29/2021] [Indexed: 11/03/2022] Open
Abstract
Phylogenetic trees are useful tools to infer evolutionary relationships between genetic entities. Phylogenetics enables not only evolution-based gene clustering but also the assignment of gene duplication and deletion events to the nodes when coupled with statistical approaches such as bootstrapping. However, extensive gene duplication and deletion events bring along a challenge in interpreting phylogenetic trees and require manual inference. In particular, there has been no robust method of determining whether one of the paralog clades systematically shows higher divergence following the gene duplication event as a sign of functional divergence. Here, we provide Phylostat, a graphical user interface that enables clade divergence analysis, visually and statistically. Phylostat is a web-based tool built on phylo.io to allow comparative clade divergence analysis, which is available at https://phylostat.adebalilab.org under an MIT open-source licence.
Collapse
Affiliation(s)
- Elif Özçelik
- Molecular Biology, Genetic and Bioengineering, Faculty of Engineering and Natural Sciences, Sabancı University, İstanbul Turkey
| | - Nurdan Kuru
- Molecular Biology, Genetic and Bioengineering, Faculty of Engineering and Natural Sciences, Sabancı University, İstanbul Turkey
| | - Ogün Adebali
- Molecular Biology, Genetic and Bioengineering, Faculty of Engineering and Natural Sciences, Sabancı University, İstanbul Turkey
| |
Collapse
|
3
|
Sánchez AL, Lafond M. Colorful orthology clustering in bounded-degree similarity graphs. J Bioinform Comput Biol 2021; 19:2140010. [PMID: 34775924 DOI: 10.1142/s0219720021400102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Clustering genes in similarity graphs is a popular approach for orthology prediction. Most algorithms group genes without considering their species, which results in clusters that contain several paralogous genes. Moreover, clustering is known to be problematic when in-paralogs arise from ancient duplications. Recently, we proposed a two-step process that avoids these problems. First, we infer clusters of only orthologs (i.e. with only genes from distinct species), and second, we infer the missing inter-cluster orthologs. In this paper, we focus on the first step, which leads to a problem we call Colorful Clustering. In general, this is as hard as classical clustering. However, in similarity graphs, the number of species is usually small, as well as the neighborhood size of genes in other species. We therefore study the problem of clustering in which the number of colors is bounded by [Formula: see text], and each gene has at most [Formula: see text] neighbors in another species. We show that the well-known cluster editing formulation remains NP-hard even when [Formula: see text] and [Formula: see text]. We then propose a fixed-parameter algorithm in [Formula: see text] to find the single best cluster in the graph. We implemented this algorithm and included it in the aforementioned two-step approach. Experiments on simulated data show that this approach performs favorably to applying only an unconstrained clustering step.
Collapse
Affiliation(s)
- Alitzel López Sánchez
- Computer Science Department, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada
| | - Manuel Lafond
- Computer Science Department, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Québec J1K 2R1, Canada
| |
Collapse
|
4
|
Begum T, Serrano‐Serrano ML, Robinson‐Rechavi M. Performance of a phylogenetic independent contrast method and an improved pairwise comparison under different scenarios of trait evolution after speciation and duplication. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13680] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Tina Begum
- Department of Ecology and Evolution University of Lausanne Lausanne Switzerland
- SIB Swiss Institute of Bioinformatics Lausanne Switzerland
| | - Martha Liliana Serrano‐Serrano
- Department of Ecology and Evolution University of Lausanne Lausanne Switzerland
- SIB Swiss Institute of Bioinformatics Lausanne Switzerland
| | - Marc Robinson‐Rechavi
- Department of Ecology and Evolution University of Lausanne Lausanne Switzerland
- SIB Swiss Institute of Bioinformatics Lausanne Switzerland
| |
Collapse
|
5
|
Schaller D, Geiß M, Hellmuth M, Stadler PF. Heuristic algorithms for best match graph editing. Algorithms Mol Biol 2021; 16:19. [PMID: 34404422 PMCID: PMC8369769 DOI: 10.1186/s13015-021-00196-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Accepted: 06/26/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Best match graphs (BMGs) are a class of colored digraphs that naturally appear in mathematical phylogenetics as a representation of the pairwise most closely related genes among multiple species. An arc connects a gene x with a gene y from another species (vertex color) Y whenever it is one of the phylogenetically closest relatives of x. BMGs can be approximated with the help of similarity measures between gene sequences, albeit not without errors. Empirical estimates thus will usually violate the theoretical properties of BMGs. The corresponding graph editing problem can be used to guide error correction for best match data. Since the arc set modification problems for BMGs are NP-complete, efficient heuristics are needed if BMGs are to be used for the practical analysis of biological sequence data. RESULTS Since BMGs have a characterization in terms of consistency of a certain set of rooted triples (binary trees on three vertices) defined on the set of genes, we consider heuristics that operate on triple sets. As an alternative, we show that there is a close connection to a set partitioning problem that leads to a class of top-down recursive algorithms that are similar to Aho's supertree algorithm and give rise to BMG editing algorithms that are consistent in the sense that they leave BMGs invariant. Extensive benchmarking shows that community detection algorithms for the partitioning steps perform best for BMG editing. CONCLUSION Noisy BMG data can be corrected with sufficient accuracy and efficiency to make BMGs an attractive alternative to classical phylogenetic methods.
Collapse
|
6
|
Harris CD, Torrance EL, Raymann K, Bobay LM. CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets. Mol Biol Evol 2021; 38:727-734. [PMID: 32886787 PMCID: PMC7826169 DOI: 10.1093/molbev/msaa224] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses, however, most methods rely on the comparison of all the pairs of genomes; a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher; a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons and uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Although it is much faster than current methods, our results indicate that our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from: https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the python library Numpy and either Usearch or Blast. Certain options require the programs muscle or mafft.
Collapse
Affiliation(s)
- Connor D Harris
- Department of Biology, University of North Carolina Greensboro, Greensboro, NC
| | - Ellis L Torrance
- Department of Biology, University of North Carolina Greensboro, Greensboro, NC
| | - Kasie Raymann
- Department of Biology, University of North Carolina Greensboro, Greensboro, NC
| | - Louis-Marie Bobay
- Department of Biology, University of North Carolina Greensboro, Greensboro, NC
| |
Collapse
|
7
|
Zaquin T, Malik A, Drake JL, Putnam HM, Mass T. Evolution of Protein-Mediated Biomineralization in Scleractinian Corals. Front Genet 2021; 12:618517. [PMID: 33633782 PMCID: PMC7902050 DOI: 10.3389/fgene.2021.618517] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2020] [Accepted: 01/08/2021] [Indexed: 12/19/2022] Open
Abstract
While recent strides have been made in understanding the biological process by which stony corals calcify, much remains to be revealed, including the ubiquity across taxa of specific biomolecules involved. Several proteins associated with this process have been identified through proteomic profiling of the skeletal organic matrix (SOM) extracted from three scleractinian species. However, the evolutionary history of this putative “biomineralization toolkit,” including the appearance of these proteins’ throughout metazoan evolution, remains to be resolved. Here we used a phylogenetic approach to examine the evolution of the known scleractinians’ SOM proteins across the Metazoa. Our analysis reveals an evolutionary process dominated by the co-option of genes that originated before the cnidarian diversification. Each one of the three species appears to express a unique set of the more ancient genes, representing the independent co-option of SOM proteins, as well as a substantial proportion of proteins that evolved independently. In addition, in some instances, the different species expressed multiple orthologous proteins sharing the same evolutionary history. Furthermore, the non-random clustering of multiple SOM proteins within scleractinian-specific branches suggests the conservation of protein function between distinct species for what we posit is part of the scleractinian “core biomineralization toolkit.” This “core set” contains proteins that are likely fundamental to the scleractinian biomineralization mechanism. From this analysis, we infer that the scleractinians’ ability to calcify was achieved primarily through multiple lineage-specific protein expansions, which resulted in a new functional role that was not present in the parent gene.
Collapse
Affiliation(s)
- Tal Zaquin
- Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
| | - Assaf Malik
- Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
| | - Jeana L Drake
- Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
| | - Hollie M Putnam
- Department of Biological Sciences, University of Rhode Island, Kingston, RI, United States
| | - Tali Mass
- Department of Marine Biology, The Leon H. Charney School of Marine Sciences, University of Haifa, Haifa, Israel
| |
Collapse
|
8
|
Chen Y, Song W, Xie X, Wang Z, Guan P, Peng H, Jiao Y, Ni Z, Sun Q, Guo W. A Collinearity-Incorporating Homology Inference Strategy for Connecting Emerging Assemblies in the Triticeae Tribe as a Pilot Practice in the Plant Pangenomic Era. MOLECULAR PLANT 2020; 13:1694-1708. [PMID: 32979565 DOI: 10.1016/j.molp.2020.09.019] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/06/2020] [Revised: 09/03/2020] [Accepted: 09/21/2020] [Indexed: 05/18/2023]
Abstract
Plant genome sequencing has dramatically increased, and some species even have multiple high-quality reference versions. Demands for clade-specific homology inference and analysis have increased in the pangenomic era. Here we present a novel method, GeneTribe (https://chenym1.github.io/genetribe/), for homology inference among genetically similar genomes that incorporates gene collinearity and shows better performance than traditional sequence-similarity-based methods in terms of accuracy and scalability. The Triticeae tribe is a typical allopolyploid-rich clade with complex species relationships that includes many important crops, such as wheat, barley, and rye. We built Triticeae-GeneTribe (http://wheat.cau.edu.cn/TGT/), a homology database, by integrating 12 Triticeae genomes and 3 outgroup model genomes and implemented versatile analysis and visualization functions. With macrocollinearity analysis, we were able to construct a refined model illustrating the structural rearrangements of the 4A-5A-7B chromosomes in wheat as two major translocation events. With collinearity analysis at both the macro- and microscale, we illustrated the complex evolutionary history of homologs of the wheat vernalization gene Vrn2, which evolved as a combined result of genome translocation, duplication, and polyploidization and gene loss events. Our work provides a useful practice for connecting emerging genome assemblies, with awareness of the extensive polyploidy in plants, and will help researchers efficiently exploit genome sequence resources.
Collapse
Affiliation(s)
- Yongming Chen
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Wanjun Song
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China; Beijing Geek Gene Technology Co Ltd, Beijing 100193, China
| | - Xiaoming Xie
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Zihao Wang
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Panfeng Guan
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Huiru Peng
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Yuannian Jiao
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Zhongfu Ni
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Qixin Sun
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
| | - Weilong Guo
- Key Laboratory of Crop Heterosis and Utilization, State Key Laboratory for Agrobiotechnology, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China.
| |
Collapse
|
9
|
Agarwal PR, Lahiri A. Comparative study of the SBP-box gene family in rice siblings. J Biosci 2020. [DOI: 10.1007/s12038-020-00048-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
10
|
Molloy EK, Warnow T. FastMulRFS: fast and accurate species tree estimation under generic gene duplication and loss models. Bioinformatics 2020; 36:i57-i65. [PMID: 32657396 PMCID: PMC7355287 DOI: 10.1093/bioinformatics/btaa444] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
MOTIVATION Species tree estimation is a basic part of biological research but can be challenging because of gene duplication and loss (GDL), which results in genes that can appear more than once in a given genome. All common approaches in phylogenomic studies either reduce available data or are error-prone, and thus, scalable methods that do not discard data and have high accuracy on large heterogeneous datasets are needed. RESULTS We present FastMulRFS, a polynomial-time method for estimating species trees without knowledge of orthology. We prove that FastMulRFS is statistically consistent under a generic model of GDL when adversarial GDL does not occur. Our extensive simulation study shows that FastMulRFS matches the accuracy of MulRF (which tries to solve the same optimization problem) and has better accuracy than prior methods, including ASTRAL-multi (the only method to date that has been proven statistically consistent under GDL), while being much faster than both methods. AVAILABILITY AND IMPEMENTATION FastMulRFS is available on Github (https://github.com/ekmolloy/fastmulrfs). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
| |
Collapse
|
11
|
|
12
|
Gao K, Miller J. Primary orthologs from local sequence context. BMC Bioinformatics 2020; 21:48. [PMID: 32028880 PMCID: PMC7006074 DOI: 10.1186/s12859-020-3384-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2019] [Accepted: 01/22/2020] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND The evolutionary history of genes serves as a cornerstone of contemporary biology. Most conserved sequences in mammalian genomes don't code for proteins, yielding a need to infer evolutionary history of sequences irrespective of what kind of functional element they may encode. Thus, sequence-, as opposed to gene-, centric modes of inferring paths of sequence evolution are increasingly relevant. Customarily, homologous sequences derived from the same direct ancestor, whose ancestral position in two genomes is usually conserved, are termed "primary" (or "positional") orthologs. Methods based solely on similarity don't reliably distinguish primary orthologs from other homologs; for this, genomic context is often essential. Context-dependent identification of orthologs traditionally relies on genomic context over length scales characteristic of conserved gene order or whole-genome sequence alignment, and can be computationally intensive. RESULTS We demonstrate that short-range sequence context-as short as a single "maximal" match- distinguishes primary orthologs from other homologs across whole genomes. On mammalian whole genomes not preprocessed by repeat-masker, potential orthologs are extracted by genome intersection as "non-nested maximal matches:" maximal matches that are not nested into other maximal matches. It emerges that on both nucleotide and gene scales, non-nested maximal matches recapitulate primary or positional orthologs with high precision and high recall, while the corresponding computation consumes less than one thirtieth of the computation time required by commonly applied whole-genome alignment methods. In regions of genomes that would be masked by repeat-masker, non-nested maximal matches recover orthologs that are inaccessible to Lastz net alignment, for which repeat-masking is a prerequisite. mmRBHs, reciprocal best hits of genes containing non-nested maximal matches, yield novel putative orthologs, e.g. around 1000 pairs of genes for human-chimpanzee. CONCLUSIONS We describe an intersection-based method that requires neither repeat-masking nor alignment to infer evolutionary history of sequences based on short-range genomic sequence context. Ortholog identification based on non-nested maximal matches is parameter-free, and less computationally intensive than many alignment-based methods. It is especially suitable for genome-wide identification of orthologs, and may be applicable to unassembled genomes. We are agnostic as to the reasons for its effectiveness, which may reflect local variation of mean mutation rate.
Collapse
Affiliation(s)
- Kun Gao
- School of Science, Southwest University of Science and Technology, 59 Qinglong Road, Mianyang, Sichuan Province, 621010, People's Republic of China.
| | - Jonathan Miller
- Physics and Biology Unit, Okinawa Institute of Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa, 904-0495, Japan
| |
Collapse
|
13
|
Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, da Silva AS, Wang Y, Sonnhammer E, Thomas PD. Advances and Applications in the Quest for Orthologs. Mol Biol Evol 2020; 36:2157-2164. [PMID: 31241141 PMCID: PMC6759064 DOI: 10.1093/molbev/msz150] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open
Abstract
Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.
Collapse
Affiliation(s)
- Natasha Glover
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution & Environment, University College London, London, United Kingdom.,Department of Computer Science, University College London, London, United Kingdom
| | - Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University Frankfurt, Frankfurt, Germany.,Senckenberg Biodiversity and Climate Research Centre (BIK-F), Frankfurt, Germany.,LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
| | - Sofia K Forslund
- Experimental and Clinical Research Center, A Cooperation of Charité-Universitätsmedizin Berlin and Max Delbruck Center for Molecular Medicine, Berlin, Germany.,Max Delbruck Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany.,Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität u Berlin, Berlin, Germany.,Berlin Institute of Health (BIH), Berlin, Germany.,Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Toni Gabaldón
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain.,ICREA, Barcelona, Spain
| | - Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.,Centro de Biotecnología y Genómica de Plantas, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Universidad Politécnica de Madrid (UPM), Madrid, Spain
| | - Maria-Jesus Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Matthieu Muffato
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Mateus Patricio
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Cécile Pereira
- Eura Nova, Marseille, France.,Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, University of Florida, Gainesville, FL
| | - Alan Sousa da Silva
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom
| | - Yan Wang
- Department of Microbiology and Plant Pathology, Institute for Integrative Genome Biology, University of California-Riverside, Riverside, CA
| | - Erik Sonnhammer
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Solna, Sweden
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA
| |
Collapse
|
14
|
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol 2019; 20:238. [PMID: 31727128 PMCID: PMC6857279 DOI: 10.1186/s13059-019-1832-y] [Citation(s) in RCA: 2588] [Impact Index Per Article: 517.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2019] [Accepted: 09/23/2019] [Indexed: 12/22/2022] Open
Abstract
Here, we present a major advance of the OrthoFinder method. This extends OrthoFinder's high accuracy orthogroup inference to provide phylogenetic inference of orthologs, rooted gene trees, gene duplication events, the rooted species tree, and comparative genomics statistics. Each output is benchmarked on appropriate real or simulated datasets, and where comparable methods exist, OrthoFinder is equivalent to or outperforms these methods. Furthermore, OrthoFinder is the most accurate ortholog inference method on the Quest for Orthologs benchmark test. Finally, OrthoFinder's comprehensive phylogenetic analysis is achieved with equivalent speed and scalability to the fastest, score-based heuristic methods. OrthoFinder is available at https://github.com/davidemms/OrthoFinder.
Collapse
Affiliation(s)
- David M Emms
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
| | - Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.
| |
Collapse
|
15
|
Dondi R, Lafond M, Scornavacca C. Reconciling multiple genes trees via segmental duplications and losses. Algorithms Mol Biol 2019; 14:7. [PMID: 30930955 PMCID: PMC6425616 DOI: 10.1186/s13015-019-0139-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2018] [Accepted: 02/23/2019] [Indexed: 01/18/2023] Open
Abstract
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families is interconnected. In this paper, we extend a previous approach to reconcile a set of gene trees with a species tree based on segmental macro-evolutionary events, where segmental duplication events and losses are associated with cost \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\delta $$\end{document}δ and \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\lambda $$\end{document}λ, respectively. We show that the problem is polynomial-time solvable when \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\delta \le \lambda $$\end{document}δ≤λ (via LCA-mapping), while if \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\delta > \lambda $$\end{document}δ>λ the problem is NP-hard, even when \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\lambda = 0$$\end{document}λ=0 and a single gene tree is given, solving a long standing open problem on the complexity of multi-gene reconciliation. On the positive side, we give a fixed-parameter algorithm for the problem, where the parameters are \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$\delta /\lambda $$\end{document}δ/λ and the number d of segmental duplications, of time complexity \documentclass[12pt]{minimal}
\usepackage{amsmath}
\usepackage{wasysym}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsbsy}
\usepackage{mathrsfs}
\usepackage{upgreek}
\setlength{\oddsidemargin}{-69pt}
\begin{document}$$O\left(\lceil \frac{\delta }{\lambda } \rceil ^{d} \cdot n \cdot \frac{\delta }{\lambda }\right)$$\end{document}O⌈δλ⌉d·n·δλ. Finally, we demonstrate the usefulness of this algorithm on two previously studied real datasets: we first show that our method can be used to confirm or raise doubt on hypothetical segmental duplications on a set of 16 eukaryotes, then show how we can detect whole genome duplications in yeast genomes.
Collapse
|