Reference Citation Analysis

For: Dalquen DA, Altenhoff AM, Gonnet GH, Dessimoz C. The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study. PLoS One 2013;8:e56925. [PMID: 23451112 PMCID: PMC3581572 DOI: 10.1371/journal.pone.0056925] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Received: 08/29/2012] [Accepted: 01/16/2013] [Indexed: 11/19/2022]

Cited by Other Article(s)

Rossier V, Warwick Vesztrocy A, Robinson-Rechavi M, Dessimoz C. OMAmer: tree-driven and alignment-free protein assignment to subfamilies outperforms closest sequence approaches. Bioinformatics 2021;37:2866-2873. [PMID: 33787851 PMCID: PMC8479680 DOI: 10.1093/bioinformatics/btab219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2020] [Revised: 02/18/2021] [Accepted: 03/30/2021] [Indexed: 02/02/2023]

Deutekom ES, Snel B, van Dam TJP. Benchmarking orthology methods using phylogenetic patterns defined at the base of Eukaryotes. Brief Bioinform 2020;22:5906198. [PMID: 32935832 PMCID: PMC8138875 DOI: 10.1093/bib/bbaa206] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/12/2020] [Revised: 08/10/2020] [Accepted: 08/11/2020] [Indexed: 12/26/2022]

Abstract

Insights into the evolution of ancestral complexes and pathways are generally achieved through careful and time-intensive manual analysis often using phylogenetic profiles of the constituent proteins. This manual analysis limits the possibility of including more protein-complex components, repeating the analyses for updated genome sets or expanding the analyses to larger scales. Automated orthology inference should allow such large-scale analyses, but substantial differences between orthologous groups generated by different approaches are observed.

We evaluate orthology methods for their ability to recapitulate a number of observations that have been made with regard to genome evolution in eukaryotes. Specifically, we investigate phylogenetic profile similarity (co-occurrence of complexes), the last eukaryotic common ancestor’s gene content, pervasiveness of gene loss and the overlap with manually determined orthologous groups. Moreover, we compare the inferred orthologies to each other.

We find that most orthology methods reconstruct a large last eukaryotic common ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence. At the same time, derived orthologous groups show imperfect overlap with manually curated orthologous groups. There is no strong indication that any one orthology method outperforms the others, either on individual aspects or on all of them combined. Counterintuitively, although the orthology methods behave similarly in the large-scale evaluation, the orthologous groups they produce differ vastly from one another.

Availability and implementation: The data and code underlying this article are available on GitHub (https://github.com/ESDeutekom/ComparingOrthologies) and/or upon reasonable request to the corresponding author.
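Phylogenetic co-occurrence, mentioned above as a way to predict interacting proteins, scores how similarly two orthologous groups are distributed across a set of species. The following is a minimal sketch only: it assumes binary presence/absence profiles and uses Jaccard similarity as the score, which is an illustrative choice and not necessarily the measure used in the study; the group names and profiles are hypothetical.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two binary presence/absence profiles."""
    both = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x or y)
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Score every pair of orthologous groups, most similar pair first."""
    scores = [
        ((g1, g2), jaccard(profiles[g1], profiles[g2]))
        for g1, g2 in combinations(sorted(profiles), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical profiles over six eukaryotic species (1 = group present).
profiles = {
    "OG_A": [1, 1, 0, 1, 0, 1],
    "OG_B": [1, 1, 0, 1, 0, 1],
    "OG_C": [1, 0, 1, 0, 1, 0],
}
for pair, score in rank_pairs(profiles):
    print(pair, round(score, 2))
```

Groups with near-identical profiles (here OG_A and OG_B) rank highest and would be the candidate interaction partners; real analyses also have to deal with sparse profiles and shared phylogenetic history.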

Dencker T, Leimeister CA, Gerth M, Bleidorn C, Snir S, Morgenstern B. 'Multi-SpaM': a maximum-likelihood approach to phylogeny reconstruction using multiple spaced-word matches and quartet trees. NAR Genom Bioinform 2020;2:lqz013. [PMID: 33575565 PMCID: PMC7671388 DOI: 10.1093/nargab/lqz013] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/30/2019] [Revised: 07/31/2019] [Accepted: 10/13/2019] [Indexed: 02/03/2023]

Piližota I, Train CM, Altenhoff A, Redestig H, Dessimoz C. Phylogenetic approaches to identifying fragments of the same gene, with application to the wheat genome. Bioinformatics 2019;35:1159-1166. [PMID: 30184069 PMCID: PMC6449756 DOI: 10.1093/bioinformatics/bty772] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2017] [Revised: 07/30/2018] [Accepted: 08/31/2018] [Indexed: 11/12/2022]

Inferring Orthology and Paralogy. Methods Mol Biol 2019;1910:149-175. [PMID: 31278664 DOI: 10.1007/978-1-4939-9074-0_5] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 12/23/2022]

Pouchon C, Fernández A, Nassar JM, Boyer F, Aubert S, Lavergne S, Mavárez J. Phylogenomic Analysis of the Explosive Adaptive Radiation of the Espeletia Complex (Asteraceae) in the Tropical Andes. Syst Biol 2018;67:1041-1060. [PMID: 30339252 DOI: 10.1093/sysbio/syy022] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 03/14/2017] [Accepted: 03/15/2018] [Indexed: 01/17/2023]

Abstract

The subtribe Espeletiinae (Asteraceae), endemic to high elevations in the Northern Andes, exhibits an exceptional diversity of species, growth-forms, and reproductive strategies. This complex of 140 species includes large trees, dichotomous trees, shrubs and the extraordinary giant caulescent rosettes, considered a classic example of adaptation in tropical high-elevation ecosystems. The subtribe has also long been recognized as a prominent case of adaptive radiation, but understanding of its evolution has been hampered by a lack of phylogenetic resolution. Herein, we produce the first fully resolved phylogeny of all morphological groups of Espeletiinae, using whole plastomes and about a million nuclear nucleotides obtained with an original de novo assembly procedure that does not require a reference genome, and analyzed with traditional and coalescent-based approaches that consider the possible impact of incomplete lineage sorting and hybridization on phylogenetic inference. We show that the diversification of Espeletiinae started from a rosette ancestor about 2.3 Ma, after the final uplift of the Northern Andes. This was followed by two independent radiations in the Colombian and Venezuelan Andes, with a few trans-cordilleran dispersal events among low-elevation tree lineages but none among high-elevation rosettes. We demonstrate complex scenarios of morphological change in Espeletiinae, usually implying the convergent evolution of growth-forms with frequent losses and gains of various traits. For instance, caulescent rosettes evolved independently in both countries, likely as convergent adaptations to life in tropical high-elevation habitats. Tree growth-forms evolved independently three times from the repeated colonization of lower elevations by high-elevation rosette ancestors. The rate of morphological diversification increased during the early phase of the radiation, after which it decreased steadily towards the present. On the other hand, the rate of species diversification in the best-sampled Venezuelan radiation was on average very high (3.1 spp/My), with significant rate variation among growth-forms (much higher in polycarpic caulescent rosettes). Our results point to a scenario in which both adaptive morphological evolution and geographical isolation due to Pleistocene climatic oscillations triggered an exceptionally rapid radiation for a continental plant group.

Smith SA, Pease JB. Heterogeneous molecular processes among the causes of how sequence similarity scores can fail to recapitulate phylogeny. Brief Bioinform 2017;18:451-457. [PMID: 27103098 PMCID: PMC5429007 DOI: 10.1093/bib/bbw034] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/19/2016] [Indexed: 11/24/2022]

Jahangiri-Tazehkand S, Wong L, Eslahchi C. OrthoGNC: A Software for Accurate Identification of Orthologs Based on Gene Neighborhood Conservation. Genomics Proteomics Bioinformatics 2017;15:361-370. [PMID: 29133277 PMCID: PMC5828658 DOI: 10.1016/j.gpb.2017.07.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/22/2017] [Revised: 07/17/2017] [Accepted: 07/28/2017] [Indexed: 11/17/2022]

SMORE: Synteny Modulator of Repetitive Elements. Life (Basel) 2017;7:life7040042. [PMID: 29088079 PMCID: PMC5745555 DOI: 10.3390/life7040042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/28/2017] [Revised: 10/27/2017] [Accepted: 10/28/2017] [Indexed: 12/19/2022]

Rane RV, Oakeshott JG, Nguyen T, Hoffmann AA, Lee SF. Orthonome - a new pipeline for predicting high quality orthologue gene sets applicable to complete and draft genomes. BMC Genomics 2017;18:673. [PMID: 28859620 PMCID: PMC5580312 DOI: 10.1186/s12864-017-4079-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/15/2017] [Accepted: 08/21/2017] [Indexed: 12/15/2022]

Abstract

BACKGROUND

Distinguishing orthologous and paralogous relationships between genes across multiple species is essential for comparative genomic analyses. Various computational approaches have been developed to resolve these evolutionary relationships, but strong trade-offs between precision and recall of orthologue prediction remain an ongoing challenge.

RESULTS

Here we present Orthonome, an orthologue prediction pipeline, designed to reduce the trade-off between orthologue capture rates (recall) and accuracy of multi-species orthologue prediction. The pipeline compares sequence domains and then forms sequence-similar clusters before using phylogenetic comparisons to identify inparalogues. It then corrects sequence similarity metrics for fragment and gene length bias using a novel scoring metric capturing relationships between full length as well as fragmented genes. The remaining genes are then brought together for the identification of orthologues within a phylogenetic framework. The orthologue predictions are further calibrated along with inparalogues and gene births, using synteny, to identify novel orthologous relationships. We use 12 high quality Drosophila genomes to show that, compared to other orthologue prediction pipelines, Orthonome provides orthogroups with minimal error but high recall. Furthermore, Orthonome is resilient to suboptimal assembly/annotation quality, with the inclusion of draft genomes from eight additional Drosophila species still providing >6500 1:1 orthologues across all twenty species while retaining a better combination of accuracy and recall than other pipelines. Orthonome is implemented as a searchable database and query tool along with multiple-sequence alignment browsers for all sets of orthologues. The underlying documentation and database are accessible at http://www.orthonome.com .

CONCLUSION

We demonstrate that Orthonome provides a superior combination of orthologue capture rates and accuracy on complete and draft drosophilid genomes when tested alongside previously published pipelines. The study also highlights a greater degree of evolutionary conservation across drosophilid species than earlier thought.

Velandia-Huerto CA, Berkemer SJ, Hoffmann A, Retzlaff N, Romero Marroquín LC, Hernández-Rosales M, Stadler PF, Bermúdez-Santana CI. Orthologs, turn-over, and remolding of tRNAs in primates and fruit flies. BMC Genomics 2016;17:617. [PMID: 27515907 PMCID: PMC4981973 DOI: 10.1186/s12864-016-2927-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/10/2016] [Accepted: 07/11/2016] [Indexed: 12/26/2022]

Abstract

Background

Transfer RNAs (tRNAs) are ubiquitous in all living organisms. They implement the genetic code, so most genomes contain distinct tRNAs for almost all 61 codons. They behave similarly to mobile elements and proliferate in genomes, spawning both local and non-local copies. Most tRNA families are therefore typically present as multicopy genes. The members of the individual tRNA families evolve under concerted or rapid birth-death evolution, so that paralogous copies maintain almost identical sequences over long evolutionary time-scales. To a good approximation, these are functionally equivalent. Individual tRNA copies are thus evolutionarily unstable and easily turn into pseudogenes and disappear. This leads to a rapid turnover of tRNAs and often to large differences in the tRNA complements of closely related species. Since tRNA paralogs are not distinguished by sequence, common sequence-based methods cannot be used to establish orthology between tRNA genes.

Results

In this contribution we introduce a general framework to distinguish orthologs and paralogs in gene families that are subject to concerted evolution. It is based on the use of uniquely aligned adjacent sequence elements as anchors to establish syntenic conservation of sequence intervals. In practice, anchors and intervals can be extracted from genome-wide multiple sequence alignments. Syntenic clusters of concertedly evolving genes of different families can then be subdivided by list alignments, usually leading to small clusters of candidate co-orthologs. On the basis of recent advances in phylogenetic combinatorics, these candidate clusters can be further processed by cograph editing to recover their duplication histories. We developed a workflow that can be conceptualized as stepwise refinement of a graph of homologous genes. We apply this analysis strategy with different types of synteny anchors to investigate the evolution of tRNAs in primates and fruit flies. We identified a large number of tRNA remolding events concentrated at the tips of the phylogeny. With one notable exception, all phylogenetically old tRNA remoldings do not change the isoacceptor class.
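The anchor-based step described above can be illustrated with a toy example: each gene is assigned to the interval between its two flanking anchors, and genes from different species that fall in the same anchor-bounded interval form a candidate co-ortholog cluster. This is a simplified sketch, not the authors' workflow. It assumes a single linear coordinate system per species, anchor IDs already matched across species (for example, from a genome-wide alignment), and hypothetical gene names and positions; the list-alignment and cograph-editing steps are omitted.

```python
from bisect import bisect_right
from collections import defaultdict


def flanking_anchors(position, anchors):
    """Return the IDs of the anchors immediately left and right of a position.

    anchors: list of (position, anchor_id) tuples sorted by position.
    None marks a missing flank at either end of the sequence.
    """
    positions = [pos for pos, _ in anchors]
    i = bisect_right(positions, position)
    left = anchors[i - 1][1] if i > 0 else None
    right = anchors[i][1] if i < len(anchors) else None
    return (left, right)


def candidate_clusters(genes_by_species, anchors_by_species):
    """Group genes from all species by their shared anchor-bounded interval."""
    clusters = defaultdict(list)
    for species, genes in genes_by_species.items():
        anchors = anchors_by_species[species]
        for gene_id, position in genes:
            interval = flanking_anchors(position, anchors)
            clusters[interval].append((species, gene_id))
    return dict(clusters)


# Hypothetical anchors (shared IDs X1-X3) and tRNA gene positions.
anchors_by_species = {
    "human": [(100, "X1"), (500, "X2"), (900, "X3")],
    "chimp": [(120, "X1"), (530, "X2"), (950, "X3")],
}
genes_by_species = {
    "human": [("tRNA-Gly-1", 300), ("tRNA-Gly-2", 320), ("tRNA-Leu-1", 700)],
    "chimp": [("tRNA-Gly-1", 340), ("tRNA-Leu-1", 760)],
}
clusters = candidate_clusters(genes_by_species, anchors_by_species)
for interval, members in clusters.items():
    print(interval, members)
```

In this toy input, the human interval between X1 and X2 holds two tRNA-Gly copies against a single chimpanzee copy, the kind of small candidate cluster that the later refinement steps would have to resolve.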

Conclusions

Gene families evolving under concerted evolution are not amenable to classical phylogenetic analyses, since paralogs maintain identical, species-specific sequences, precluding the estimation of correct gene trees from sequence differences. This leaves conservation of syntenic arrangements with respect to “anchor elements” that are not subject to concerted evolution as the only viable source of phylogenetic information. We have demonstrated here that a purely synteny-based analysis of tRNA gene histories is indeed feasible. Although the choice of synteny anchors influences the resolution, in particular when tight gene clusters are present, and the quality of sequence alignments, genome assemblies, and genome rearrangements limit the scope of the analysis, largely coherent results can be obtained for tRNAs. In particular, we conclude that a large fraction of the tRNAs are recent copies. This proliferation is compensated by rapid pseudogenization, as exemplified by many very recent alloacceptor remoldings.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2927-4) contains supplementary material, which is available to authorized users.

Affiliation(s)

Cristian A Velandia-Huerto Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C., Colombia
Sarah J Berkemer Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
Anne Hoffmann Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
Nancy Retzlaff Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany
Liliana C Romero Marroquín Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C., Colombia
Maribel Hernández-Rosales CONACYT - Instituto de Matemáticas, UNAM Juriquilla, Av. Juriquilla #3001, Santiago de Querétaro, MX-76230, QRO, México
Peter F Stadler Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107, Leipzig, Germany; Fraunhofer Institut for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany; Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria; Center for non-coding RNA in Technology and Health, Grønegårdsvej 3, Frederiksberg C, DK-1870, Denmark; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
Clara I Bermúdez-Santana Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C., Colombia

Comparing the Statistical Fate of Paralogous and Orthologous Sequences. Genetics 2016;204:475-482. [PMID: 27474728 DOI: 10.1534/genetics.116.193912] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2016] [Accepted: 07/26/2016] [Indexed: 02/01/2023]

Ješovnik A, González VL, Schultz TR. Phylogenomics and Divergence Dating of Fungus-Farming Ants (Hymenoptera: Formicidae) of the Genera Sericomyrmex and Apterostigma. PLoS One 2016;11:e0151059. [PMID: 27466804 PMCID: PMC4965065 DOI: 10.1371/journal.pone.0151059] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 11/22/2015] [Accepted: 02/22/2016] [Indexed: 01/27/2023]

Abstract

Fungus-farming ("attine") ants are model systems for studies of symbiosis, coevolution, and advanced eusociality. Attine ants, a New World clade of nearly 300 species in 15 genera, all cultivate fungal symbionts for food. To better understand the evolution of ant agriculture, we sequenced, assembled, and analyzed transcriptomes of four attine ant species in two genera: three species in the higher-attine genus Sericomyrmex and a single lower-attine species, Apterostigma megacephala, representing the first genomic data for either genus. These data were combined with published genomes of nine other ant species and the honey bee Apis mellifera for phylogenomic and divergence-dating analyses. The resulting phylogeny confirms relationships inferred in previous studies of fungus-farming ants. Divergence-dating analyses recovered slightly older dates than most prior analyses, estimating that attine ants originated 53.6–66.7 million years ago, and recovered a very long branch subtending a very recent, rapid radiation of the genus Sericomyrmex. This result is further confirmed by a separate analysis of the three Sericomyrmex species, which reveals that 92.71% of orthologs have 99–100% pairwise-identical nucleotide sequences. We searched the transcriptomes for genes of interest, most importantly argininosuccinate synthase and argininosuccinate lyase, which are functional in other ants but are known to have been lost in seven previously studied attine ant species. Loss of the ability to produce the amino acid arginine has been hypothesized to contribute to the obligate dependence of attine ants on their cultivated fungi, but the point in fungus-farming ant evolution at which these losses occurred has remained unknown. We did not find these genes in any of the sequenced transcriptomes. Although this was expected for Sericomyrmex species, the absence of arginine anabolic genes in the lower-attine ant Apterostigma megacephala strongly suggests that the loss coincided with the origin of attine ants.
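The 92.71% figure above summarizes per-ortholog pairwise identities against a 99% threshold. As a minimal sketch only, that kind of calculation can be written as follows, assuming the sequences are already aligned to equal length and that identity is computed over alignment columns where neither sequence has a gap; definitions of percent identity vary between tools, and the sequences here are invented.

```python
def pairwise_identity(seq1, seq2, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a.upper() == b.upper())
    return 100.0 * matches / len(columns)


def percent_at_or_above(identities, threshold=99.0):
    """Share (in %) of ortholog pairs whose identity reaches the threshold."""
    if not identities:
        return 0.0
    hits = sum(1 for value in identities if value >= threshold)
    return 100.0 * hits / len(identities)


# Invented aligned ortholog pairs.
pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    ("ACGTACGTAC", "ACGTTCGTAC"),  # one mismatch in ten columns
    ("ACGT-CGTAC", "ACGTACGTAC"),  # gap column excluded
]
identities = [pairwise_identity(a, b) for a, b in pairs]
print(identities)  # [100.0, 90.0, 100.0]
print(percent_at_or_above(identities))
```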

Glover NM, Redestig H, Dessimoz C. Homoeologs: What Are They and How Do We Infer Them? Trends Plant Sci 2016;21:609-621. [PMID: 27021699 PMCID: PMC4920642 DOI: 10.1016/j.tplants.2016.02.005] [Citation(s) in RCA: 102] [Impact Index Per Article: 12.8] [Received: 09/22/2015] [Revised: 02/09/2016] [Accepted: 02/20/2016] [Indexed: 05/18/2023]

Standardized benchmarking in the quest for orthologs. Nat Methods 2016;13:425-30. [PMID: 27043882 PMCID: PMC4827703 DOI: 10.1038/nmeth.3830] [Citation(s) in RCA: 126] [Impact Index Per Article: 15.8] [Received: 01/28/2016] [Accepted: 03/09/2016] [Indexed: 11/23/2022]

Tekaia F. Inferring Orthologs: Open Questions and Perspectives. Genomics Insights 2016;9:17-28. [PMID: 26966373 PMCID: PMC4778853 DOI: 10.4137/gei.s37925] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 11/18/2015] [Revised: 12/30/2015] [Accepted: 01/02/2016] [Indexed: 01/25/2023]

Schierwater B, Holland PWH, Miller DJ, Stadler PF, Wiegmann BM, Wörheide G, Wray GA, DeSalle R. Never Ending Analysis of a Century Old Evolutionary Debate: “Unringing” the Urmetazoon Bell. Front Ecol Evol 2016. [DOI: 10.3389/fevo.2016.00005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]

Hellmuth M, Wieseke N. From Sequence Data Including Orthologs, Paralogs, and Xenologs to Gene and Species Trees. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_21] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]

Kotowski N, Jardim R, Dávila AMR. Improved orthologous databases to ease protozoan targets inference. Parasit Vectors 2015;8:494. [PMID: 26416523 PMCID: PMC4587786 DOI: 10.1186/s13071-015-1090-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/02/2015] [Accepted: 09/11/2015] [Indexed: 11/10/2022]

Abstract

Background

Homology inference helps in identifying similarities as well as differences among organisms, which provides better insight into how closely related one organism might be to another. In addition, comparative genomics pipelines are widely adopted tools built from different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid protozoan target identification, one of the many tasks that benefit from comparative genomics tools.

Methods

Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM-based reciprocal best hits approach. Our methodology allows OrthoSearch to confront two orthologous databases and generate an improved new one, which can later be used to infer potential protozoan targets through a similarity analysis against the human genome.
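The reciprocal-best-hits criterion mentioned above pairs two sequences only when each is the other's top-scoring match. The sketch below illustrates that criterion in isolation; it is not OrthoSearch's implementation (which works on HMM protein profiles), and it assumes precomputed, higher-is-better similarity scores with hypothetical identifiers.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict mapping (query, target) to a similarity score (higher is better).
    On ties, the first target encountered is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores between proteins of species A and species B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0, ("a2", "b2"): 120.0, ("a3", "b2"): 180.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a3"): 175.0, ("b2", "a2"): 118.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a3', 'b2')]
```

Here a2 is left unpaired: its best hit b2 prefers a3, which is the asymmetry the reciprocal check is meant to catch.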

Results

The protein sequences of Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) KEGG Orthology (KO). This allowed us to create two new orthologous databases, “KO + EggNOG KOG” and “KO + EggNOG KOG + ProtozoaDB”, with 16,938 and 27,701 orthologous groups, respectively.

These new orthologous databases were used for a regular OrthoSearch run. By confronting the “KO + EggNOG KOG” and “KO + EggNOG KOG + ProtozoaDB” databases with the protozoan species, we detected the following totals of orthologous groups and coverage (the relationship between the inferred orthologous groups and the species' total number of proteins), respectively: Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %).

Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13 orthologous groups which represent potential protozoan targets; these were found because of our distant homology approach.

We also provide the number of species-specific, pair-to-pair and core groups from such analyses, depicted in Venn diagrams.

Conclusions

The orthologous databases generated by our HMM-based methodology provide a broader dataset, with more orthologous groups than the original databases used as input. These may be used for several homology inference analyses, annotation tasks and protozoan target identification.

Electronic supplementary material

The online version of this article (doi:10.1186/s13071-015-1090-0) contains supplementary material, which is available to authorized users.

Behura SK. Insect phylogenomics. Insect Mol Biol 2015;24:403-11. [PMID: 25963452 PMCID: PMC4503476 DOI: 10.1111/imb.12174] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 10/16/2014] [Revised: 03/10/2015] [Accepted: 04/04/2015] [Indexed: 05/08/2023]

Chiara M, Caruso M, D'Erchia AM, Manzari C, Fraccalvieri R, Goffredo E, Latorre L, Miccolupo A, Padalino I, Santagada G, Chiocco D, Pesole G, Horner DS, Parisi A. Comparative Genomics of Listeria Sensu Lato: Genus-Wide Differences in Evolutionary Dynamics and the Progressive Gain of Complex, Potentially Pathogenicity-Related Traits through Lateral Gene Transfer. Genome Biol Evol 2015;7:2154-72. [PMID: 26185097 PMCID: PMC4558849 DOI: 10.1093/gbe/evv131] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.2] [Indexed: 01/23/2023]

Arenas M. Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate bayesian computation. J Mol Evol 2015;80:189-92. [PMID: 25808249 DOI: 10.1007/s00239-015-9673-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/06/2015] [Accepted: 03/19/2015] [Indexed: 11/29/2022]

Phylogenomics with paralogs. Proc Natl Acad Sci U S A 2015;112:2058-63. [PMID: 25646426 DOI: 10.1073/pnas.1412770112] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Indexed: 11/18/2022]

Trachana K, Forslund K, Larsson T, Powell S, Doerks T, von Mering C, Bork P. A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation. PLoS One 2014;9:e111122. [PMID: 25369365 PMCID: PMC4219706 DOI: 10.1371/journal.pone.0111122] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/15/2014] [Accepted: 09/23/2014] [Indexed: 11/19/2022]

Sonnhammer ELL, Gabaldón T, Sousa da Silva AW, Martin M, Robinson-Rechavi M, Boeckmann B, Thomas PD, Dessimoz C. Big data and other challenges in the quest for orthologs. Bioinformatics 2014;30:2993-8. [PMID: 25064571 PMCID: PMC4201156 DOI: 10.1093/bioinformatics/btu492] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 04/16/2014] [Revised: 06/25/2014] [Accepted: 07/16/2014] [Indexed: 01/29/2023]

Affiliation(s)

Erik L L Sonnhammer Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
Toni Gabaldón Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
Alan W Sousa da Silva Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
Maria Martin Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
Marc Robinson-Rechavi Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
Brigitte Boeckmann Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
Paul D Thomas Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
Christophe Dessimoz Stockholm Bioinformatics Center, Science for Life Laboratory, Box 1031, SE-17121 Solna, Sweden, Swedish eScience Research Center, Stockholm, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain, Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain, EMBL-European Bioinformatics Institute, Hinxton CB10 1SD, UK, Department of Ecology and Evolution, University of Lausanne, Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, SwissProt, Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland, Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA and Department of Genetics, Evolution and Environment, and Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK


Pereira C, Denise A, Lespinet O. A meta-approach for improving the prediction and the functional annotation of ortholog groups. BMC Genomics 2014;15 Suppl 6:S16. [PMID: 25573073 PMCID: PMC4240552 DOI: 10.1186/1471-2164-15-s6-s16] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]

Linard B, Allot A, Schneider R, Morel C, Ripp R, Bigler M, Thompson JD, Poch O, Lecompte O. OrthoInspector 2.0: Software and database updates. Bioinformatics 2014;31:447-8. [PMID: 25273105 DOI: 10.1093/bioinformatics/btu642] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022]

Affiliation(s)

Benjamin Linard LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD
Alexis Allot LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD
Raphaël Schneider LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD
Can Morel LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD
Raymond Ripp LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD
Marc Bigler LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD
Julie D Thompson LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD
Olivier Poch LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD
Odile Lecompte LBGI, Computer Science Department, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de médecine translationnelle, 4 rue Kirschleger 67085 Strasbourg, France and Department of Life Sciences, Natural History Museum, Cromwell Road, London SW7 5BD


Khenoussi W, Vanhoutrève R, Poch O, Thompson JD. SIBIS: a Bayesian model for inconsistent protein sequence estimation. Bioinformatics 2014;30:2432-9. [PMID: 24825613 DOI: 10.1093/bioinformatics/btu329] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]

Dalquen DA, Dessimoz C. Bidirectional best hits miss many orthologs in duplication-rich clades such as plants and animals. Genome Biol Evol 2013;5:1800-6. [PMID: 24013106 PMCID: PMC3814191 DOI: 10.1093/gbe/evt132] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Indexed: 11/12/2022]

Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, Gabaldón T, Rattei T, Creevey C, Kuhn M, Jensen LJ, von Mering C, Bork P. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res 2013;42:D231-9. [PMID: 24297252 PMCID: PMC3964997 DOI: 10.1093/nar/gkt1253] [Citation(s) in RCA: 422] [Impact Index Per Article: 38.4] [Indexed: 12/20/2022]