1
Shikov AE, Malovichko YV, Nizhnikov AA, Antonets KS. Current Methods for Recombination Detection in Bacteria. Int J Mol Sci 2022; 23:ijms23116257. [PMID: 35682936 PMCID: PMC9181119 DOI: 10.3390/ijms23116257] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/04/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 02/05/2023] Open
Abstract
The role of genetic exchange, i.e., homologous recombination (HR) and horizontal gene transfer (HGT), in bacteria cannot be overestimated, for it is a pivotal mechanism leading to their evolution and adaptation; thus, tracking the signs of recombination and HGT events is important for both fundamental and applied science. To date, dozens of bioinformatics tools for revealing recombination signals are available; however, their pros and cons, as well as the range of tasks they can solve, have not yet been systematically reviewed. Moreover, there are two major groups of software: one aims to infer evidence of HR, while the other deals only with HGT. However, despite seemingly different goals, all the methods use similar algorithmic approaches, and the two processes are interconnected, influencing each other in the course of genomic evolution. In this review, we propose a classification of novel instruments for both HR and HGT detection based on the genomic consequences of recombination. In this context, we summarize available methodologies, paying particular attention to the type of traceable events for which a given program has been designed.
Affiliation(s)
- Anton E. Shikov
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Yury V. Malovichko
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Anton A. Nizhnikov
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Kirill S. Antonets
  - Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
  - Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
2
Moustafa AM, Lal A, Planet PJ. Comparative genomics in infectious disease. Curr Opin Microbiol 2020; 53:61-70. [PMID: 32248056 DOI: 10.1016/j.mib.2020.02.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/14/2020] [Revised: 02/23/2020] [Accepted: 02/24/2020] [Indexed: 02/07/2023]
Abstract
With more than one million bacterial genome sequences uploaded to public databases in the last 25 years, genomics has become a powerful tool for studying bacterial biology. Here, we review recent approaches that leverage large numbers of whole genome sequences to decipher the spread and pathogenesis of bacterial infectious diseases.
Affiliation(s)
- Ahmed M Moustafa
  - Division of Pediatric Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Arnav Lal
  - School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
- Paul J Planet
  - Division of Pediatric Infectious Diseases, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pediatrics, Perelman College of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
3
Debray K, Marie-Magdelaine J, Ruttink T, Clotault J, Foucher F, Malécot V. Identification and assessment of variable single-copy orthologous (SCO) nuclear loci for low-level phylogenomics: a case study in the genus Rosa (Rosaceae). BMC Evol Biol 2019; 19:152. [PMID: 31340752 PMCID: PMC6657147 DOI: 10.1186/s12862-019-1479-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 04/16/2019] [Accepted: 07/16/2019] [Indexed: 12/18/2022] Open
Abstract
BACKGROUND With an ever-growing number of published genomes, many low levels of the Tree of Life now contain several species with enough molecular data to perform shallow-scale phylogenomic studies. Moving away from using just a few universal phylogenetic markers, we can now target thousands of other loci to decipher taxa relationships. Making the best possible selection of informative sequences regarding the taxa studied has emerged as a new issue. Here, we developed a general procedure to mine genomic data, looking for orthologous single-copy loci capable of deciphering phylogenetic relationships below the generic rank. To develop our strategy, we chose the genus Rosa, a rapidly evolving lineage of the Rosaceae family in which the genomes of several species have recently been sequenced. We also compared our loci to conventional plastid markers, commonly used for phylogenetic inference in this genus. RESULTS We generated 1856 sequence tags in putative single-copy orthologous nuclear loci. Associated in silico primer pairs can potentially amplify fragments able to resolve a wide range of speciation events within the genus Rosa. Analysis of parsimony-informative site content showed the value of non-coding genomic regions to obtain variable sequences despite the fact that they may be more difficult to target in less related species. Dozens of nuclear loci outperform the conventional plastid phylogenetic markers in terms of phylogenetic informativeness, for both recent and ancient evolutionary divergences. However, conflicting phylogenetic signals were found between nuclear gene tree topologies and the species-tree topology, shedding light on the many patterns of hybridization and/or incomplete lineage sorting that occur in the genus Rosa. CONCLUSIONS With recently published genome sequence data, we developed a set of single-copy orthologous nuclear loci to resolve species-level phylogenomics in the genus Rosa. This genome-wide scale dataset contains hundreds of highly variable loci whose phylogenetic interest was assessed in terms of phylogenetic informativeness and topological conflict. Our target identification procedure can easily be reproduced to identify new highly informative loci for other taxonomic groups and ranks.
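The "parsimony-informative site content" mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard definition (a column is parsimony-informative when at least two character states each occur in at least two sequences) and is not the authors' pipeline; the function names and the set of ignored gap/ambiguity symbols are assumptions made for illustration.

```python
from collections import Counter


def is_parsimony_informative(column, ignore="-N?"):
    """Return True if at least two states each occur in >= 2 sequences.

    `ignore` lists gap/ambiguity symbols skipped when counting states
    (an illustrative choice, not taken from the paper).
    """
    counts = Counter(c for c in column.upper() if c not in ignore)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_parsimony_informative(alignment):
    """Count parsimony-informative columns in a list of aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    # zip(*alignment) walks the alignment column by column.
    return sum(is_parsimony_informative("".join(col)) for col in zip(*alignment))


# Toy alignment of four sequences (hypothetical data).
toy_alignment = ["ACGTA", "ACGAA", "ATGAC", "ATGTC"]
n_informative = count_parsimony_informative(toy_alignment)
```

Dividing such a count by the alignment length gives a per-locus proportion that can be compared across loci of different lengths.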
Affiliation(s)
- Kevin Debray
  - IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France
- Tom Ruttink
  - ILVO, Flanders Research Institute for Agriculture, Fisheries and Food, Plant Sciences Unit, Melle, Belgium
- Jérémy Clotault
  - IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France
- Fabrice Foucher
  - IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France
- Valéry Malécot
  - IRHS, Agrocampus-Ouest, INRA, UNIV Angers, SFR 4207 QuaSaV, Beaucouzé, France
4
Laumer CE. Inferring Ancient Relationships with Genomic Data: A Commentary on Current Practices. Integr Comp Biol 2019; 58:623-639. [PMID: 29982611 DOI: 10.1093/icb/icy075] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/10/2023] Open
Abstract
Contemporary phylogeneticists enjoy an embarrassment of riches, not only in the volumes of data now available, but also in the diversity of bioinformatic tools for handling these data. Here, I discuss a subset of these tools I consider well-suited to the task of inferring ancient relationships with coding sequence data in particular, encompassing data generation, orthology assignment, alignment and gene tree inference, supermatrix construction, and analysis under the best-fitting models applicable to large-scale datasets. Throughout, I compare and critique methods, considering both their theoretical principles and the details of their implementation, and offering practical tips on usage where appropriate. I also entertain different motivations for analyzing what are almost always originally DNA sequence data as codons, amino acids, and higher-order recodings. Although presented in a linear order, I see value in using the diversity of tools available to us to assess the sensitivity of clades of biological interest to different gene and taxon sets and analytical modes, which can be an indication of the presence of systematic error, of which a few forms remain poorly controlled by even the best available inference methods.
Affiliation(s)
- Christopher E Laumer
  - EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, EMBL-EBI South Building, Hinxton CB10 1SD, UK
5
Dornburg A, Townsend JP, Wang Z. Maximizing Power in Phylogenetics and Phylogenomics: A Perspective Illuminated by Fungal Big Data. Adv Genet 2017; 100:1-47. [PMID: 29153398 DOI: 10.1016/bs.adgen.2017.09.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/14/2022]
Abstract
Since its original inception over 150 years ago by Darwin, we have made tremendous progress toward the reconstruction of the Tree of Life. In particular, the transition from analyzing datasets comprised of small numbers of loci to those comprised of hundreds of loci, if not entire genomes, has aided in resolving some of the most vexing of evolutionary problems while giving us a new perspective on biodiversity. Correspondingly, phylogenetic trees have taken a central role in fields that span ecology, conservation, and medicine. However, the rise of big data has also presented phylogenomicists with a new set of challenges to experimental design, quantitative analyses, and computation. The sequencing of a number of very first genomes presented significant challenges to phylogenetic inference, leading fungal phylogenomicists to begin addressing pitfalls and postulating solutions to the issues that arise from genome-scale analyses relevant to any lineage across the Tree of Life. Here we highlight insights from fungal phylogenomics for topics including systematics and species delimitation, ecological and phenotypic diversification, and biogeography while providing an overview of progress made on the reconstruction of the fungal Tree of Life. Finally, we provide a review of considerations to phylogenomic experimental design for robust tree inference. We hope that this special issue of Advances in Genetics not only excites the continued progress of fungal evolutionary biology but also motivates the interdisciplinary development of new theory and methods designed to maximize the power of genomic scale data in phylogenetic analyses.
Affiliation(s)
- Alex Dornburg
  - North Carolina Museum of Natural Sciences, Raleigh, NC, United States
- Zheng Wang
  - Yale University, New Haven, CT, United States
6
Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets. Gigascience 2016; 5:44. [PMID: 27776538 PMCID: PMC5078944 DOI: 10.1186/s13742-016-0152-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/17/2015] [Accepted: 10/12/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons. FINDINGS In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. CONCLUSIONS Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data.
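The flocking idea described in this abstract (agents that react only to nearby agents, guided by pairwise distances on the unit interval) can be sketched as a toy model. This is not Clusterflock's implementation: the function name `flock`, the 2-D arena, the perception `radius`, and the attraction rule `1 - 2 * distance` are all assumptions chosen for illustration. In particular, that rule fixes an implicit midpoint of 0.5, whereas the abstract states that the real tool avoids a pre-determined incongruence threshold.

```python
import math
import random


def flock(dist, positions=None, radius=0.5, step=0.01, iterations=50, seed=0):
    """Toy flocking pass over a symmetric distance matrix with values in [0, 1].

    Each agent sees only agents within `radius` of its current position.
    It moves toward visible agents with a small data distance and away from
    those with a large one (weight = 1 - 2 * distance, an assumed rule).
    Returns the final list of (x, y) positions.
    """
    n = len(dist)
    rng = random.Random(seed)
    if positions is None:
        pos = [(rng.random(), rng.random()) for _ in range(n)]
    else:
        pos = [tuple(p) for p in positions]

    for _ in range(iterations):
        new_pos = []
        for i, (xi, yi) in enumerate(pos):
            dx = dy = 0.0
            seen = 0
            for j, (xj, yj) in enumerate(pos):
                if i == j:
                    continue
                gap = math.hypot(xj - xi, yj - yi)
                if gap == 0.0 or gap > radius:
                    continue  # outside the agent's local neighbourhood
                weight = 1.0 - 2.0 * dist[i][j]  # > 0 attracts, < 0 repels
                dx += weight * (xj - xi) / gap
                dy += weight * (yj - yi) / gap
                seen += 1
            if seen:
                new_pos.append((xi + step * dx / seen, yi + step * dy / seen))
            else:
                new_pos.append((xi, yi))
        pos = new_pos  # synchronous update: all agents move together
    return pos


# Two agents with identical signal drift together; two with maximally
# different signal drift apart (hypothetical two-agent examples).
start = [(0.0, 0.0), (0.2, 0.0)]
together = flock([[0.0, 0.0], [0.0, 0.0]], positions=start, iterations=5)
apart = flock([[0.0, 1.0], [1.0, 0.0]], positions=start, iterations=5)
```

After enough iterations, groups can be read off from spatial proximity, so the number of clusters does not have to be fixed in advance.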