1
Rakesh S, Aravind L, Krishnan A. Reappraisal of the DNA phosphorothioate modification machinery: uncovering neglected functional modalities and identification of new counter-invader defense systems. Nucleic Acids Res 2024; 52:1005-1026. [PMID: 38163645 PMCID: PMC10853773 DOI: 10.1093/nar/gkad1213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 12/03/2023] [Accepted: 12/10/2023] [Indexed: 01/03/2024] Open
Abstract
The DndABCDE systems catalysing the unusual phosphorothioate (PT) DNA backbone modification, and the DndFGH systems, which restrict invasive DNA, have enigmatic and paradoxical features. Using comparative genomics and sequence-structure analyses, we show that the DndABCDE module is commonly functionally decoupled from the DndFGH module. However, the modification gene-neighborhoods encode other nucleases, potentially acting as the actual restriction components or suicide effectors limiting propagation of the selfish elements. The modification module's core consists of a coevolving gene-pair encoding the DNA-scanning apparatus - a DndD/CxC-clade ABC ATPase and DndE with two ribbon-helix-helix (MetJ/Arc) DNA-binding domains. Diversification of DndE's DNA-binding interface suggests a multiplicity of target specificities. Additionally, many systems feature DNA cytosine methylase genes instead of PT modification, indicating the DndDE core can recruit other nucleobase modifications. We show that DndFGH is a distinct counter-invader system with several previously uncharacterized domains, including a nucleotide kinase. These likely trigger its restriction endonuclease domain in response to multiple stimuli, like nucleotides, while blocking protective modifications by invader methylases. Remarkably, different DndH variants contain a HerA/FtsK ATPase domain acquired from multiple sources, including cellular genome-segregation systems and mobile elements. Thus, we uncovered novel HerA/FtsK-dependent defense systems that might intercept invasive DNA during replication, conjugation, or packaging.
Affiliation(s)
- Siuli Rakesh
- Department of Biological Sciences, Indian Institute of Science Education and Research Berhampur (IISER Berhampur), Berhampur 760010, India
- L Aravind
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20894, USA
- Arunkumar Krishnan
- Department of Biological Sciences, Indian Institute of Science Education and Research Berhampur (IISER Berhampur), Berhampur 760010, India
2
Sarkar BK, Bhattacharya M, Agoramoorthy G, Dhama K, Chakraborty C. Entropy-Driven, Integrative Bioinformatics Approaches Reveal the Recent Transmission of the Monkeypox Virus from Nigeria to Multiple Non-African Countries. Mol Biotechnol 2023:10.1007/s12033-023-00889-7. [PMID: 37798393 DOI: 10.1007/s12033-023-00889-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 09/06/2023] [Indexed: 10/07/2023]
Abstract
Monkeypox virus (mpox) has affected multiple countries around the globe. This study analyzes how the virus spread globally, applying entropy-driven bioinformatics in five directions to 60 full-length complete mpox genomes: the topological entropy distribution of the genomes, principal component analysis (PCA), the dissimilarity matrix, entropy-driven phylogenetics, and genome clustering. The topological entropy distribution showed genome positional entropy. We found five clusters of the mpox genomes through the two-component PCA, while the three-component PCA elucidated the clustering events in 3D space. The clustering was further confirmed by the dissimilarity matrix and phylogenetic analysis, which showed that Cluster 1 was the largest and that Clusters 2 and 4, as well as Clusters 3 and 5, were similar in size. This agreed with the phylogenetics of the genomes, where Cluster 1 was clearly segregated from the other four clusters. The study concluded that mpox likely spread from African countries to non-African countries. Overall, the spread and distribution of mpox shed light on its evolution and pathogenicity and can help in adopting preventive measures to stop the spread of the virus.
Affiliation(s)
- Bimal Kumar Sarkar
- Department of Physics, Adamas University, Kolkata, West Bengal, 700126, India
- Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, 756020, Odisha, India
- Kuldeep Dhama
- Division of Pathology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, Uttar Pradesh, 243122, India
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India
3
Sengupta S, Azad RK. Leveraging comparative genomics to uncover alien genes in bacterial genomes. Microb Genom 2023; 9:mgen000939. [PMID: 36748570 PMCID: PMC9973850 DOI: 10.1099/mgen.0.000939] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/28/2023] Open
Abstract
A significant challenge in bacterial genomics is to catalogue genes acquired through the evolutionary process of horizontal gene transfer (HGT). Both comparative genomics and sequence composition-based methods have often been invoked to quantify horizontally acquired genes in bacterial genomes. Comparative genomics methods rely on completely sequenced genomes and therefore the confidence in their predictions increases as the databases become more enriched in completely sequenced genomes. Recent developments including in microbial genome sequencing call for reassessment of alien genes based on information-rich resources currently available. We revisited the comparative genomics approach and developed a new algorithm for alien gene detection. Our algorithm compared favourably with the existing comparative genomics-based methods and is capable of detecting both recent and ancient transfers. It can be used as a standalone tool or in concert with other complementary algorithms for comprehensively cataloguing alien genes in bacterial genomes.
Affiliation(s)
- Soham Sengupta
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, Texas, 76203, USA
- Rajeev K Azad
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, Texas, 76203, USA
- Department of Mathematics, University of North Texas, Denton, Texas, 76203, USA
4
Sengupta S, Azad RK. Reconstructing horizontal gene flow network to understand prokaryotic evolution. Open Biol 2022; 12:220169. [PMID: 36446404 PMCID: PMC9708380 DOI: 10.1098/rsob.220169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/05/2022] Open
Abstract
Horizontal gene transfer (HGT) is a major source of phenotypic innovation and a mechanism of niche adaptation in prokaryotes. Quantification of HGT is critical to decipher its myriad roles in microbial evolution and adaptation. Advances in genome sequencing and bioinformatics have augmented our ability to understand the microbial world, particularly the direct or indirect influence of HGT on diverse life forms. Methods for detecting HGT can be classified into phylogenetic-based and parametric or composition-based approaches. Here, we exploited the complementary strengths of both the approaches to construct a high confidence horizontal gene flow network. Our network is unique in its ability to detect the transfer of native genes of a genome to genomes from other taxa, thus establishing donor and recipient organisms (taxa), rather than through a post hoc analysis as is the practice with several other approaches. The scale-free horizontal gene flow network presented here provides new insights into modes of transfer for the exchange of genetic information and also illuminates differential gene flow across phyla.
Affiliation(s)
- Soham Sengupta
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Rajeev K. Azad
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Department of Mathematics, University of North Texas, Denton, TX 76203, USA
5
Sengupta S, Azad RK. Reconstructing horizontal gene flow network to understand prokaryotic evolution. Open Biol 2022. [PMID: 36446404 DOI: 10.6084/m9.figshare.c.6307519] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/05/2022] Open
Affiliation(s)
- Soham Sengupta
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Rajeev K Azad
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Department of Mathematics, University of North Texas, Denton, TX 76203, USA
6
Burks DJ, Azad RK. Mapping Strengths and Weaknesses of Different Clustering Approaches to Deciphering Bacterial Chimerism. OMICS: A Journal of Integrative Biology 2022; 26:422-439. [PMID: 35925817 DOI: 10.1089/omi.2022.0062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Bacterial genomes are chimeras of DNA of different ancestries. Deconstructing chimeric genomes is central to understanding the evolutionary trajectories of their disparate components and thus the organisms as a whole in the light of their evolutionary contexts. Of specific interest is to delineate and quantify native (vertically inherited) and alien (horizontally acquired) components of bacterial genomes and also specify genomic fractions that represent different donor sources. An agglomerative clustering procedure that prioritizes grouping of proximal similar genomic segments has previously been invoked for this purpose in conjunction with a recursive segmentation procedure. Surprisingly, however, the relative strengths and weaknesses of different clustering approaches to deciphering bacterial chimerism have not yet been investigated, despite the need to robustly interpret tens of thousands of completely sequenced bacterial genomes and nearly complete genome assemblies available in the public databases. To bridge this knowledge gap and develop more robust approaches, we assessed different clustering methods, including segment order based (proximal) clustering, hierarchical clustering, affinity propagation clustering, and a novel network clustering approach on chimeric genomes modeled after bacterial genomes representing a broad spectrum of compositional complexity. Although segment order-based clustering and network clustering compared favorably with the other approaches in discriminating between native and alien DNA at genome optimized settings, network clustering did consistently better than other methods at parametric settings optimized on all test genomes together. Segment order-based clustering and hierarchical clustering outperformed other methods in alien DNA identification while preserving donor identity in the genomes. Our study highlights the strengths and weaknesses of different approaches and suggests how this can be leveraged to achieve a more robust deconstruction of bacterial chimerism.
Affiliation(s)
- David J Burks
- Department of Biological Sciences, BioDiscovery Institute, University of North Texas, Denton, Texas, USA
- Rajeev K Azad
- Department of Biological Sciences, BioDiscovery Institute, University of North Texas, Denton, Texas, USA
- Department of Mathematics, University of North Texas, Denton, Texas, USA
7
Pandey RS, Azad RK. A Protocol for Horizontally Acquired Metabolic Gene Detection in Algae. Methods Mol Biol 2022; 2396:61-69. [PMID: 34786676 DOI: 10.1007/978-1-0716-1822-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Horizontal gene transfer (HGT) or lateral gene transfer (LGT), the exchange of genetic materials among organisms by means of other than parent-to-offspring (vertical) inheritance, plays a major role in prokaryotic genome evolution, facilitating adaptation of prokaryotes to changes in the environment. Phylogenetic methods have been frequently invoked to catalog horizontally acquired genes; however, these methods are often constrained by the paucity of sequenced genomes of close relatives (and even distant relatives) for a robust analysis and reliable inference. In this chapter, we describe a HGT quantification protocol that exploits the complementary strengths of the integrative segmentation and clustering method and the comparative genomics approach to identify foreign genes. Users can use this pipeline in combination with phylogenetic tree reconstruction to identify foreign genes that are supported by multiple lines of evidence, that is, atypical composition, atypical distribution in close relatives, and aberrant phylogenetic pattern.
Affiliation(s)
- Ravi S Pandey
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Rajeev K Azad
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA
- Department of Mathematics, University of North Texas, Denton, TX, USA
8
Ibtehaz N, Ahmed I, Ahmed MS, Rahman MS, Azad RK, Bayzid MS. SSG-LUGIA: Single Sequence based Genome Level Unsupervised Genomic Island Prediction Algorithm. Brief Bioinform 2021; 22:6290171. [PMID: 34058749 DOI: 10.1093/bib/bbab116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2020] [Revised: 03/11/2021] [Accepted: 03/13/2021] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Genomic Islands (GIs) are clusters of genes that are mobilized through horizontal gene transfer. GIs play a pivotal role in bacterial evolution as a mechanism of diversification and adaptation to different niches. Therefore, identification and characterization of GIs in bacterial genomes is important for understanding bacterial evolution. However, quantifying GIs is inherently difficult, and the existing methods suffer from low prediction accuracy and precision-recall trade-off. Moreover, several of them are supervised in nature, and thus, their applications to newly sequenced genomes are riddled with their dependency on the functional annotation of existing genomes. RESULTS: We present SSG-LUGIA, a completely automated and unsupervised approach for identifying GIs and horizontally transferred genes. SSG-LUGIA is a novel method based on unsupervised anomaly detection technique, accompanied by further refinement using cues from signal processing literature. SSG-LUGIA leverages the atypical compositional biases of the alien genes to localize GIs in prokaryotic genomes. SSG-LUGIA was assessed on a large benchmark dataset 'IslandPick' and on a set of 15 well-studied genomes in the literature and followed by a thorough analysis on the well-understood Salmonella typhi CT18 genome. Furthermore, the efficacy of SSG-LUGIA in identifying horizontally transferred genes was evaluated on two additional bacterial genomes, namely, those of Corynebacterium diphtheriae NCTC13129 and Pseudomonas aeruginosa LESB58. SSG-LUGIA was examined on draft genomes and was demonstrated to be efficient as an ensemble method. CONCLUSIONS: Our results indicate that SSG-LUGIA achieved superior performance in comparison to frequently used existing methods. Importantly, it yielded a better trade-off between precision and recall than the existing methods. Its nondependency on the functional annotation of genomes makes it suitable for analyzing newly sequenced, yet uncharacterized genomes. Thus, our study is a significant advance in identification of GIs and horizontally transferred genes. SSG-LUGIA is available as an open source software at https://nibtehaz.github.io/SSG-LUGIA/.
Affiliation(s)
- Ishtiaque Ahmed
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Md Sabbir Ahmed
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- M Sohel Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
- Rajeev K Azad
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, TX, USA
- Department of Mathematics, University of North Texas, Denton, TX, USA
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
9
Cavassim MIA, Moeskjær S, Moslemi C, Fields B, Bachmann A, Vilhjálmsson BJ, Schierup MH, Young JPW, Andersen SU. Symbiosis genes show a unique pattern of introgression and selection within a Rhizobium leguminosarum species complex. Microb Genom 2020; 6:e000351. [PMID: 32176601 PMCID: PMC7276703 DOI: 10.1099/mgen.0.000351] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/29/2019] [Accepted: 02/17/2020] [Indexed: 12/22/2022] Open
Abstract
Rhizobia supply legumes with fixed nitrogen using a set of symbiosis genes. These can cross rhizobium species boundaries, but it is unclear how many other genes show similar mobility. Here, we investigate inter-species introgression using de novo assembly of 196 Rhizobium leguminosarum sv. trifolii genomes. The 196 strains constituted a five-species complex, and we calculated introgression scores based on gene-tree traversal to identify 171 genes that frequently cross species boundaries. Rather than relying on the gene order of a single reference strain, we clustered the introgressing genes into four blocks based on population structure-corrected linkage disequilibrium patterns. The two largest blocks comprised 125 genes and included the symbiosis genes, a smaller block contained 43 mainly chromosomal genes, and the last block consisted of three genes with variable genomic location. All introgression events were likely mediated by conjugation, but only the genes in the symbiosis linkage blocks displayed overrepresentation of distinct, high-frequency haplotypes. The three genes in the last block were core genes essential for symbiosis that had, in some cases, been mobilized on symbiosis plasmids. Inter-species introgression is thus not limited to symbiosis genes and plasmids, but other cases are infrequent and show distinct selection signatures.
Affiliation(s)
- Maria Izabel A. Cavassim
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Sara Moeskjær
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Camous Moslemi
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Asger Bachmann
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Stig U. Andersen
- Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
10
Pinilla-Redondo R, Cyriaque V, Jacquiod S, Sørensen SJ, Riber L. Monitoring plasmid-mediated horizontal gene transfer in microbiomes: recent advances and future perspectives. Plasmid 2018; 99:56-67. [PMID: 30086339 DOI: 10.1016/j.plasmid.2018.08.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 03/15/2018] [Revised: 07/31/2018] [Accepted: 08/01/2018] [Indexed: 10/28/2022]
Abstract
The emergence of antimicrobial resistant bacteria constitutes an increasing global health concern. Although it is well recognized that the cornerstone underlying this phenomenon is the dissemination of antimicrobial resistance via plasmids and other mobile genetic elements, the antimicrobial resistance transfer routes remain largely uncharted. In this review, we describe different methods for assessing the transfer frequency and host ranges of plasmids within complex microbiomes. The discussion is centered around the critical evaluation of recent advances for monitoring the fate of fluorescently tagged plasmids in bacterial communities through the coupling of fluorescence activated cell sorting and next generation sequencing techniques. We argue that this approach constitutes an exceptional tool for obtaining quantitative data regarding the extent of plasmid transfer, key disseminating taxa, and possible propagation routes. The integration of this information will provide valuable insights on how to develop alternative avenues for fighting the rise of antimicrobial resistant pathogens, as well as the means for constructing more comprehensive risk assessment models.
Affiliation(s)
- Valentine Cyriaque
- Proteomics and Microbiology Lab, Research Institute for Biosciences, UMONS, Mons, Belgium
- Søren J Sørensen
- Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Leise Riber
- Section for Functional Genomics, University of Copenhagen, Copenhagen, Denmark
11
Jani M, Sengupta S, Hu K, Azad RK. Deciphering pathogenicity and antibiotic resistance islands in methicillin-resistant Staphylococcus aureus genomes. Open Biol 2018; 7:rsob.170094. [PMID: 29263245 PMCID: PMC5746543 DOI: 10.1098/rsob.170094] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/15/2017] [Accepted: 11/16/2017] [Indexed: 01/16/2023] Open
Abstract
Staphylococcus aureus is a versatile pathogen that is capable of causing infections in both humans and animals. It can cause furuncles, septicaemia, pneumonia and endocarditis. Adaptation of S. aureus to the modern hospital environment has been facilitated, in part, by the horizontal acquisition of drug resistance genes, such as mecA gene that imparts resistance to methicillin. Horizontal acquisitions of islands of genes harbouring virulence and antibiotic resistance genes have made S. aureus resistant to commonly used antibiotics. To decipher genomic islands (GIs) in 22 hospital- and 9 community-associated methicillin-resistant S. aureus strains and classify a subset of GIs carrying virulence and resistance genes as pathogenicity and resistance islands respectively, we applied a host of methods for localizing genomic islands in prokaryotic genomes. Surprisingly, none of the frequently used GI prediction methods could perform well in delineating the resistance islands in the S. aureus genomes. Rather, a gene clustering procedure exploiting biases in codon usage for identifying horizontally transferred genes outperformed the current methods for GI detection, in particular in identifying the known islands in S. aureus including the SCCmec island that harbours the mecA resistance gene. The gene clustering approach also identified novel, as yet unreported islands, with many of these found to harbour virulence and/or resistance genes. These as yet unexplored islands may provide valuable information on the evolution of drug resistance in S. aureus.
Affiliation(s)
- Mehul Jani
- Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
- Soham Sengupta
- Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
- Kelsey Hu
- Texas Academy of Mathematics and Science, University of North Texas, Denton, TX 76203, USA
- Rajeev K Azad
- Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
- Department of Mathematics, University of North Texas, Denton, TX 76203, USA
12
Godde JS, Baichoo S, Mungloo-Dilmohamud Z, Jaufeerally-Fakim Y. Comparison of genomic islands in cyanobacteria: Evidence of bacteriophage-mediated horizontal gene transfer from eukaryotes. Microbiol Res 2018; 211:31-46. [DOI: 10.1016/j.micres.2018.03.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/20/2017] [Revised: 02/11/2018] [Accepted: 03/17/2018] [Indexed: 12/21/2022]
13
Ré MA, Azad RK. Generalization of entropy based divergence measures for symbolic sequence analysis. PLoS One 2014; 9:e93532. [PMID: 24728338 PMCID: PMC3984095 DOI: 10.1371/journal.pone.0093532] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 09/17/2013] [Accepted: 03/04/2014] [Indexed: 11/26/2022] Open
Abstract
Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as, non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including that of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.
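For readers unfamiliar with the measure named in this abstract, the standard weighted Jensen-Shannon divergence over any number of distributions can be sketched as the entropy of the weighted mixture minus the weighted mean of the individual entropies. The sketch below is illustrative only and is not the authors' code: the function names, the fixed `ACGT` alphabet, and the equal-weight default are assumptions, and it covers only the basic Shannon form, not the Tsallis or Markovian generalizations discussed in the paper.

```python
import math
from collections import Counter


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def symbol_distribution(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in seq (hypothetical helper)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in alphabet)
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return [counts[a] / total for a in alphabet]


def jensen_shannon_divergence(dists, weights=None):
    """Weighted JSD: H(sum_i w_i * P_i) - sum_i w_i * H(P_i).

    dists   -- list of probability distributions over the same symbols
    weights -- non-negative weights summing to 1 (assumed default: equal)
    """
    n = len(dists)
    if weights is None:
        weights = [1.0 / n] * n
    size = len(dists[0])
    # Mixture distribution: weighted average of the inputs, symbol by symbol.
    mixture = [sum(w * d[i] for w, d in zip(weights, dists)) for i in range(size)]
    mean_entropy = sum(w * shannon_entropy(d) for w, d in zip(weights, dists))
    return shannon_entropy(mixture) - mean_entropy


if __name__ == "__main__":
    left = symbol_distribution("ATATATATAT")
    right = symbol_distribution("GCGCGCGCGC")
    print(jensen_shannon_divergence([left, right]))
```

With two distributions and equal weights the value lies between 0 bits (identical compositions) and 1 bit (disjoint compositions). In sequence segmentation the weights are often taken proportional to segment length, which is a common convention rather than something stated in this abstract.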
Affiliation(s)
- Miguel A. Ré
- Departamento de Ciencias Básicas, CIII - Facultad Regional Córdoba, Universidad Tecnológica Nacional, Córdoba, Argentina
- Facultad de Matemática, Astronomía y Física, Universidad Nacional de Córdoba, Córdoba, Argentina
- Rajeev K. Azad
- Department of Biological Sciences, University of North Texas, Denton, Texas, United States of America
- Department of Mathematics, University of North Texas, Denton, Texas, United States of America
14
Sjostrand J, Tofigh A, Daubin V, Arvestad L, Sennblad B, Lagergren J. A Bayesian Method for Analyzing Lateral Gene Transfer. Syst Biol 2014; 63:409-20. [DOI: 10.1093/sysbio/syu007] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Indexed: 12/18/2022] Open
15
Kunik V, Ofran Y. The indistinguishability of epitopes from protein surface is explained by the distinct binding preferences of each of the six antigen-binding loops. Protein Eng Des Sel 2013; 26:599-609. [DOI: 10.1093/protein/gzt027] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Indexed: 11/14/2022] Open
16
de Carvalho MO, Loreto ELS. Methods for detection of horizontal transfer of transposable elements in complete genomes. Genet Mol Biol 2012; 35:1078-84. [PMID: 23411916 PMCID: PMC3571429 DOI: 10.1590/s1415-47572012000600024] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 12/11/2022] Open
Abstract
Recent advances in nucleic acid sequencing technology are creating a diverse landscape for the analysis of horizontal transfer in complete genomes. Previously limited to prokaryotes, the availability of complete genomes from close eukaryotic species presents an opportunity to validate hypotheses about the patterns of evolution and mechanisms that drive horizontal transfer. Many of those methods can be adapted from methods previously used in prokaryotic genomes, as the assumptions for horizontal transfer can be interpreted as the same. Some methods, however, require a complete adaptation, while others need refinements in sensitivity and specificity to deal with the huge datasets generated from next-generation sequencing technologies. Here we list the types of methods used for horizontal transfer detection, as well as their strengths and weaknesses.
Affiliation(s)
- Marcos Oliveira de Carvalho
- Programa de Pós-Graduação em Genética e Biologia Molecular, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
|
17
|
Abstract
Since the emergence of high-throughput genome sequencing platforms and more recently the next-generation platforms, the genome databases are growing at an astronomical rate. Tremendous efforts have been invested in recent years in understanding intriguing complexities beneath the vast ocean of genomic data. This is apparent in the spurt of computational methods for interpreting these data in the past few years. Genomic data interpretation is notoriously difficult, partly owing to the inherent heterogeneities appearing at different scales. Methods developed to interpret these data often suffer from their inability to adequately measure the underlying heterogeneities and thus lead to confounding results. Here, we present an information entropy-based approach that unravels the distinctive patterns underlying genomic data efficiently and thus is applicable in addressing a variety of biological problems. We show the robustness and consistency of the proposed methodology in addressing three different biological problems of significance—identification of alien DNAs in bacterial genomes, detection of structural variants in cancer cell lines and alignment-free genome comparison.
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA.
|
18
|
Xiong D, Xiao F, Liu L, Hu K, Tan Y, He S, Gao X. Towards a better detection of horizontally transferred genes by combining unusual properties effectively. PLoS One 2012; 7:e43126. [PMID: 22905214 PMCID: PMC3419211 DOI: 10.1371/journal.pone.0043126] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/28/2012] [Accepted: 07/16/2012] [Indexed: 02/01/2023] Open
Abstract
Background Horizontal gene transfer (HGT) is one of the major mechanisms contributing to microbial genome diversification. A number of computational methods for finding horizontally transferred genes have been proposed in the past decades; however none of them has provided a reliable detector yet. In existing parametric approaches, only one single compositional property can participate in the detection process, or the results obtained through each single property are just simply combined. It’s known that different properties may mean different information, so the single property can’t sufficiently contain the information encoded by gene sequences. In addition, the class imbalance problem in the datasets, which also results in great errors for the gene detection, hasn’t been considered by the published methods. Here we developed an effective classifier system (Hgtident) that used support vector machine (SVM) by combining unusual properties effectively for HGT detection. Results Our approach Hgtident includes the introduction of more representative datasets, optimization of SVM model, feature selection, handling of imbalance problem in the datasets and extensive performance evaluation via systematic cross-validation methods. Through feature selection, we found that JS-DN and JS-CB have higher discriminating power for HGT detection, while GC1–GC3 and k-mer (k = 1, 2, …, 7) make the least contribution. Extensive experiments indicated the new classifier could reduce Mean error dramatically, and also improve Recall by a certain level. For the testing genomes, compared with the existing popular multiple-threshold approach, on average, our Recall and Mean error was respectively improved by 2.81% and reduced by 26.32%, which means that numerous false positives were identified correctly. Conclusions Hgtident introduced here is an effective approach for better detecting HGT. Combining multiple features of HGT is also essential for a wider range of HGT events detection.
Affiliation(s)
- Dapeng Xiong
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Fen Xiao
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Li Liu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, People’s Republic of China
- Kai Hu
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Yanping Tan
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Shunmin He
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, People’s Republic of China
- * E-mail: (SH); (XG)
- Xieping Gao
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- * E-mail: (SH); (XG)
|
19
|
Ménigaud S, Mallet L, Picord G, Churlaud C, Borrel A, Deschavanne P. GOHTAM: a website for 'Genomic Origin of Horizontal Transfers, Alignment and Metagenomics'. Bioinformatics 2012; 28:1270-1. [PMID: 22426345 PMCID: PMC3338014 DOI: 10.1093/bioinformatics/bts118] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 01/20/2023] Open
Abstract
Motivation: This website allows the detection of horizontal transfers based on a combination of parametric methods and proposes an origin by researching neighbors in a bank of genomic signatures. This bank is also used to research an origin to DNA fragments from metagenomics studies. Results: Different services are provided like the possibility of inferring a phylogenetic tree with sequence signatures or comparing two genomes and displaying the rearrangements that happened since their separation. Availability and implementation: http://gohtam.rpbs.univ-paris-diderot.fr/ Contact: patrick.deschavanne@univ-paris-diderot.fr; ludovic.mallet@jouy.inra.fr Supplementary information: Supplementary data are available at Bioinformatics online http://gohtam.rpbs.univ-paris-diderot.fr:8080/Data/bin/GOHTAM_bin.tgz
Affiliation(s)
- Sabine Ménigaud
- Molécules Thérapeutiques in silico, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, 35 rue Héléne Brion, Paris, France
|
20
|
Abstract
Methods for identifying alien genes in genomes fall into two general classes. Phylogenetic methods examine the distribution of a gene's homologues among genomes to find those with relationships not consistent with vertical inheritance. These approaches include identifying orphan genes which lack homologues in closely related genomes and genes with unduly high levels of similarity to genes in otherwise unrelated genomes. Rigorous statistical tests are available to place confidence intervals for predicted alien genes. Parametric methods examine the compositional properties of genes within a genome to find those with atypical properties, likely indicating the directional mutational pressures of a donor genome. These methods may compare the properties of genes to genomic averages, properties of genes to each other, or properties of large, multigene regions of the chromosome. Here, we discuss the strengths and weaknesses of each approach.
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
|
21
|
Grassi L, Caselle M, Lercher MJ, Lagomarsino MC. Horizontal gene transfers as metagenomic gene duplications. Mol Biosyst 2012; 8:790-5. [DOI: 10.1039/c2mb05330f] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
|
22
|
Yu JF, Xiao K, Jiang DK, Guo J, Wang JH, Sun X. An integrative method for identifying the over-annotated protein-coding genes in microbial genomes. DNA Res 2011; 18:435-49. [PMID: 21903723 PMCID: PMC3223076 DOI: 10.1093/dnares/dsr030] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022] Open
Abstract
Falsely annotated protein-coding genes have been deemed one of the major causes of annotation errors in public databases. Although many filtering approaches have been designed for over-annotated protein-coding genes, some are questionable due to the resultant increase in false negatives. Furthermore, there is no webserver or software specifically devised for the problem of over-annotation. In this study, we propose an integrative algorithm for detecting over-annotated protein-coding genes in microorganisms. Overall, an average accuracy of 99.94% is achieved over 61 microbial genomes. The extremely high accuracy indicates that the presented algorithm efficiently differentiates protein-coding genes from non-coding open reading frames. Extensive analyses show that the prediction results are reliable and that the integrative algorithm is robust and convenient. Our analysis also indicates that over-annotated protein-coding genes can cause false positives in horizontal gene transfer detection. The webserver of the proposed algorithm is freely accessible from www.cbi.seu.edu.cn/RPGM.
Affiliation(s)
- Jia-Feng Yu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
|
23
|
Schlüter A, Ruiz-Trillo I, Pujol A. Phylogenomic evidence for a myxococcal contribution to the mitochondrial fatty acid beta-oxidation. PLoS One 2011; 6:e21989. [PMID: 21760940 PMCID: PMC3131387 DOI: 10.1371/journal.pone.0021989] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/28/2011] [Accepted: 06/09/2011] [Indexed: 11/26/2022] Open
Abstract
Background The origin of eukaryotes remains a fundamental question in evolutionary biology. Although it is clear that eukaryotic genomes are a chimeric combination of genes of eubacterial and archaebacterial ancestry, the specific ancestry of most eubacterial genes is still unknown. The growing availability of microbial genomes offers the possibility of analyzing the ancestry of eukaryotic genomes and testing previous hypotheses on their origins. Methodology/Principal Findings Here, we have applied a phylogenomic analysis to investigate a possible contribution of the Myxococcales to the first eukaryotes. We conducted a conservative pipeline with homologous sequence searches against a genomic sampling of 40 eukaryotic and 357 prokaryotic genomes. The phylogenetic reconstruction showed that several eukaryotic proteins traced to Myxococcales. Most of these proteins were associated with mitochondrial lipid intermediate pathways, particularly enzymes generating reducing equivalents with pivotal roles in fatty acid β-oxidation metabolism. Our data suggest that myxococcal species with the ability to oxidize fatty acids transferred several genes to eubacteria that eventually gave rise to the mitochondrial ancestor. Later, the eukaryotic nucleocytoplasmic lineage acquired those metabolic genes through endosymbiotic gene transfer. Conclusions/Significance Our results support a prokaryotic origin, different from α-proteobacteria, for several mitochondrial genes. Our data reinforce a fluid prokaryotic chromosome model in which the mitochondrion appears to be an important entry point for myxococcal genes to enter eukaryotes.
Affiliation(s)
- Agatha Schlüter
- Neurometabolic Diseases Laboratory, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain
- Institut de Neuropatologia, Hospital Universitari de Bellvitge, Universitat de Barcelona, Barcelona, Spain
- Centro de Investigación en Red sobre Enfermedades Raras (CIBERER), Valencia, Spain
- Iñaki Ruiz-Trillo
- Departament de Genètica & Institut de Recerca en Biodiversitat (Irbio), Universitat de Barcelona, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Aurora Pujol
- Neurometabolic Diseases Laboratory, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), L'Hospitalet de Llobregat, Barcelona, Spain
- Institut de Neuropatologia, Hospital Universitari de Bellvitge, Universitat de Barcelona, Barcelona, Spain
- Centro de Investigación en Red sobre Enfermedades Raras (CIBERER), Valencia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- * E-mail:
|
24
|
Tofigh A, Hallett M, Lagergren J. Simultaneous identification of duplications and lateral gene transfers. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:517-535. [PMID: 21233529 DOI: 10.1109/tcbb.2010.14] [Citation(s) in RCA: 97] [Impact Index Per Article: 7.5] [Indexed: 05/30/2023]
Abstract
The incongruency between a gene tree and a corresponding species tree can be attributed to evolutionary events such as gene duplication and gene loss. This paper describes a combinatorial model where so-called DTL-scenarios are used to explain the differences between a gene tree and a corresponding species tree taking into account gene duplications, gene losses, and lateral gene transfers (also known as horizontal gene transfers). The reasonable biological constraint that a lateral gene transfer may only occur between contemporary species leads to the notion of acyclic DTL-scenarios. Parsimony methods are introduced by defining appropriate optimization problems. We show that finding most parsimonious acyclic DTL-scenarios is NP-hard. However, by dropping the condition of acyclicity, the problem becomes tractable, and we provide a dynamic programming algorithm as well as a fixed-parameter tractable algorithm for finding most parsimonious DTL-scenarios.
Affiliation(s)
- Ali Tofigh
- KTH Royal Institute of Technology, Department of Computational Biology, Stockholm, Sweden.
|
25
|
Abstract
Because the properties of horizontally-transferred genes will reflect the mutational proclivities of their donor genomes, they often show atypical compositional properties relative to native genes. Parametric methods use these discrepancies to identify bacterial genes recently acquired by horizontal transfer. However, compositional patterns of native genes vary stochastically, leaving no clear boundary between typical and atypical genes. As a result, while strongly atypical genes are readily identified as alien, genes of ambiguous character are poorly classified when a single threshold separates typical and atypical genes. This limitation affects all parametric methods that examine genes independently, and escaping it requires the use of additional genomic information. We propose that the performance of all parametric methods can be improved by using a multiple-threshold approach. First, strongly atypical alien genes and strongly typical native genes would be identified using conservative thresholds. Genes with ambiguous compositional features would then be classified by examining gene context, including the class (native or alien) of flanking genes. By including additional genomic information in a multiple-threshold framework, we observed a remarkable improvement in the performance of several popular, but algorithmically distinct, methods for alien gene detection.
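The multiple-threshold idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the atypicality scores, the two cutoffs `low` and `high`, and the rule that an ambiguous gene is called alien only when all of its nearest confidently classified neighbours are alien are hypothetical choices, not the scoring or context rule used in the paper.

```python
def classify_genes(scores, low, high):
    """Label genes 'native' or 'alien' with two thresholds plus flanking context.

    scores: one atypicality score per gene, in chromosomal order
            (higher = more atypical). The scoring method itself is assumed.
    low, high: hypothetical conservative cutoffs (low < high).
    """
    # Pass 1: conservative thresholds give confident calls; the rest is ambiguous.
    labels = []
    for s in scores:
        if s >= high:
            labels.append("alien")
        elif s <= low:
            labels.append("native")
        else:
            labels.append("ambiguous")

    # Pass 2: resolve ambiguous genes from the nearest confident gene on each side.
    resolved = list(labels)
    for i, label in enumerate(labels):
        if label != "ambiguous":
            continue
        left = next((labels[j] for j in range(i - 1, -1, -1)
                     if labels[j] != "ambiguous"), None)
        right = next((labels[j] for j in range(i + 1, len(labels))
                      if labels[j] != "ambiguous"), None)
        votes = [v for v in (left, right) if v is not None]
        # Assumed rule: alien only if every available confident neighbour is alien.
        is_alien = bool(votes) and all(v == "alien" for v in votes)
        resolved[i] = "alien" if is_alien else "native"
    return resolved


calls = classify_genes([0.1, 0.5, 0.9, 0.6, 0.95, 0.5, 0.1], low=0.3, high=0.8)
```

In this toy run, the ambiguous gene flanked by two confident alien genes is called alien, while ambiguous genes with at least one native neighbour stay native.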
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
|
26
|
Bavishi A, Abhishek A, Lin L, Choudhary M. Complex prokaryotic genome structure: rapid evolution of chromosome II. Genome 2011; 53:675-87. [PMID: 20924417 DOI: 10.1139/g10-046] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Although many bacteria with two chromosomes have been sequenced, the roles of such complex genome structuring are still unclear. To uncover levels of chromosome I (CI) and chromosome II (CII) sequence divergence, Mauve 2.2.0 was used to align the CI- and CII-specific sequences of bacteria with complex genome structuring in two sets of comparisons: the first set was conducted among the CI and CII of bacterial strains of the same species, while the second set was conducted among the CI and CII of species in Alphaproteobacteria that possess two chromosomes. The analyses revealed a rapid evolution of CII-specific DNA sequences compared with CI-specific sequences in a majority of organisms. In addition, levels of protein divergence between CI-specific and CII-specific genes were determined using phylogenetic analyses and confirmed the DNA alignment findings. Analysis of synonymous and nonsynonymous substitutions revealed that the structural and functional constraints on CI and CII genes are not significantly different. Also, horizontal gene transfer estimates in selected organisms demonstrated that CII in many species has acquired higher levels of horizontally transferred segments than CI. In summary, rapid evolution of CII may perform particular roles for organisms such as aiding in adapting to specialized niches.
Affiliation(s)
- Anish Bavishi
- Department of Biological Sciences, Sam Houston State University, Huntsville, TX 77341, USA
|
27
|
Treangen TJ, Rocha EPC. Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet 2011; 7:e1001284. [PMID: 21298028 PMCID: PMC3029252 DOI: 10.1371/journal.pgen.1001284] [Citation(s) in RCA: 319] [Impact Index Per Article: 24.5] [Received: 06/14/2010] [Accepted: 12/20/2010] [Indexed: 01/09/2023] Open
Abstract
Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes--xenologs--persist longer in prokaryotic lineages possibly due to a higher/longer adaptive role. On the other hand, duplicated genes--paralogs--are expressed more, and, when persistent, they evolve slower. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions and duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein-protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biology systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.
Affiliation(s)
- Todd J Treangen
- Institut Pasteur, Microbial Evolutionary Genomics, Département Génomes et Génétique, Paris, France.
|
28
|
Parnell JJ, Rompato G, Latta LC, Pfrender ME, Van Nostrand JD, He Z, Zhou J, Andersen G, Champine P, Ganesan B, Weimer BC. Functional biogeography as evidence of gene transfer in hypersaline microbial communities. PLoS One 2010; 5:e12919. [PMID: 20957119 PMCID: PMC2950788 DOI: 10.1371/journal.pone.0012919] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 05/10/2010] [Accepted: 08/27/2010] [Indexed: 02/03/2023] Open
Abstract
BACKGROUND Horizontal gene transfer (HGT) plays a major role in speciation and evolution of bacteria and archaea by controlling gene distribution within an environment. However, information that links HGT to a natural community using relevant population-genetics parameters and spatial considerations is scarce. The Great Salt Lake (Utah, USA) provides an excellent model for studying HGT in the context of biogeography because it is a contiguous system with dispersal limitations due to a strong selective salinity gradient. We hypothesize that in spite of the barrier to phylogenetic dispersal, functional characteristics--in the form of HGT--expand beyond phylogenetic limitations due to selective pressure. METHODOLOGY AND RESULTS To assay the functional genes and microorganisms throughout the GSL, we used a 16S rRNA oligonucleotide microarray (Phylochip) and a functional gene array (GeoChip) to measure biogeographic patterns of nine microbial communities. We found a significant difference in biogeography based on microarray analyses when comparing Sørensen similarity values for presence/absence of function and phylogeny (Student's t-test; p = 0.005). CONCLUSION AND SIGNIFICANCE Biogeographic patterns exhibit behavior associated with horizontal gene transfer in that informational genes (16S rRNA) have a lower similarity than functional genes, and functional similarity is positively correlated with lake-wide selective pressure. Specifically, high concentrations of chromium throughout GSL correspond to an average similarity of chromium resistance genes that is 22% higher than taxonomic similarity. This suggests active HGT may be measured at the population level in microbial communities and these biogeographic patterns may serve as a model to study bacteria adaptation and speciation.
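For readers unfamiliar with the Sørensen similarity mentioned in this abstract, the following minimal Python sketch computes it for presence/absence data as 2|A∩B| / (|A| + |B|). The gene names in the example are placeholders chosen for illustration and are not taken from the study's arrays.

```python
def sorensen(a, b):
    """Sørensen similarity between two presence/absence sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Convention assumed here: two empty communities are identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical functional-gene profiles of two sampling sites.
site_1 = {"chrA", "merA", "arsC"}
site_2 = {"chrA", "merA", "czcD"}
functional_similarity = sorensen(site_1, site_2)  # 2 * 2 / (3 + 3)
```

Computing the same index once on functional genes and once on taxa for each pair of sites gives the two sets of similarity values that the abstract compares.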
Affiliation(s)
- J Jacob Parnell
- Center for Integrated BioSystems, Utah State University, Logan, Utah, United States of America.
|
29
|
|
30
|
Becq J, Churlaud C, Deschavanne P. A benchmark of parametric methods for horizontal transfers detection. PLoS One 2010; 5:e9989. [PMID: 20376325 PMCID: PMC2848678 DOI: 10.1371/journal.pone.0009989] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 11/10/2009] [Accepted: 03/10/2010] [Indexed: 11/23/2022] Open
Abstract
Horizontal gene transfer (HGT) has appeared to be of importance for prokaryotic species evolution. As a consequence, numerous parametric methods, using only the information embedded in the genomes, have been designed to detect HGTs. Numerous reports of incongruencies in the results of the different methods applied to the same genomes have been published. The use of artificial genomes in which all HGT parameters are controlled allows testing different methods under the same conditions. The results of this benchmark of 16 representative parametric methods showed a great variety of efficiencies. Some methods work very poorly whatever the type of HGT, and some depend on the conditions or on the metrics used. The best methods in terms of total errors were those using tetranucleotides as the criterion for window methods, or those using codon usage with the Kullback-Leibler divergence metric for gene-based methods. Window methods are very sensitive but less specific and detect lone isolated genes poorly. On the other hand, gene-based methods are often very specific but lack sensitivity. We propose using two methods in combination to get the best of each category: a gene-based one for specificity and a window-based one for sensitivity.
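A gene-based score of the kind this abstract describes (codon usage compared with the Kullback-Leibler divergence) might look like the Python sketch below. It is a sketch under assumptions, not the benchmarked implementation: the pseudocount, the base-2 logarithm, and the toy sequences are all illustrative choices.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_freqs(seq, pseudocount=1.0):
    """Relative codon frequencies of an in-frame coding sequence.

    The pseudocount (an assumption of this sketch) keeps every frequency
    positive so that the divergence below is always finite.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits over the 64 codons."""
    return sum(p[c] * math.log2(p[c] / q[c]) for c in CODONS)


# Toy example: one gene scored against a stand-in for the genome-wide usage.
gene = codon_freqs("ATGAAAAAAAAATAA")
genome = codon_freqs("ATGGCCGCCGGCTAA")
score = kl_divergence(gene, genome)
```

A higher `score` marks a gene whose codon usage departs more strongly from the genomic background; where to place the cutoff is a separate decision that this sketch does not address.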
Affiliation(s)
- Jennifer Becq
- Dynamique des Structures et Interactions des Macromolécules Biologiques, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 665, Université Paris Diderot, Institut National de la Transfusion Sanguine, Paris, France
- Cécile Churlaud
- Molécules Thérapeutiques in silico, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 973, Université Paris Diderot, Paris, France
- Patrick Deschavanne
- Molécules Thérapeutiques in silico, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 973, Université Paris Diderot, Paris, France
- * E-mail:
|
31
|
Mann S, Li J, Chen YPP. Insights into bacterial genome composition through variable target GC content profiling. J Comput Biol 2010; 17:79-96. [PMID: 20078399 DOI: 10.1089/cmb.2009.0058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
This study presents a new computational method for guanine (G) and cytosine (C), or GC, content profiling based on the idea of multiple resolution sampling (MRS). The benefit of our new approach over existing techniques follows from its ability to locate significant regions without prior knowledge of the sequence or of the features being sought. The use of MRS has provided novel insights into bacterial genome composition. Key findings include those that are related to the core composition of bacterial genomes, to the identification of large genomic islands (in Enterobacterial genomes), and to the identification of surface protein determinants in human pathogenic organisms (e.g., Staphylococcus genomes). We observed that bacterial surface binding proteins maintain abnormal GC content, potentially pointing to a viral origin. This study has demonstrated that GC content holds a high informational worth and hints at many underlying evolutionary processes. For online Supplementary Material, see www.liebertonline.com.
Affiliation(s)
- Scott Mann
- Faculty of Science and Technology, Deakin University, Melbourne, Victoria, Australia
|
32
|
Arvey AJ, Azad RK, Raval A, Lawrence JG. Detection of genomic islands via segmental genome heterogeneity. Nucleic Acids Res 2009; 37:5255-66. [PMID: 19589805 PMCID: PMC2760805 DOI: 10.1093/nar/gkp576] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 12/03/2022] Open
Abstract
While the recognition of genomic islands can be a powerful mechanism for identifying genes that distinguish related bacteria, few methods have been developed to identify them specifically. Rather, identification of islands often begins with cataloging individual genes likely to have been recently introduced into the genome; regions with many putative alien genes are then examined for other features suggestive of recent acquisition of a large genomic region. When few phylogenetic relatives are available, the identification of alien genes relies on their atypical features relative to the bulk of the genes in the genome. The weakness of these ‘bottom–up’ approaches lies in the difficulty in identifying robustly those genes which are atypical, or phylogenetically restricted, due to recent foreign ancestry. Herein, we apply an alternative ‘top–down’ approach where bacterial genomes are recursively divided into progressively smaller regions, each with uniform composition. In this way, large chromosomal regions with atypical features are identified with high confidence due to the simultaneous analysis of multiple genes. This approach is based on a generalized divergence measure to quantify the compositional difference between segments in a hypothesis-testing framework. We tested the proposed genome island prediction algorithm on both artificial chimeric genomes and genuine bacterial genomes.
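The "top-down" recursive division described in this abstract can be illustrated with a small Python sketch. Several things here are assumptions rather than the paper's method: the divergence is the plain Jensen-Shannon divergence on single-nucleotide composition (the paper uses a generalized divergence measure), and a fixed `threshold` stands in for the hypothesis test that decides whether a split is significant.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a Counter of symbol counts."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c)


def best_split(seq, min_len=50):
    """Return (position, divergence) of the split maximising the
    length-weighted Jensen-Shannon divergence between the two halves."""
    total = Counter(seq)
    n = len(seq)
    h_total = entropy(total)
    left = Counter()
    best_pos, best_js = None, 0.0
    for i, ch in enumerate(seq, start=1):
        left[ch] += 1
        if i < min_len or n - i < min_len:
            continue  # keep both segments at least min_len long
        right = total - left
        js = h_total - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js


def segment(seq, threshold=0.1, min_len=50, offset=0):
    """Recursively split seq; returns (start, end) pairs of uniform segments.

    threshold is a hypothetical stand-in for a significance test.
    """
    pos, js = best_split(seq, min_len)
    if pos is None or js < threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:pos], threshold, min_len, offset)
            + segment(seq[pos:], threshold, min_len, offset + pos))


# Toy chimeric sequence: an AT-only region followed by a GC-only region.
toy = "AT" * 50 + "GC" * 50
segments = segment(toy, min_len=10)
```

On this toy input the recursion stops after a single split at the composition boundary, because neither half can be divided into compositionally different parts.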
Affiliation(s)
- Aaron J Arvey
- Department of Computer Science, University of California San Diego, La Jolla, CA 92093, USA
|
33
|
The genome sequence of the psychrophilic archaeon, Methanococcoides burtonii: the role of genome evolution in cold adaptation. ISME J 2009; 3:1012-35. [DOI: 10.1038/ismej.2009.45] [Citation(s) in RCA: 130] [Impact Index Per Article: 8.7] [Indexed: 11/09/2022]
|
34
|
Langille MGI, Brinkman FSL. Bioinformatic detection of horizontally transferred DNA in bacterial genomes. F1000 Biol Rep 2009; 1:25. [PMID: 20948661 PMCID: PMC2920674 DOI: 10.3410/b1-25] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022]
Abstract
We highlight a selection of recent research on computational methods and associated challenges surrounding the prediction of bacterial horizontal gene transfer. This research area continues to face controversy, but is becoming more critical as the importance of horizontal gene transfer in medically and ecologically important prokaryotic evolution is further appreciated.
Affiliation(s)
- Morgan G I Langille
- Department of Molecular Biology and Biochemistry, Simon Fraser University Burnaby, BC Canada V5A 1S6
|
35
|
Cortez D, Delaye L, Lazcano A, Becerra A. Composition-based methods to identify horizontal gene transfer. Methods Mol Biol 2009; 532:215-25. [PMID: 19271187 DOI: 10.1007/978-1-60327-853-9_12] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 01/12/2023]
Abstract
The detection of horizontal gene transfer (HGT) events has become an increasingly important issue in recent years. Here we discuss a simple theoretical analysis based on the in silico artificial addition of known foreign genes from different prokaryotic groups into the genome of Escherichia coli K12 MG1655. Using this dataset as a control, we have tested the efficiency of four methodologies commonly employed to detect HGT, which are based on (a) the codon adaptation index, codon usage, and GC percentage (CAI/GC); (b) the distributional profile (DP) approach with a gene search in the closely related phylogenetic genomes; (c) the Bayesian model (BM); and (d) the first-order Markov model (MM). All methods exhibit limitations as shown here, with BM and MM giving better approximations. The MM has a better detection rate when genes from closely related organisms are evaluated. The application of the MM to detect recently transferred genes in the genomes of E. coli strain K12 MG1655 shows that this organism has undergone a rather significant amount of HGT, several of which have well-defined functions that appear to be involved in the direct interaction of the organisms with their environment.
Affiliation(s)
- Diego Cortez
- Unité de Biologie Moléculaire du Gène chez Extremophiles Institut Pasteur, Paris, France
|
36
|
Baran RH, Ko H. Detecting horizontally transferred and essential genes based on dinucleotide relative abundance. DNA Res 2008; 15:267-76. [PMID: 18799480 PMCID: PMC2575891 DOI: 10.1093/dnares/dsn021] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 12/12/2007] [Accepted: 08/05/2008] [Indexed: 11/20/2022] Open
Abstract
Various methods have been developed to detect horizontal gene transfer in bacteria, based on anomalous nucleotide composition, assuming that compositional features undergo amelioration in the host genome. Evolutionary theory predicts the inevitability of false positives when essential sequences are strongly conserved. Foreign genes could become more detectable on the basis of their higher order compositions if such features ameliorate more rapidly and uniformly than lower order features. This possibility is tested by comparing the heterogeneities of bacterial genomes with respect to strand-independent first- and second-order features, (i) G + C content and (ii) dinucleotide relative abundance, in 1 kb segments. Although statistical analysis confirms that (ii) is less inhomogeneous than (i) in all 12 species examined, extreme anomalies with respect to (ii) in the Escherichia coli K12 genome are typically co-located with essential genes.
Affiliation(s)
- Hanseok Ko
- Department of Electronics and Computer Engineering, Korea University, Anam-dong, Sungbuk-ku, Seoul 136-702, South Korea
|
37
|
Langille MGI, Hsiao WWL, Brinkman FSL. Evaluation of genomic island predictors using a comparative genomics approach. BMC Bioinformatics 2008; 9:329. [PMID: 18680607 PMCID: PMC2518932 DOI: 10.1186/1471-2105-9-329] [Citation(s) in RCA: 200] [Impact Index Per Article: 12.5] [Received: 03/29/2008] [Accepted: 08/05/2008] [Indexed: 01/08/2023] Open
Abstract
Background: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches. Results: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset) and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools. Conclusion: Our results indicate that AlienHunter has the highest recall, but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable overall highest accuracy. Our comparative genomics approach, IslandPick, was the most accurate, compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation, using diverse and independent datasets that were not artificially constructed, of the accuracy of several sequence composition-based GI predictors. The caveats associated with this analysis and proposals for optimal island prediction are discussed.
Affiliation(s)
- Morgan G I Langille
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
|
38
|
Bapteste E, Boucher Y. Lateral gene transfer challenges principles of microbial systematics. Trends Microbiol 2008; 16:200-7. [PMID: 18420414 DOI: 10.1016/j.tim.2008.02.005] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Received: 11/05/2007] [Revised: 02/13/2008] [Accepted: 02/15/2008] [Indexed: 10/22/2022]
Abstract
Evolutionists strive to learn about the natural historical process that gave rise to various taxa, while also attempting to classify them efficiently and make generalizations about them. The quantitative importance of lateral gene transfer inferred from genomic data, although well acknowledged by microbiologists, is in conflict with the conceptual foundations of the traditional phylogenetic system erected to achieve these goals. To provide a true account of microbial evolution, we suggest developing an alternative conception of natural groups and introduce a new notion: the composite evolutionary unit. Furthermore, we argue that a comprehensive database containing overlapping taxonomical groups would constitute a step forward regarding the classification of microbes in the presence of lateral gene transfer.
|
39
|
Kassai-Jáger E, Ortutay C, Tóth G, Vellai T, Gáspári Z. Distribution and evolution of short tandem repeats in closely related bacterial genomes. Gene 2007; 410:18-25. [PMID: 18191346 DOI: 10.1016/j.gene.2007.11.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 07/24/2007] [Revised: 11/08/2007] [Accepted: 11/16/2007] [Indexed: 11/27/2022]
Abstract
Simultaneous identification and comparison of perfect and imperfect microsatellites within a genome is a valuable tool both to overcome the lack of a consensus definition of SSRs and to assess repeat history. Detailed analysis of the overall distribution of perfect and imperfect microsatellites in closely related bacterial taxa is expected to give new insight into the evolution of prokaryotic genomes. We have performed a genome-wide analysis of microsatellite distribution in four Escherichia coli and seven Chlamydial strains. Chlamydial strains generally have a higher density of SSRs and show greater intra-group differences in SSR distribution patterns than E. coli genomes. In most investigated genomes the distributions of the total lengths of matching perfect and imperfect trinucleotide repeats are highly similar, with the notable exception of C. muridarum. Closely related strains show more similar repeat distribution patterns than strains separated by a longer divergence time. The discrepancy between the preferred classes of perfect and imperfect repeats in C. muridarum implies accelerated evolution of SSRs in this particular strain. Our results suggest that microsatellites, although considerably less abundant than in eukaryotic genomes, may nevertheless play an important role in the evolution of prokaryotic genomes and several gene families.
Affiliation(s)
- Edit Kassai-Jáger
- Department of Genetics, Eötvös Loránd University, Pázmány Péter sétány 1/C, H-1117 Budapest, Hungary
|