1
Guo X, Guo Y, Chen H, Liu X, He P, Li W, Zhang MQ, Dai Q. Systematic comparison of genome information processing and boundary recognition tools used for genomic island detection. Comput Biol Med 2023; 166:107550. [PMID: 37826950 DOI: 10.1016/j.compbiomed.2023.107550] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2023] [Revised: 09/12/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023]
Abstract
Genomic islands are fragments of foreign DNA found in bacterial and archaeal genomes and are typically associated with symbiosis or pathogenesis. While numerous genomic island detection methods have been proposed, there has been limited evaluation of the efficiency of their genome information processing and boundary recognition tools. In this study, we reviewed the statistical methods involved in the genomic signature, host signature extraction, informative signature selection, divergence measure, and boundary detection steps of genomic island prediction. We compared the performance of these methods in simulation experiments using alien fragments obtained from both artificial and real genomes. Our results indicate that, among the nine genomic signatures evaluated, genomic signature frequency and full probability performed best. However, their performance declined when they were normalized by their expectations and variances, as in the Z-score and the composition vector. Based on our experiments on the E. coli genome, we found that confidence intervals of the window variances performed best for extracting the host signature, with the best confidence interval being 1.5-2 times the standard error. Ordered kurtosis was most effective in selecting informative signatures from a single genome, without requiring prior knowledge from other datasets. Among the three divergence measures evaluated, the two-sample t-test was the most successful, and a non-overlapping window with a small eye window (size 2) was best suited for identifying compositionally distinct regions. Finally, the maximum of the Markovian Jensen-Shannon divergence score, in terms of GC-content bias, was found to make boundary detection faster while maintaining a similar error rate.
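The boundary-detection step can be illustrated with a simplified sketch of Jensen-Shannon (JS) divergence segmentation. It is not the authors' Markovian method: it assumes a zeroth-order model over a two-letter alphabet (strong G/C versus weak A/T), so it responds only to GC-content shifts, and the function names and the `min_len` parameter are hypothetical. It scans every cut point and keeps the one whose two sides differ most in composition.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a count vector; empty classes contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def best_gc_split(seq, min_len=10):
    """Return (position, JS divergence) for the cut that maximizes the zeroth-order
    JS divergence between seq[:i] and seq[i:] on the G/C vs A/T alphabet.
    Both parts must be at least min_len long; returns (None, -1.0) if no cut is allowed."""
    seq = seq.upper()
    n = len(seq)
    total_gc = sum(1 for b in seq if b in "GC")
    h_total = entropy([total_gc, n - total_gc])
    best_pos, best_js = None, -1.0
    left_gc = 0
    for i in range(1, n):
        # the left part is seq[:i]; update its G/C count incrementally (prefix sum)
        if seq[i - 1] in "GC":
            left_gc += 1
        if i < min_len or n - i < min_len:
            continue
        right_gc = total_gc - left_gc
        js = (h_total
              - (i / n) * entropy([left_gc, i - left_gc])
              - ((n - i) / n) * entropy([right_gc, (n - i) - right_gc]))
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js
```

In recursive segmentation schemes, a cut like this is typically kept only if its divergence passes a significance test, and each side is then split again.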
Affiliation(s)
- Xiangting Guo
- Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Yichu Guo
- Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Hu Chen
- Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou, 310018, China
- Pingan He
- Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Wenshu Li
- Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Michael Q Zhang
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA; Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China
- Qi Dai
- Zhejiang Sci-Tech University, Hangzhou, 310018, China; Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA.
2
Sengupta S, Azad RK. Leveraging comparative genomics to uncover alien genes in bacterial genomes. Microb Genom 2023; 9:mgen000939. [PMID: 36748570 PMCID: PMC9973850 DOI: 10.1099/mgen.0.000939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023] [Open Access]
Abstract
A significant challenge in bacterial genomics is to catalogue genes acquired through the evolutionary process of horizontal gene transfer (HGT). Both comparative genomics and sequence composition-based methods have often been invoked to quantify horizontally acquired genes in bacterial genomes. Comparative genomics methods rely on completely sequenced genomes, so confidence in their predictions increases as databases become enriched in completely sequenced genomes. Recent developments, including those in microbial genome sequencing, call for a reassessment of alien genes based on the information-rich resources now available. We revisited the comparative genomics approach and developed a new algorithm for alien gene detection. Our algorithm compared favourably with existing comparative genomics-based methods and is capable of detecting both recent and ancient transfers. It can be used as a standalone tool or in concert with other complementary algorithms to comprehensively catalogue alien genes in bacterial genomes.
Affiliation(s)
- Soham Sengupta
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, Texas, 76203, USA
- Rajeev K Azad
- Department of Biological Sciences and BioDiscovery Institute, University of North Texas, Denton, Texas, 76203, USA; Department of Mathematics, University of North Texas, Denton, Texas, 76203, USA
3
Burks DJ, Azad RK. Mapping Strengths and Weaknesses of Different Clustering Approaches to Deciphering Bacterial Chimerism. OMICS: A Journal of Integrative Biology 2022; 26:422-439. [PMID: 35925817 DOI: 10.1089/omi.2022.0062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Bacterial genomes are chimeras of DNA of different ancestries. Deconstructing chimeric genomes is central to understanding the evolutionary trajectories of their disparate components, and thus of the organisms as a whole, in the light of their evolutionary contexts. Of specific interest is delineating and quantifying the native (vertically inherited) and alien (horizontally acquired) components of bacterial genomes, and identifying the genomic fractions that represent different donor sources. An agglomerative clustering procedure that prioritizes grouping of proximal, similar genomic segments has previously been invoked for this purpose in conjunction with a recursive segmentation procedure. Surprisingly, however, the relative strengths and weaknesses of different clustering approaches to deciphering bacterial chimerism have not yet been investigated, despite the need to robustly interpret the tens of thousands of completely sequenced bacterial genomes and nearly complete genome assemblies available in public databases. To bridge this knowledge gap and develop more robust approaches, we assessed different clustering methods, including segment order-based (proximal) clustering, hierarchical clustering, affinity propagation clustering, and a novel network clustering approach, on chimeric genomes modeled after bacterial genomes representing a broad spectrum of compositional complexity. Although segment order-based clustering and network clustering compared favorably with the other approaches in discriminating between native and alien DNA at genome-optimized settings, network clustering performed consistently better than the other methods at parameter settings optimized on all test genomes together. Segment order-based clustering and hierarchical clustering outperformed the other methods in identifying alien DNA while preserving donor identity in the genomes. Our study highlights the strengths and weaknesses of the different approaches and suggests how these can be leveraged to achieve a more robust deconstruction of bacterial chimerism.
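Segment order-based (proximal) clustering can be pictured with a deliberately simplified sketch that merges only adjacent segments, always joining the most similar neighbouring pair first and stopping once every remaining neighbour pair differs by more than a threshold. The adjacency restriction, the use of JS divergence between nucleotide count vectors, and the names `proximal_cluster` and `threshold` are assumptions for illustration; the exact procedure studied in the paper is not specified in the abstract.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a count vector."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def js_div(a, b):
    """Size-weighted Jensen-Shannon divergence between two count vectors."""
    na, nb = sum(a), sum(b)
    n = na + nb
    merged = [x + y for x, y in zip(a, b)]
    return entropy(merged) - (na / n) * entropy(a) - (nb / n) * entropy(b)


def proximal_cluster(segments, threshold):
    """Greedily merge the most similar pair of adjacent clusters until every adjacent
    pair differs by more than threshold.

    segments: per-segment nucleotide count vectors (e.g. [A, C, G, T]) in genome order.
    Returns a list of (member segment indices, pooled counts) in genome order.
    """
    clusters = [([i], list(c)) for i, c in enumerate(segments)]
    while len(clusters) > 1:
        scores = [js_div(clusters[j][1], clusters[j + 1][1])
                  for j in range(len(clusters) - 1)]
        j = min(range(len(scores)), key=scores.__getitem__)
        if scores[j] > threshold:
            break  # no neighbouring pair is similar enough to merge
        members = clusters[j][0] + clusters[j + 1][0]
        counts = [x + y for x, y in zip(clusters[j][1], clusters[j + 1][1])]
        clusters[j:j + 2] = [(members, counts)]
    return clusters
```

Because only neighbours are merged here, a native segment separated from other native DNA by an island stays in its own cluster; a complete procedure would also need a way to group non-adjacent clusters.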
Affiliation(s)
- David J Burks
- Department of Biological Sciences, BioDiscovery Institute, University of North Texas, Denton, Texas, USA
- Rajeev K Azad
- Department of Biological Sciences, BioDiscovery Institute, University of North Texas, Denton, Texas, USA
- Department of Mathematics, University of North Texas, Denton, Texas, USA
4
IslandCafe: Compositional Anomaly and Feature Enrichment Assessment for Delineation of Genomic Islands. G3: Genes, Genomes, Genetics 2019; 9:3273-3285. [PMID: 31387857 PMCID: PMC6778810 DOI: 10.1534/g3.119.400562] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/09/2023]
Abstract
One of the evolutionary forces driving bacterial genome evolution is the acquisition of gene clusters through horizontal gene transfer (HGT). These genomic islands may confer adaptive advantages on the recipient bacteria, such as the ability to thwart antibiotics, to become virulent or hypervirulent, or to acquire novel metabolic traits. Methods for detecting genomic islands either search for markers or features typical of islands or examine anomalies in oligonucleotide composition against the genome background. The former tend to underestimate, missing islands whose markers have been lost or degraded, while the latter tend to overestimate because they cannot distinguish compositional atypicality arising from HGT from atypicality caused by other biological factors. We propose here a framework that exploits the strengths of both approaches while bypassing the pitfalls of each. Genomic islands lacking markers are identified by their association with genomic islands that carry markers. This was made possible by performing marker enrichment and phyletic pattern analyses within an integrated framework of recursive segmentation and clustering. The proposed method, IslandCafe, compared favorably with frequently used genomic island detection methods on synthetic test datasets and on a test set of known islands from 15 well-characterized bacterial species. Furthermore, IslandCafe identified novel islands with imprints of likely horizontal acquisition.
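The marker enrichment idea can be illustrated with a one-sided hypergeometric test: how surprising is it that a cluster of segments contains this many marker genes (for example integrase or transposase genes), given how many the whole genome has? The abstract does not state which enrichment statistic IslandCafe uses, so the hypergeometric test and the function name are assumptions.

```python
from math import comb


def marker_enrichment_pvalue(total_genes, total_markers, cluster_genes, cluster_markers):
    """One-sided hypergeometric p-value P(X >= cluster_markers): the probability that a
    random set of cluster_genes genes drawn from the genome contains at least
    cluster_markers of the genome's total_markers marker genes."""
    denom = comb(total_genes, cluster_genes)
    upper = min(total_markers, cluster_genes)
    hits = sum(comb(total_markers, x) * comb(total_genes - total_markers, cluster_genes - x)
               for x in range(cluster_markers, upper + 1))
    return hits / denom
```

A small p-value would suggest that a cluster is enriched in island markers, so marker-free segments grouped with it could be considered candidate islands as well.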
5
Huang GD, Liu XM, Huang TL, Xia LC. The statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer. Synth Syst Biotechnol 2019; 4:150-156. [PMID: 31508512 PMCID: PMC6723412 DOI: 10.1016/j.synbio.2019.08.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/06/2019] [Revised: 07/14/2019] [Accepted: 08/05/2019] [Indexed: 12/21/2022] [Open Access]
Abstract
Alignment-based database search and sequence comparison are commonly used to detect horizontal gene transfer (HGT). However, with the rapid increase in sequencing depth, hundreds of thousands of contigs are routinely assembled in metagenomics studies, which challenges alignment-based HGT analysis by overwhelming the known reference sequences. Detecting HGT with k-mer statistics thus becomes an attractive alternative. These alignment-free statistics have demonstrated high performance and efficiency in whole-genome and transcriptome comparisons. To adapt k-mer statistics for HGT detection, we developed two aggregative statistics, TsumS and Tsum*, which subsample metagenome contigs by their representative regions and summarize the regional D2S and D2* metrics by their upper bounds. We systematically studied the power of these aggregative statistics at different k-mer sizes using simulations. Our analysis showed that, in general, the power of TsumS and Tsum* increases with sequencing coverage and reaches a maximum of >80% at k = 6, with a 5% type I error rate and a coverage ratio >0.2x. The statistical power of TsumS and Tsum* was evaluated with realistic simulations of HGT mechanism, sequencing depth, read length, and base error. We expect these statistics to be useful distance metrics for identifying HGT in metagenomic studies.
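The D2* statistic on which Tsum* builds can be sketched as follows. This assumes an i.i.d. (zeroth-order Markov) background whose base frequencies are estimated from the two sequences pooled, and uppercase ACGT-only input; the subsampling and upper-bound aggregation that define TsumS and Tsum* are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Counts of every k-mer occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_star(x, y, k):
    """D2* between sequences x and y.

    Each k-mer count is centred on its expectation under an i.i.d. background,
    E[X_w] = (len(x) - k + 1) * p_w, where p_w is the product of base frequencies
    estimated from x and y pooled.
    """
    pooled = Counter(x) + Counter(y)
    total = sum(pooled[b] for b in "ACGT")
    base_p = {b: pooled[b] / total for b in "ACGT"}
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    nx, ny = len(x) - k + 1, len(y) - k + 1
    score = 0.0
    for w in ("".join(p) for p in product("ACGT", repeat=k)):
        p_w = math.prod(base_p[b] for b in w)
        if p_w == 0:
            continue  # k-mer impossible under the background model
        x_centred = cx[w] - nx * p_w
        y_centred = cy[w] - ny * p_w
        score += x_centred * y_centred / (math.sqrt(nx * ny) * p_w)
    return score
```

Tsum* and TsumS would then aggregate such regional scores over representative subsamples of each contig; that step is described only briefly in the abstract and is not sketched here.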
Affiliation(s)
- Guan-Da Huang
- School of Physics and Optoelectronics, South China University of Technology, Guangzhou, 510640, China
- Xue-Mei Liu
- School of Physics and Optoelectronics, South China University of Technology, Guangzhou, 510640, China
- Tian-Lai Huang
- School of Physics and Optoelectronics, South China University of Technology, Guangzhou, 510640, China
- Li-C Xia
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, 94305, USA
6
Tao J, Liu X, Yang S, Bao C, He P, Dai Q. An efficient genomic signature ranking method for genomic island prediction from a single genome. J Theor Biol 2019; 467:142-149. [PMID: 30768974 DOI: 10.1016/j.jtbi.2019.02.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/18/2018] [Revised: 02/07/2019] [Accepted: 02/11/2019] [Indexed: 01/13/2023]
Abstract
Genomic islands are associated with microbial adaptations and carry genomic signatures different from those of the host; many methods have therefore been proposed to select informative genomic signatures from a range of organisms and to discriminate genomic islands from the rest of the genome in terms of these signature biases. However, these methods are of limited use when closely related genomes are unavailable. In the present work, we proposed a kurtosis-based ranking method to select informative genomic signatures from a single genome. In simulations with alien fragments from artificial and real genomes, the proposed kurtosis-based ranking method efficiently selected informative genomic signatures from a single genome, without genome annotation or prior knowledge from other datasets. This understanding can be useful in designing more powerful methods for genomic island detection.
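A sketch of kurtosis-based signature ranking follows. It assumes that each signature is a k-mer frequency measured in non-overlapping windows, and that the signatures whose window-level distributions are most heavy-tailed (a few windows with extreme values, as an alien fragment would produce) are the most informative. The window size, k, the use of population excess kurtosis and the function names are illustrative assumptions; the paper's exact statistic may differ.

```python
from collections import Counter
from itertools import product


def signature_profiles(seq, k, window):
    """Frequency of every k-mer (genomic signature) in each non-overlapping window."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    profiles = {w: [] for w in words}
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        n = len(chunk) - k + 1
        counts = Counter(chunk[i:i + k] for i in range(n))
        for w in words:
            profiles[w].append(counts[w] / n)
    return profiles


def excess_kurtosis(values):
    """Population excess kurtosis; constant signatures get -inf so they rank last."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return float("-inf")
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


def rank_signatures(profiles):
    """Signatures sorted from most to least heavy-tailed across windows."""
    return sorted(profiles, key=lambda w: excess_kurtosis(profiles[w]), reverse=True)
```

For example, a signature that is absent from nine windows and dominant in one ranks above a signature that alternates evenly between two values, because only the former is heavy-tailed.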
Affiliation(s)
- Jin Tao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Xiaoqing Liu
- College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Siqian Yang
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Chaohui Bao
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Pingan He
- College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Qi Dai
- College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China; Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, TX 75080, USA.
7
Panda A, Drancourt M, Tuller T, Pontarotti P. Genome-wide analysis of horizontally acquired genes in the genus Mycobacterium. Sci Rep 2018; 8:14817. [PMID: 30287860 PMCID: PMC6172269 DOI: 10.1038/s41598-018-33261-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/01/2018] [Accepted: 09/07/2018] [Indexed: 12/13/2022] [Open Access]
Abstract
Horizontal gene transfer (HGT) has been recognized as a major driving force for the innovation and evolution of prokaryotic genomes. Multiple research efforts have previously been undertaken to decipher HGT in different bacterial lineages. The genus Mycobacterium houses some of the deadliest human pathogens; however, the impact of HGT in Mycobacterium has never been addressed systematically. Previous initiatives to explore the genomic imprints of HGT in Mycobacterium focused on a few selected species, specifically members of the Mycobacterium tuberculosis complex. Given the recent availability of a large number of genomes, the current study was initiated to decipher the probable HGT events among 109 completely sequenced Mycobacterium species. Our comprehensive phylogenetic analysis of more than 9,000 families of Mycobacterium proteins allowed us to list several instances of gene transfer spread across the Mycobacterium phylogeny. Moreover, by examining the topology of gene phylogenies, we identified the species most likely to donate and receive these genes and provided a detailed overview of the putative functions in which these genes may be involved. Our study suggests that horizontally acquired foreign genes have played an enduring role in the evolution of Mycobacterium genomes and have contributed to their metabolic versatility and pathogenicity.
Affiliation(s)
- Arup Panda
- Aix-Marseille-Univ., IRD, MEPHI, Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, Marseille, France; Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv, 69978, Israel
- Michel Drancourt
- Aix-Marseille-Univ., IRD, MEPHI, Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, Marseille, France.
- Tamir Tuller
- Department of Biomedical Engineering, Tel-Aviv University, Ramat Aviv, 69978, Israel
- Pierre Pontarotti
- Aix-Marseille-Univ., IRD, MEPHI, Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, Marseille, France; CNRS, Marseille, France
8
Abstract
Horizontal or Lateral Gene Transfer (HGT or LGT) is the transmission of portions of genomic DNA between organisms through a process decoupled from vertical inheritance. In the presence of HGT events, different fragments of the genome are the result of different evolutionary histories. This can therefore complicate the investigations of evolutionary relatedness of lineages and species. Also, as HGT can bring into genomes radically different genotypes from distant lineages, or even new genes bearing new functions, it is a major source of phenotypic innovation and a mechanism of niche adaptation. For example, of particular relevance to human health is the lateral transfer of antibiotic resistance and pathogenicity determinants, leading to the emergence of pathogenic lineages. Computational identification of HGT events relies upon the investigation of sequence composition or evolutionary history of genes. Sequence composition-based ("parametric") methods search for deviations from the genomic average, whereas evolutionary history-based ("phylogenetic") approaches identify genes whose evolutionary history significantly differs from that of the host species. The evaluation and benchmarking of HGT inference methods typically rely upon simulated genomes, for which the true history is known. On real data, different methods tend to infer different HGT events, and as a result it can be difficult to ascertain all but simple and clear-cut HGT events.
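The parametric (sequence composition-based) idea of searching for deviations from the genomic average can be sketched minimally as follows. GC content as the compositional property, the two-standard-deviation cutoff `z_cut`, and the function names are illustrative assumptions; other methods in this list use richer properties such as codon usage or oligonucleotide frequencies.

```python
from statistics import mean, stdev


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return sum(b in "GC" for b in seq) / len(seq)


def atypical_genes(genes, z_cut=2.0):
    """Names of genes whose GC content deviates from the mean over all genes by more
    than z_cut sample standard deviations."""
    gc = {name: gc_content(s) for name, s in genes.items()}
    values = list(gc.values())
    mu, sd = mean(values), stdev(values)
    return sorted(name for name, g in gc.items() if abs(g - mu) > z_cut * sd)
```

A single-threshold rule like this is exactly what struggles with genes of ambiguous composition, which motivates approaches such as the multiple-threshold scheme described in entry 15.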
Affiliation(s)
- Nives Škunca
- ETH Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Zurich, Switzerland
- Christophe Dessimoz
- University College London, London, United Kingdom
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
9
Abstract
Since the emergence of high-throughput genome sequencing platforms and more recently the next-generation platforms, the genome databases are growing at an astronomical rate. Tremendous efforts have been invested in recent years in understanding intriguing complexities beneath the vast ocean of genomic data. This is apparent in the spurt of computational methods for interpreting these data in the past few years. Genomic data interpretation is notoriously difficult, partly owing to the inherent heterogeneities appearing at different scales. Methods developed to interpret these data often suffer from their inability to adequately measure the underlying heterogeneities and thus lead to confounding results. Here, we present an information entropy-based approach that unravels the distinctive patterns underlying genomic data efficiently and thus is applicable in addressing a variety of biological problems. We show the robustness and consistency of the proposed methodology in addressing three different biological problems of significance—identification of alien DNAs in bacterial genomes, detection of structural variants in cancer cell lines and alignment-free genome comparison.
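Alignment-free genome comparison is one of the applications named above. A common entropy-based way to compare two sequences, sketched here as an illustration rather than as the author's method, is the Jensen-Shannon divergence between their k-mer frequency profiles, which is 0 for identical profiles and 1 bit for profiles that share no k-mers. The function names and the choice of k are assumptions.

```python
import math
from collections import Counter


def kmer_profile(seq, k):
    """Relative frequency of each k-mer observed in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (bits) between two frequency dictionaries."""
    keys = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in keys}

    def kl_to_m(a):
        # KL(a || m); m[w] > 0 wherever a[w] > 0, so the log is always defined
        return sum(v * math.log2(v / m[w]) for w, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)
```

Computing this divergence for every pair of genomes yields a distance matrix that can be fed to standard tree-building or clustering methods.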
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA.
10
Lapierre P, Lasek-Nesselquist E, Gogarten JP. The impact of HGT on phylogenomic reconstruction methods. Brief Bioinform 2012; 15:79-90. [DOI: 10.1093/bib/bbs050] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/13/2023] [Open Access]
11
Xiong D, Xiao F, Liu L, Hu K, Tan Y, He S, Gao X. Towards a better detection of horizontally transferred genes by combining unusual properties effectively. PLoS One 2012; 7:e43126. [PMID: 22905214 PMCID: PMC3419211 DOI: 10.1371/journal.pone.0043126] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/28/2012] [Accepted: 07/16/2012] [Indexed: 02/01/2023] [Open Access]
Abstract
Background: Horizontal gene transfer (HGT) is one of the major mechanisms contributing to microbial genome diversification. A number of computational methods for finding horizontally transferred genes have been proposed over the past decades; however, none of them has yet provided a reliable detector. In existing parametric approaches, only a single compositional property participates in the detection process, or the results obtained with each single property are simply combined. Because different properties carry different information, a single property cannot sufficiently capture the information encoded in gene sequences. In addition, the class imbalance problem in the datasets, which also causes large errors in gene detection, has not been considered by published methods. Here we developed an effective classifier system (Hgtident) that uses a support vector machine (SVM) to combine unusual properties effectively for HGT detection. Results: Our approach, Hgtident, includes the introduction of more representative datasets, optimization of the SVM model, feature selection, handling of the imbalance problem in the datasets, and extensive performance evaluation via systematic cross-validation. Through feature selection, we found that JS-DN and JS-CB have higher discriminating power for HGT detection, while GC1–GC3 and k-mer (k = 1, 2, …, 7) features make the least contribution. Extensive experiments indicated that the new classifier could reduce Mean error dramatically and also improve Recall to a certain extent. For the test genomes, compared with the existing popular multiple-threshold approach, our Recall improved by 2.81% and our Mean error decreased by 26.32% on average, which means that numerous false positives were correctly identified. Conclusions: Hgtident is an effective approach for better detecting HGT. Combining multiple features of HGT is also essential for detecting a wider range of HGT events.
Affiliation(s)
- Dapeng Xiong
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Fen Xiao
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Li Liu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, People’s Republic of China
- Kai Hu
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Yanping Tan
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
- Shunmin He
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, People’s Republic of China
- Xieping Gao
- Key Laboratory of Intelligent Computing & Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan, People’s Republic of China
12
Elhai J, Liu H, Taton A. Detection of horizontal transfer of individual genes by anomalous oligomer frequencies. BMC Genomics 2012; 13:245. [PMID: 22702893 PMCID: PMC3497702 DOI: 10.1186/1471-2164-13-245] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/29/2011] [Accepted: 05/18/2012] [Indexed: 11/10/2022] [Open Access]
Abstract
Background: Understanding the history of life requires that we understand the transfer of genetic material across phylogenetic boundaries. Detecting genes that were acquired by means other than vertical descent is a basic step in that process. Detection by discordant phylogenies is computationally expensive and not always definitive. Many have used easily computed compositional features as an alternative. However, different compositional methods produce different predictions, and the effectiveness of any method is not well established. Results: The ability of octamer frequency comparisons to detect genes artificially seeded in cyanobacterial genomes was markedly increased by using, as a training set, those genes that are highly conserved across all bacteria. Using a subset of octamer frequencies in such tests also increased effectiveness, but this depended on the specific target genome and the source of the contaminating genes. The presence of high-frequency octamers and the GC content of the contaminating genes were important considerations. A method comprising best practices from these tests was devised, the Core Gene Similarity (CGS) method, and it performed better than simple octamer frequency analysis, codon bias, or GC contrasts in detecting seeded genes or naturally occurring transposons. From a comparison of predictions with phylogenetic trees, it appears that the effectiveness of the method is confined to horizontal transfer events that have occurred recently in evolutionary time. Conclusions: The CGS method may be an improvement over existing surrogate methods for detecting genes of foreign origin.
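The core idea of comparing a gene's oligomer frequencies against a training set of highly conserved core genes can be sketched as below. The abstract does not specify the similarity measure, so cosine similarity between k-mer count vectors is an assumption, as are the function names; the default k = 8 corresponds to octamers. Genes with low scores would be candidates for foreign origin.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Counts of every k-mer occurring in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(v * b[w] for w, v in a.items())  # Counter returns 0 for absent k-mers
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def core_similarity(core_genes, query_genes, k=8):
    """Score each query gene by the similarity of its k-mer counts to the pooled
    k-mer counts of the core (highly conserved) training genes."""
    reference = Counter()
    for seq in core_genes:
        reference.update(kmer_counts(seq, k))
    return {name: cosine(kmer_counts(seq, k), reference)
            for name, seq in query_genes.items()}
```

Pooling the core genes into one reference profile is the simplest choice; scoring against each core gene separately would be an equally plausible variant.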
Affiliation(s)
- Jeff Elhai
- Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23284, USA.
13
Abstract
Methods for identifying alien genes in genomes fall into two general classes. Phylogenetic methods examine the distribution of a gene's homologues among genomes to find those with relationships inconsistent with vertical inheritance. These approaches include identifying orphan genes, which lack homologues in closely related genomes, and genes with unduly high levels of similarity to genes in otherwise unrelated genomes. Rigorous statistical tests are available to place confidence intervals on predicted alien genes. Parametric methods examine the compositional properties of genes within a genome to find those with atypical properties, likely reflecting the directional mutational pressures of a donor genome. These methods may compare the properties of genes to genomic averages, the properties of genes to each other, or the properties of large, multigene regions of the chromosome. Here, we discuss the strengths and weaknesses of each approach.
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
14
Bezuidt O, Pierneef R, Mncube K, Lima-Mendez G, Reva ON. Mainstreams of horizontal gene exchange in enterobacteria: consideration of the outbreak of enterohemorrhagic E. coli O104:H4 in Germany in 2011. PLoS One 2011; 6:e25702. [PMID: 22022434 PMCID: PMC3195076 DOI: 10.1371/journal.pone.0025702] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 07/14/2011] [Accepted: 09/08/2011] [Indexed: 11/18/2022] [Open Access]
Abstract
Background: Escherichia coli O104:H4 caused a severe outbreak in Europe in 2011. Sequencing of strain TY-2482 from this outbreak revealed its closest relatives but failed to resolve how it originated and evolved. Given this, should we expect similar outbreaks to recur, or to arise spontaneously, in the future? The inability to answer these questions shows the limitations of current comparative and evolutionary genomics methods. Principal Findings: The study revealed oscillations of gene exchange in enterobacteria, which originated from marine γ-Proteobacteria. These mobile genetic elements have become recombination hotspots and effective 'vehicles' ensuring a wide distribution of successful combinations of fitness and virulence genes among enterobacteria. Two remarkable peculiarities of strain TY-2482 and its relatives were observed: i) these strains have retained their genetic primitiveness, having somehow avoided the main fluxes of horizontal gene transfer that effectively penetrated other enterobacteria; ii) they acquired antibiotic resistance genes in a plasmid genomic island of β-Proteobacteria origin, which is ontologically unrelated to the predominant genomic islands of enterobacteria. Conclusions: Oscillations of horizontal gene exchange activity were reported, which result from a counterbalance between the acquired resistance of bacteria towards existing mobile vectors and the generation of new vectors in the environmental microflora. We hypothesized that TY-2482 may originate from a genetically primitive lineage of E. coli that evolved in confined geographical areas and was brought, by human migration or cattle trade, to an intersection of several independent streams of horizontal gene exchange. Development of a system for monitoring new and highly active gene exchange events was proposed.
Affiliation(s)
- Oliver Bezuidt
- Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Rian Pierneef
- Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Kingdom Mncube
- Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Gipsi Lima-Mendez
- Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles, Bruxelles, Belgium
- Oleg N. Reva
- Bioinformatics and Computational Biology Unit, Department of Biochemistry, University of Pretoria, Pretoria, South Africa
15
Abstract
Because horizontally transferred genes reflect the mutational proclivities of their donor genomes, they often show atypical compositional properties relative to native genes. Parametric methods use these discrepancies to identify bacterial genes recently acquired by horizontal transfer. However, compositional patterns of native genes vary stochastically, leaving no clear boundary between typical and atypical genes. As a result, while strongly atypical genes are readily identified as alien, genes of ambiguous character are poorly classified when a single threshold separates typical and atypical genes. This limitation affects all parametric methods that examine genes independently, and overcoming it requires additional genomic information. We propose that the performance of all parametric methods can be improved by using a multiple-threshold approach. First, strongly atypical alien genes and strongly typical native genes would be identified using conservative thresholds. Genes with ambiguous compositional features would then be classified by examining gene context, including the class (native or alien) of flanking genes. By including additional genomic information in a multiple-threshold framework, we observed a remarkable improvement in the performance of several popular, but algorithmically distinct, methods for alien gene detection.
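The two-stage multiple-threshold procedure can be sketched as follows. The score scale, the two cutoffs, and the specific context rule (an ambiguous gene is called alien if an immediate neighbour was confidently called alien) are assumptions for illustration; the abstract says only that gene context, including the class of flanking genes, is used.

```python
def multiple_threshold(scores, alien_cut, native_cut):
    """Classify genes, given in genome order, from atypicality scores
    (higher = more atypical).

    Pass 1: conservative thresholds label only clear-cut genes.
    Pass 2: each ambiguous gene is labelled 'alien' if an adjacent gene was a
    clear-cut alien in pass 1, otherwise 'native' (an assumed context rule).
    """
    first = []
    for s in scores:
        if s >= alien_cut:
            first.append("alien")
        elif s <= native_cut:
            first.append("native")
        else:
            first.append(None)  # ambiguous: decided by context in pass 2
    final = list(first)
    for i, label in enumerate(first):
        if label is None:
            neighbours = [first[j] for j in (i - 1, i + 1) if 0 <= j < len(first)]
            final[i] = "alien" if "alien" in neighbours else "native"
    return final
```

With scores [0.1, 0.9, 0.5, 0.2, 0.5, 0.1] and cutoffs 0.8 and 0.3, the first ambiguous gene sits next to a clear alien and is called alien, while the second, flanked only by native genes, is called native.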
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
16
Becq J, Churlaud C, Deschavanne P. A benchmark of parametric methods for horizontal transfers detection. PLoS One 2010; 5:e9989. [PMID: 20376325 PMCID: PMC2848678 DOI: 10.1371/journal.pone.0009989] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 11/10/2009] [Accepted: 03/10/2010] [Indexed: 11/23/2022]
Abstract
Horizontal gene transfer (HGT) appears to be important for the evolution of prokaryotic species. Consequently, numerous parametric methods, which use only the information embedded in the genomes, have been designed to detect HGTs. Numerous reports of incongruences between the results of different methods applied to the same genomes have been published. The use of artificial genomes, in which all HGT parameters are controlled, allows different methods to be tested under the same conditions. The results of this benchmark of 16 representative parametric methods showed a wide variety of efficiencies. Some methods work very poorly regardless of the type of HGT, and some depend on the conditions or on the metrics used. The best methods in terms of total errors were those using tetranucleotides as the criterion for window-based methods, or those using codon usage for gene-based methods, together with the Kullback-Leibler divergence metric. Window methods are very sensitive but less specific, and detect isolated single genes poorly. Gene-based methods, on the other hand, are often very specific but lack sensitivity. We propose combining two methods to get the best of each category: a gene-based one for specificity and a window-based one for sensitivity.
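To make the window-based criterion concrete, the sketch below scores sliding windows by the Kullback-Leibler divergence between their tetranucleotide frequencies and those of the whole genome. It is an assumption-laden illustration, not any of the 16 benchmarked implementations: the window size, step, and the pseudocount used to avoid zero frequencies are hypothetical parameters.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def kmer_freqs(seq: str, k: int = 4, pseudocount: float = 1.0) -> Dict[str, float]:
    """Relative k-mer frequencies over the ACGT alphabet, with a pseudocount."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]  # k-mers with N etc. are ignored
    total = sum(counts[m] for m in kmers) + pseudocount * len(kmers)
    return {m: (counts[m] + pseudocount) / total for m in kmers}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D_KL(p || q) in nats; both distributions share keys and are strictly positive."""
    return sum(p[m] * math.log(p[m] / q[m]) for m in p)


def window_kl_scores(genome: str, window: int = 5000, step: int = 2500,
                     k: int = 4) -> List[Tuple[int, float]]:
    """(start, divergence from the whole-genome signature) for each sliding window."""
    background = kmer_freqs(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        local = kmer_freqs(genome[start:start + window], k)
        scores.append((start, kl_divergence(local, background)))
    return scores
```

Windows with the largest scores are candidate atypical regions; how high a score must be before a window is called alien is a separate thresholding decision that this sketch leaves open.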
Affiliation(s)
- Jennifer Becq
- Dynamique des Structures et Interactions des Macromolécules Biologiques, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 665, Université Paris Diderot, Institut National de la Transfusion Sanguine, Paris, France
- Cécile Churlaud
- Molécules Thérapeutiques in silico, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 973, Université Paris Diderot, Paris, France
- Patrick Deschavanne
- Molécules Thérapeutiques in silico, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 973, Université Paris Diderot, Paris, France
17
Mallet LV, Becq J, Deschavanne P. Whole genome evaluation of horizontal transfers in the pathogenic fungus Aspergillus fumigatus. BMC Genomics 2010; 11:171. [PMID: 20226043 PMCID: PMC2848249 DOI: 10.1186/1471-2164-11-171] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 01/12/2010] [Accepted: 03/12/2010] [Indexed: 12/14/2022]
Abstract
Background Numerous cases of horizontal transfer (HT) have been described in eukaryote genomes, but, in contrast to prokaryote genomes, no whole-genome evaluation of HTs has been carried out. This is mainly due to a lack of parametric methods specifically designed to take the intrinsic heterogeneity of eukaryote genomes into account. We applied a simple, tested method based on local variations of genomic signatures to analyze the genome of the pathogenic fungus Aspergillus fumigatus. Results We detected 189 atypical regions containing 214 genes, accounting for about 1 Mb of DNA sequence. However, the fraction of atypical DNA detected was smaller than the average amount detected under the same conditions in prokaryote genomes (3.1% vs 5.6%). About one third of these regions contained no annotated genes, a proportion far greater than in prokaryote genomes. When the origin of these HTs was analyzed by comparing their signatures to a home-made database of species signatures, three groups of donor species emerged: bacteria (40%), fungi (25%), and viruses (22%). Notably, although inter-domain exchanges were confirmed, we found very few exchanges between eukaryotic kingdoms. Conclusions We demonstrated that HTs are not negligible in eukaryote genomes, though they occur to a lesser extent than in prokaryote genomes; under our stringent conditions, the detected amount should be regarded as a lower bound. The biological mechanisms underlying these transfers, as well as the biological functions of the transferred genes, remain to be elucidated.
Affiliation(s)
- Ludovic V Mallet
- Molécules thérapeutiques in silico (MTI), INSERM UMR-M 973, Université Paris Diderot-Paris 7, Bât Lamarck, 35 rue Hélène Brion, 75205 Paris Cedex 13, France
18
Arvey AJ, Azad RK, Raval A, Lawrence JG. Detection of genomic islands via segmental genome heterogeneity. Nucleic Acids Res 2009; 37:5255-66. [PMID: 19589805 PMCID: PMC2760805 DOI: 10.1093/nar/gkp576] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 12/03/2022]
Abstract
While the recognition of genomic islands can be a powerful mechanism for identifying genes that distinguish related bacteria, few methods have been developed to identify them specifically. Rather, identification of islands often begins with cataloging individual genes likely to have been recently introduced into the genome; regions with many putative alien genes are then examined for other features suggestive of recent acquisition of a large genomic region. When few phylogenetic relatives are available, the identification of alien genes relies on their atypical features relative to the bulk of the genes in the genome. The weakness of these ‘bottom–up’ approaches lies in the difficulty in identifying robustly those genes which are atypical, or phylogenetically restricted, due to recent foreign ancestry. Herein, we apply an alternative ‘top–down’ approach where bacterial genomes are recursively divided into progressively smaller regions, each with uniform composition. In this way, large chromosomal regions with atypical features are identified with high confidence due to the simultaneous analysis of multiple genes. This approach is based on a generalized divergence measure to quantify the compositional difference between segments in a hypothesis-testing framework. We tested the proposed genome island prediction algorithm on both artificial chimeric genomes and genuine bacterial genomes.
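The top-down idea of recursively dividing a sequence into compositionally uniform segments can be illustrated with a Jensen-Shannon divergence split over single-nucleotide composition. This is a simplified sketch, not the authors' algorithm: their generalized divergence is used within a hypothesis-testing framework, whereas here a fixed, hypothetical `threshold` on the divergence (in bits) stops the recursion, and `min_len` is likewise an assumed parameter.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def entropy_bits(counts: Counter, n: int) -> float:
    """Shannon entropy (bits) of a symbol-count table summing to n."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def best_split(seq: str, min_len: int = 100) -> Tuple[Optional[int], float]:
    """Split point maximising the Jensen-Shannon divergence between the two sides.

    JS = H(whole) - (n_L/n) H(left) - (n_R/n) H(right), on nucleotide counts.
    """
    n = len(seq)
    total = Counter(seq)
    h_total = entropy_bits(total, n)
    left: Counter = Counter()
    best_pos, best_js = None, -1.0
    for i, base in enumerate(seq[:-1], start=1):
        left[base] += 1  # left now holds the counts of seq[:i]
        if i < min_len or n - i < min_len:
            continue
        right = total - left  # counts of seq[i:]
        js = (h_total - (i / n) * entropy_bits(left, i)
              - ((n - i) / n) * entropy_bits(right, n - i))
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js


def segment(seq: str, threshold: float = 0.1, min_len: int = 100,
            offset: int = 0) -> List[Tuple[int, int]]:
    """Recursively split seq while the best split's JS divergence reaches threshold."""
    pos, js = best_split(seq, min_len)
    if pos is None or js < threshold:
        return [(offset, offset + len(seq))]
    return (segment(seq[:pos], threshold, min_len, offset)
            + segment(seq[pos:], threshold, min_len, offset + pos))
```

For example, `segment("AT" * 500 + "GC" * 250 + "AT" * 500)` recovers the three blocks `[(0, 1000), (1000, 1500), (1500, 2500)]`. A hypothesis-testing version would replace the fixed threshold with a significance test on the divergence.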
Affiliation(s)
- Aaron J Arvey
- Department of Computer Science, University of California San Diego, La Jolla, CA 92093, USA
19
Abstract
This chapter discusses the pros and cons of existing computational methods for detecting horizontal (or lateral) gene transfer (HGT) and highlights the genome-wide studies that use these methods. The impact of HGT on prokaryote genome evolution is also discussed.
20
Ganesan H, Rakitianskaia AS, Davenport CF, Tümmler B, Reva ON. The SeqWord Genome Browser: an online tool for the identification and visualization of atypical regions of bacterial genomes through oligonucleotide usage. BMC Bioinformatics 2008; 9:333. [PMID: 18687122 PMCID: PMC2528017 DOI: 10.1186/1471-2105-9-333] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 03/29/2008] [Accepted: 08/07/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Data mining in large DNA sequences is a major challenge in microbial genomics and bioinformatics. Oligonucleotide usage (OU) patterns provide a wealth of information for large-scale sequence analysis and visualization. The purpose of this research was to make OU statistical analysis available as a novel web-based tool for functional genomics and annotation; the tool is also available as a downloadable package. RESULTS The SeqWord Genome Browser (SWGB) was developed to visualize the natural compositional variation of DNA sequences. By comparing local and global OU patterns, the applet is also used to identify divergent genomic regions, both in annotated sequences of bacterial chromosomes, plasmids, phages and viruses and in raw DNA sequences prior to annotation. The applet allows fast and reliable identification of clusters of horizontally transferred genomic islands, large multi-domain genes and ribosomal RNA genes. Within the genomic core sequence (the majority of genomic fragments), regions enriched in housekeeping and ribosomal protein genes can be contrasted with regions rich in pseudogenes or genetic vestiges. CONCLUSION The SWGB applet presents a range of comprehensive OU statistical parameters calculated for a range of bacterial species, plasmids and phages. It is available on the Internet at http://www.bi.up.ac.za/SeqWord/mhhapplet.php.
Affiliation(s)
- Hamilton Ganesan
- Dep of Biochemistry, Bioinformatics and Computational Biology Unit, University of Pretoria, Lynnwood road, Hillcrest, Pretoria, 0002, South Africa.
21
Tamames J, Moya A. Estimating the extent of horizontal gene transfer in metagenomic sequences. BMC Genomics 2008; 9:136. [PMID: 18366724 PMCID: PMC2324111 DOI: 10.1186/1471-2164-9-136] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 10/23/2007] [Accepted: 03/24/2008] [Indexed: 02/07/2023]
Abstract
BACKGROUND Although the extent of horizontal gene transfer (HGT) in complete genomes has been widely studied, its influence on the evolution of natural communities of prokaryotes remains unknown. The availability of metagenomic sequences allows us to study global patterns of prokaryotic evolution in samples from natural communities. However, the methods commonly used to study HGT are not suitable for metagenomic samples, so it is important to develop new methods or to adapt existing ones for use with metagenomic sequences. RESULTS We have created two different methods that are suitable for studying HGT in metagenomic samples. The methods are based on phylogenetic and DNA compositional approaches, and have allowed us to assess the extent of possible HGT events in metagenomes for the first time. The methods are shown to be compatible and quite precise, although they probably underestimate the number of possible events. Our results show that the phylogenetic method detects HGT in 0.8% to 1.5% of the sequences, while DNA compositional methods identify putative HGT in 2% to 8% of the sequences. These ranges are very similar to those found in complete genomes by related approaches. The two methods differ in sensitivity, probably because they target HGT events of different ages: the compositional method mostly identifies recent transfers, while the phylogenetic method is more suitable for detecting older events. Nevertheless, the number of HGT events in metagenomic sequences from different communities shows a consistent trend for both methods: the lowest amount is found in the Sargasso Sea metagenome, and the highest in the whale fall metagenome from the ocean floor. The significance of these observations is discussed.
CONCLUSION The computational approaches used to find possible HGT events in complete genomes can be adapted to work with metagenomic samples, where they perform well across different samples. The percentage of possible HGT events observed is close to that found for complete genomes, and different microbiomes show different proportions of putative HGT events. This is probably related both to environmental factors and to the species composition of each particular community.
Affiliation(s)
- Javier Tamames
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva. Universidad de Valencia. Polígono La Coma s/n, 46980 Paterna (Valencia), Spain
- CIBER en Epidemiología y Salud Pública (CIBER-ESP), Spain
- Andrés Moya
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva. Universidad de Valencia. Polígono La Coma s/n, 46980 Paterna (Valencia), Spain
- CIBER en Epidemiología y Salud Pública (CIBER-ESP), Spain
22
Guy L. Identification and characterization of pathogenicity and other genomic islands using base composition analyses. Future Microbiol 2007; 1:309-16. [PMID: 17661643 DOI: 10.2217/17460913.1.3.309] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Pathogenicity islands (PAIs) are major factors contributing to the pathogenicity of bacteria and to their resistance to antibiotics. In general, genomic islands (GIs), of which PAIs are a subset, increase the fitness of their hosts by providing new functions. With the number of available whole-genome sequences growing exponentially, in silico methods have been developed to detect putative PAIs and GIs within them. Compositional methods rely on differences in G+C content, codon usage and oligonucleotide biases. Other methods detect the presence of functional elements such as tRNA and mobility genes. The future availability of fast, high-throughput, inexpensive genome sequencing emphasizes the need for user-friendly applications able to detect, characterize and analyze putative GIs and PAIs. Such applications may uncover new aspects of pathogenicity and provide a better understanding of the evolution of pathogenic bacteria. These methods will be in high demand once physicians use whole-genome sequencing for personal diagnosis.
Affiliation(s)
- Lionel Guy
- Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, Switzerland.
23
Azad RK, Lawrence JG. Detecting laterally transferred genes: use of entropic clustering methods and genome position. Nucleic Acids Res 2007; 35:4629-39. [PMID: 17591616 PMCID: PMC1950545 DOI: 10.1093/nar/gkm204] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 11/20/2022]
Abstract
Most parametric methods for detecting foreign genes in bacterial genomes use a scoring function that measures the atypicality of a gene with respect to the bulk of the genome. Genes whose features are sufficiently atypical—lying beyond a threshold value—are deemed foreign. Yet these methods fail when the range of features of donor genomes overlaps with that of the recipient genome, leading to misclassification of foreign and native genes; existing parametric methods choose threshold parameters to balance these error rates. To circumvent this problem, we have developed a two-pronged approach to minimize the misclassification of genes. First, beyond classifying genes as merely atypical, a gene clustering method based on Jensen–Shannon entropic divergence identifies classes of foreign genes that are also similar to each other. Second, genome position is used to reassign genes among classes whose composition features overlap. This process minimizes the misclassification of either native or foreign genes that are weakly atypical. The performance of this approach was assessed using artificial chimeric genomes and then applied to the well-characterized Escherichia coli K12 genome. Not only were foreign genes identified with a high degree of accuracy, but genes originating from the same donor organism were effectively grouped.
24
The power of phylogenetic approaches to detect horizontally transferred genes. BMC Evol Biol 2007; 7:45. [PMID: 17376230 PMCID: PMC1847511 DOI: 10.1186/1471-2148-7-45] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 10/02/2006] [Accepted: 03/21/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Horizontal gene transfer plays an important role in evolution because it sometimes allows recipient lineages to adapt to new ecological niches. High gene transfer frequencies have been inferred for prokaryotic and early eukaryotic evolution. Does horizontal gene transfer also affect phylogenetic reconstruction of the evolutionary history of genomes and organisms? The answer depends at least in part on the actual gene transfer frequencies and on the ability to weed out transferred genes from further analyses. Are the detected transfers mainly false positives, or are they the tip of an iceberg of many transfer events, most of which go undetected by current methods? RESULTS Phylogenetic detection methods appear to be the methods of choice for inferring gene transfers, especially ancient transfers and those followed by orthologous replacement. Here we explore how well some of these methods perform using in silico transfers between the terminal branches of a genome-based gammaproteobacterial phylogeny. In the experiments performed here, the AU test at a 5% significance level detects on average 90.3% of the transfers and 91% of the exchanges as significant. Using the Robinson-Foulds distance, only 57.7% of the exchanges and 60% of the donations were identified as significant. Analyses using bipartition spectra appeared most successful in our test case. The power of detection was on average 97% using a 70% cut-off and 94.2% using a 90% cut-off for identifying conflicting bipartitions, while the rate of false positives was below 4.2% and 2.1% for the two cut-offs, respectively. For all methods, detection rates improved when more intervening branches separated donor and recipient. CONCLUSION Rates of detected transfers should not be mistaken for actual transfer rates; most analyses of gene transfers remain anecdotal.
The method and significance level to identify potential gene transfer events represent a trade-off between the frequency of erroneous identification (false positives) and the power to detect actual transfer events.
25
McMurdie PJ, Behrens SF, Holmes S, Spormann AM. Unusual codon bias in vinyl chloride reductase genes of Dehalococcoides species. Appl Environ Microbiol 2007; 73:2744-7. [PMID: 17308190 PMCID: PMC1855607 DOI: 10.1128/aem.02768-06] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]
Abstract
Vinyl chloride reductases (VC-RDase) are the key enzymes for complete microbial reductive dehalogenation of chloroethenes, including the groundwater pollutants tetrachloroethene and trichloroethene. Analysis of the codon usage of the VC-RDase genes vcrA and bvcA showed that these genes are highly unusual and are characterized by a low G+C fraction at the third position. The third position of codons in VC-RDase genes is biased toward the nucleotide T, even though available Dehalococcoides genome sequences indicate the absence of any tRNAs matching codons that end in T. The comparatively high level of abnormality in the codon usage of VC-RDase genes suggests an evolutionary history that is different from that of most other Dehalococcoides genes.
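The third-codon-position statistics mentioned here can be computed directly from a coding sequence. This is a minimal sketch, assuming an in-frame CDS string; the function name and the choice to ignore a trailing partial codon are illustrative. It returns the G+C fraction and the T fraction at third codon positions.

```python
from typing import Tuple


def third_position_stats(cds: str) -> Tuple[float, float]:
    """Return (GC3, T3): fractions of G+C and of T at third codon positions.

    Assumes cds is an in-frame coding sequence; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    gc3 = sum(base in "GC" for base in thirds) / len(thirds)
    t3 = thirds.count("T") / len(thirds)
    return gc3, t3


if __name__ == "__main__":
    # Codons ATG TTT GCC AAA TAA -> third positions G, T, C, A, A
    print(third_position_stats("ATGTTTGCCAAATAA"))  # (0.4, 0.2)
```

Comparing a gene's GC3 or T3 with the distribution over all genes in the genome (for example as a z-score) is one simple way to flag the kind of atypical codon usage described in this abstract.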
Affiliation(s)
- Paul J McMurdie
- Department of Civil and Environmental Engineering, James H Clark Center East Wing, E250A, Stanford University, Stanford, CA 94305-5429, USA
26
Abstract
MOTIVATION Microbial genomes undergo evolutionary processes such as gene family expansion and contraction, variable rates and patterns of sequence substitution and lateral genetic transfer. Simulation tools are essential for both the generation of data under different evolutionary models and the validation of analytical methods on such data. However, meaningful investigation of phenomena such as lateral genetic transfer requires the simultaneous consideration of many underlying evolutionary processes. RESULTS We have developed EvolSimulator, a software package that combines non-stationary sequence and gene family evolution together with models of lateral genetic transfer, within a customizable birth-death model of speciation and extinction. Here, we examine simulated data sets generated with EvolSimulator using existing statistical techniques from the evolutionary literature, showing in detail each component of the simulation strategy. AVAILABILITY Source code, manual and other information are freely available at www.bioinformatics.org.au/evolsim. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Robert G Beiko
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada.
27
Comas I, Moya A, Azad RK, Lawrence JG, Gonzalez-Candelas F. The evolutionary origin of Xanthomonadales genomes and the nature of the horizontal gene transfer process. Mol Biol Evol 2006; 23:2049-57. [PMID: 16882701 DOI: 10.1093/molbev/msl075] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
Determining the influence of horizontal gene transfer (HGT) on phylogenomic analyses and on the retrieval of a tree of life is relevant to our understanding of microbial genome evolution. It is particularly difficult to distinguish phylogenetic incongruence due to noise from incongruence resulting from HGT. We have performed a large-scale, detailed evolutionary analysis of the different phylogenetic signals present in the genomes of Xanthomonadales, a group of Proteobacteria. We show that the presence of phylogenetic noise is not an obstacle to inferring past and present HGTs during their evolution. The scenario derived from this analysis and from other recently published reports reflects the confounding effects of past and present HGT on bacterial phylogenomics. Although transfers between closely related species are difficult to detect in genome-scale phylogenetic analyses, past transfers to the ancestor of extant groups appear as conflicting signals that might occasionally make it impossible to determine the evolutionary origin of the whole genome.
Affiliation(s)
- Iñaki Comas
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Universidad de Valencia, Valencia, Spain.
28
Waack S, Keller O, Asper R, Brodag T, Damm C, Fricke WF, Surovcik K, Meinicke P, Merkl R. Score-based prediction of genomic islands in prokaryotic genomes using hidden Markov models. BMC Bioinformatics 2006; 7:142. [PMID: 16542435 PMCID: PMC1489950 DOI: 10.1186/1471-2105-7-142] [Citation(s) in RCA: 265] [Impact Index Per Article: 14.7] [Received: 12/09/2005] [Accepted: 03/16/2006] [Indexed: 01/25/2023]
Abstract
Background Horizontal gene transfer (HGT) is considered a strong evolutionary force that substantially shapes the content of microbial genomes. It is the speed of HGT, enabling rapid adaptation to changing environmental demands, that distinguishes it from gene genesis, duplication or mutation. For a precise characterization, algorithms are needed that identify transfer events with high reliability. Frequently, the transferred pieces of DNA are of considerable length, comprise several genes, and are called genomic islands (GIs) or, more specifically, pathogenicity or symbiotic islands. Results We have implemented the program SIGI-HMM, which predicts GIs and the putative donor of each individual alien gene. It is based on the analysis of the codon usage (CU) of each individual gene of the genome under study. The CU of each gene is compared against a carefully selected set of CU tables representing microbial donors or highly expressed genes. Multiple tests are used to identify putatively alien genes, to predict putative donors and to mask putatively highly expressed genes. In this way, we determine the states and emission probabilities of an inhomogeneous hidden Markov model working at the gene level. For the transition probabilities, we draw upon classical test theory with the intention of integrating a sensitivity controller in a consistent manner. SIGI-HMM was written in Java and is publicly available. It accepts as input any file created according to the EMBL format and generates output in the common GFF format, readable by genome browsers. Benchmark tests showed that the output of SIGI-HMM is in agreement with known findings: its predictions were consistent both with annotated GIs and with predictions generated by different methods. Conclusion SIGI-HMM is a sensitive tool for the identification of GIs in microbial genomes. It allows users to analyze genomes interactively and in detail, and to generate or test hypotheses about the origin of acquired genes.
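As a simplified picture of gene-level HMM decoding, the sketch below runs the Viterbi algorithm on a two-state (native/alien) model whose per-gene emission log-likelihoods are supplied by the caller. It is not SIGI-HMM: SIGI-HMM builds an inhomogeneous model from codon usage tables and multiple tests, whereas here the emissions, the switching probability `p_switch`, and the initial alien probability `p_alien_start` are hypothetical inputs.

```python
import math
from typing import List


def viterbi_native_alien(loglik_native: List[float], loglik_alien: List[float],
                         p_switch: float = 0.05,
                         p_alien_start: float = 0.1) -> List[str]:
    """Most probable native/alien labelling of genes listed in genome order.

    loglik_*: per-gene emission log-likelihoods under each state (caller-supplied).
    p_switch: probability of changing state between adjacent genes (hypothetical).
    """
    n = len(loglik_native)
    if n != len(loglik_alien):
        raise ValueError("emission lists must have equal length")
    if n == 0:
        return []
    states = ("native", "alien")
    emit = (loglik_native, loglik_alien)
    log_stay, log_switch = math.log(1 - p_switch), math.log(p_switch)
    # score[s]: best log-probability of any path ending in state s at the current gene
    score = [math.log(1 - p_alien_start) + emit[0][0],
             math.log(p_alien_start) + emit[1][0]]
    backptrs = []  # backptrs[t - 1][s] = best previous state for state s at gene t
    for t in range(1, n):
        new_score, ptr = [], []
        for s in (0, 1):
            stay = score[s] + log_stay
            switch = score[1 - s] + log_switch
            ptr.append(s if stay >= switch else 1 - s)
            new_score.append(max(stay, switch) + emit[s][t])
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]
```

Because each state change costs `log(p_switch)`, a run of several strongly atypical genes is decoded as an island, while a single weakly atypical gene inside native context is absorbed into the native state; lowering `p_switch` makes the decoding more conservative.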
Affiliation(s)
- Stephan Waack
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Oliver Keller
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Roman Asper
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Thomas Brodag
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Carsten Damm
- Institut für Numerische und Angewandte Mathematik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Wolfgang Florian Fricke
- Göttingen Genomics Laboratory, Universität Göttingen, Grisebachstr. 8, 37077 Göttingen, Germany
- Katharina Surovcik
- Institut für Informatik, Universität Göttingen, Lotzestr. 16–18, 37083 Göttingen, Germany
- Peter Meinicke
- Institut für Mikrobiologie und Genetik, Universität Göttingen, Goldschmidtstr. 1, 37077 Göttingen, Germany
- Rainer Merkl
- Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, Universitätsstr. 31, 93053 Regensburg, Germany