1
Guo X, Guo Y, Chen H, Liu X, He P, Li W, Zhang MQ, Dai Q. Systematic comparison of genome information processing and boundary recognition tools used for genomic island detection. Comput Biol Med 2023; 166:107550. [PMID: 37826950 DOI: 10.1016/j.compbiomed.2023.107550] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2023] [Revised: 09/12/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023]
Abstract
Genomic islands are fragments of foreign DNA that are found in bacterial and archaeal genomes, and are typically associated with symbiosis or pathogenesis. While numerous genomic island detection methods have been proposed, there has been limited evaluation of the efficiency of the genome information processing and boundary recognition tools. In this study, we conducted a review of the statistical methods involved in genomic signatures, host signature extraction, informative signature selection, divergence measures, and boundary detection steps in genomic island prediction. We compared the performances of these methods on simulated experiments using alien fragments obtained from both artificial and real genomes. Our results indicate that among the nine genomic signatures evaluated, genomic signature frequency and full probability performed the best. However, their performance declined when normalized to their expectations and variances, such as Z-score and composition vector. Based on our experiments of the E. coli genome, we found that the confidence intervals of the window variances achieved the best performance in the signature extraction of the host, with the best confidence interval being 1.5-2 times the standard error. Ordered kurtosis was most effective in selecting informative signatures from a single genome, without requiring prior knowledge from other datasets. Among the three divergence measures evaluated, the two-sample t-test was the most successful, and a non-overlapping window with a small eye window (size 2) was best suited for identifying compositionally distinct regions. Finally, the maximum of the Markovian Jensen-Shannon divergence score, in terms of GC-content bias, was found to make boundary detection faster while maintaining a similar error rate.
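The window-based divergence scoring described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: it uses the plain Jensen-Shannon divergence between dinucleotide frequencies rather than the Markovian variant named above, and the function names, window size and toy sequence are assumptions chosen only for illustration.

```python
from collections import Counter
from math import log2


def kmer_freqs(seq, k=2):
    """Relative frequencies of overlapping k-mers in seq (one simple genomic signature)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def jensen_shannon(p, q):
    """Plain Jensen-Shannon divergence (base 2) between two frequency dicts."""
    div = 0.0
    for key in set(p) | set(q):
        pi, qi = p.get(key, 0.0), q.get(key, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


def window_scores(genome, window=1000, k=2):
    """Score each non-overlapping window against the whole-genome (host) signature."""
    host = kmer_freqs(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, window):
        piece = genome[start:start + window]
        scores.append((start, jensen_shannon(kmer_freqs(piece, k), host)))
    return scores


# Toy genome (assumption): an AT-rich host with a GC-rich insert at position 2000.
toy = "AT" * 1000 + "GC" * 500 + "AT" * 1000
for start, score in window_scores(toy):
    print(start, round(score, 3))
```

In this toy case the window covering the insert receives the highest score; real genomes would need the signature selection and boundary refinement steps that the paper evaluates.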
Affiliation(s)
- Xiangting Guo
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Yichu Guo
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Hu Chen
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Xiaoqing Liu
  - College of Sciences, Hangzhou Dianzi University, Hangzhou, 310018, China
- Pingan He
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Wenshu Li
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Michael Q Zhang
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA
  - Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China
- Qi Dai
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA
2
Genome analysis uncovers the prolific antagonistic and plant growth-promoting potential of endophyte Bacillus velezensis K1. Gene 2022; 836:146671. [PMID: 35714801 DOI: 10.1016/j.gene.2022.146671] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 12/08/2021] [Revised: 05/23/2022] [Accepted: 06/10/2022] [Indexed: 11/24/2022]
Abstract
Insights into the application of endophytic bacilli in sustainable agricultural practices have opened up new avenues for the inhibition of soil-borne pathogens and the improvement of plant health. Bacillus subtilis K1, an endophytic bacterium originally isolated from aerial roots of Ficus benghalensis, is a potential biocontrol agent secreting a mixture of surfactins, iturins and fengycins. The current study extends the characterization of this bacterium through genomic and comparative genomics approaches. Sequencing of the bacterial genome on the Illumina MiSeq platform revealed a 4,103,502-bp circular chromosome with 45.98% GC content and 4325 predicted protein-coding sequences. Based on phylogenomics and whole-genome average nucleotide identity, B. subtilis K1 was taxonomically reclassified as Bacillus velezensis. The formerly evaluated phenotypic traits, viz. C-source utilization and lipopeptide-mediated fungal antagonism, were correlated with their molecular determinants. The genome also harbored several genes associated with induced systemic resistance and plant growth promotion, i.e., phytohormone production, nitrogen assimilation and reduction, siderophore production, phosphate solubilization, biofilm formation, swarming motility, and acetoin and butanediol synthesis. The production of antifungal volatile organic compounds and plant growth promotion were experimentally demonstrated by a volatile compound assay and a seed germination assay on cumin and groundnut. The isolate also holds great prospects for application as a soil inoculant, as indicated by enhanced growth of groundnut in in planta pot studies. Bacterial pan-genome analysis based on a comparison of whole genomes with eighteen other Bacillus strains was also conducted. Comparative examination of biosynthetic gene clusters across all genomes indicated that the largest number of gene clusters was harbored by the K1 genome. Based on the findings, we propose K1 as a model for scrutinizing non-ribosomally synthesized peptide synthetase and polyketide synthetase derived molecules.
3
Chakraborty J, Roy RP, Chatterjee R, Chaudhuri P. Performance assessment of genomic island prediction tools with an improved version of Design-Island. Comput Biol Chem 2022; 98:107698. [PMID: 35597186 DOI: 10.1016/j.compbiolchem.2022.107698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/09/2021] [Revised: 04/01/2022] [Accepted: 05/11/2022] [Indexed: 11/03/2022]
Abstract
Genomic Islands (GIs) play an important role in the evolution and adaptation of prokaryotes. The origin and extent of ecological diversity of prokaryotes can be analyzed by comparing GIs across closely or distantly related prokaryotes. Given the importance of GIs for studying bacterial evolution, several GI prediction tools have been developed. An unsupervised method, Design-Island, was developed to identify GIs using a Monte-Carlo statistical test on randomly selected segments of a chromosome. In the present study, Design-Island was modified by incorporating majority voting and multiple hypothesis testing correction. The performance of the modified version, Design-Island-II, was tested and compared with existing GI prediction tools. Performance assessment and benchmarking of GI prediction tools require an experimentally validated dataset, which is lacking. Therefore, different datasets, generated or taken from the literature, were used to compare the sensitivity (SN), specificity (SP), precision (PPV) and accuracy (AC) of Design-Island-II. It showed substantial enhancement in terms of SN, SP, PPV and AC, and significantly reduced the computation time of the algorithm. The performance of Design-Island-II was also compared with several GI prediction tools using a curated dataset of putative horizontally transferred genes. Design-Island-II showed the highest sensitivity and F1 score, with comparable specificity, precision and accuracy relative to the other available methods. IslandViewer4 and Islander outperformed all the available methods in terms of AC and PPV, respectively. Our study suggests that Design-Island-II, IslandViewer4 and GIHunter are among the top-performing GI prediction tools when both sensitivity and specificity are considered.
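For readers unfamiliar with the measures named in this abstract, the following Python sketch shows how SN, SP, PPV, AC and F1 are conventionally derived from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the paper; the sketch also assumes every denominator is non-zero.

```python
def gi_metrics(tp, fp, tn, fn):
    """Standard classification measures from true/false positive/negative counts."""
    sn = tp / (tp + fn)                    # sensitivity (recall)
    sp = tn / (tn + fp)                    # specificity
    ppv = tp / (tp + fp)                   # precision (positive predictive value)
    ac = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    f1 = 2 * ppv * sn / (ppv + sn)         # harmonic mean of precision and recall
    return {"SN": sn, "SP": sp, "PPV": ppv, "AC": ac, "F1": f1}


# Hypothetical counts for one tool on one benchmark (illustration only).
print(gi_metrics(tp=40, fp=10, tn=930, fn=20))
```

Because GIs usually cover a small fraction of a genome, accuracy and specificity tend to be dominated by the true negatives, which is one reason benchmarks such as this one also report sensitivity and F1.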
Affiliation(s)
- Joyeeta Chakraborty
  - Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India
- Rudra Prasad Roy
  - Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India
- Raghunath Chatterjee
  - Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India
- Probal Chaudhuri
  - Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India
4
Genomic Island Prediction via Chi-Square Test and Random Forest Algorithm. Comput Math Methods Med 2021; 2021:9969751. [PMID: 34122622 PMCID: PMC8169257 DOI: 10.1155/2021/9969751] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 03/29/2021] [Accepted: 05/14/2021] [Indexed: 12/02/2022]
Abstract
Genomic islands are related to microbial adaptation and carry genomic characteristics different from those of the host. Therefore, many methods have been proposed to distinguish genomic islands from the rest of the genome by evaluating sequence composition. Many sequence features have been proposed, but many of them have not been applied to the identification of genomic islands. In this paper, we present a scheme to predict genomic islands using the chi-square test and the random forest algorithm. We extract seven kinds of sequence features and select the important features with the chi-square test. All the selected features are then input into the random forest to predict the genomic islands. Three experiments and comparisons show that the proposed method achieves the best performance. This understanding can be useful for designing more powerful methods for genomic island prediction.
5
Maguire F, Jia B, Gray KL, Lau WYV, Beiko RG, Brinkman FSL. Metagenome-assembled genome binning methods with short reads disproportionately fail for plasmids and genomic Islands. Microb Genom 2020; 6:mgen000436. [PMID: 33001022 PMCID: PMC7660262 DOI: 10.1099/mgen.0.000436] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 05/23/2020] [Accepted: 09/04/2020] [Indexed: 12/12/2022]
Abstract
Metagenomic methods enable the simultaneous characterization of microbial communities without time-consuming and bias-inducing culturing. Metagenome-assembled genome (MAG) binning methods aim to reassemble individual genomes from this data. However, the recovery of mobile genetic elements (MGEs), such as plasmids and genomic islands (GIs), by binning has not been well characterized. Given the association of antimicrobial resistance (AMR) genes and virulence factor (VF) genes with MGEs, studying their transmission is a public-health priority. The variable copy number and sequence composition of MGEs makes them potentially problematic for MAG binning methods. To systematically investigate this issue, we simulated a low-complexity metagenome comprising 30 GI-rich and plasmid-containing bacterial genomes. MAGs were then recovered using 12 current prediction pipelines and evaluated. While 82-94 % of chromosomes could be correctly recovered and binned, only 38-44 % of GIs and 1-29 % of plasmid sequences were found. Strikingly, no plasmid-borne VF nor AMR genes were recovered, and only 0-45 % of AMR or VF genes within GIs. We conclude that short-read MAG approaches, without further optimization, are largely ineffective for the analysis of mobile genes, including those of public-health importance, such as AMR and VF genes. We propose that researchers should explore developing methods that optimize for this issue and consider also using unassembled short reads and/or long-read approaches to more fully characterize metagenomic data.
Affiliation(s)
- Finlay Maguire
  - Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, Nova Scotia, B3H 4R2, Canada
- Baofeng Jia
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
- Kristen L. Gray
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
- Wing Yin Venus Lau
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
- Robert G. Beiko
  - Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, Nova Scotia, B3H 4R2, Canada
- Fiona S. L. Brinkman
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC V5A 1S6, Canada
6
2SigFinder: the combined use of small-scale and large-scale statistical testing for genomic island detection from a single genome. BMC Bioinformatics 2020; 21:159. [PMID: 32349677 PMCID: PMC7191778 DOI: 10.1186/s12859-020-3501-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 01/11/2020] [Accepted: 04/16/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Genomic islands are associated with microbial adaptations and carry genomic signatures different from those of the host. Some methods perform an overall test to identify genomic islands based on their local features. However, regions of different scales will display different genomic features. RESULTS: We propose a novel method, "2SigFinder", the first combined use of small-scale and large-scale statistical testing for genomic island detection. The proposed method was tested by genomic island boundary detection and identification of genomic islands or functional features of real biological data. We also compared the proposed method with comparative genomics and composition-based approaches. The results indicate that 2SigFinder is more efficient in identifying genomic islands. CONCLUSIONS: From real biological data, 2SigFinder identified genomic islands from a single genome and reported robust results across different experiments, without annotated information of genomes or prior knowledge from other datasets. 2SigFinder identified 25 Pathogenicity, 1 tRNA, 2 Virulence and 2 Repeats out of 27 Pathogenicity, 1 tRNA, 2 Virulence and 2 Repeats, and detected 101 Phage and 28 HEGs out of 130 Phage and 36 HEGs in S. enterica Typhi CT18, which shows that it is more efficient in detecting functional features associated with GIs.
7
Cavassim MIA, Moeskjær S, Moslemi C, Fields B, Bachmann A, Vilhjálmsson BJ, Schierup MH, Young JPW, Andersen SU. Symbiosis genes show a unique pattern of introgression and selection within a Rhizobium leguminosarum species complex. Microb Genom 2020; 6:e000351. [PMID: 32176601 PMCID: PMC7276703 DOI: 10.1099/mgen.0.000351] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/29/2019] [Accepted: 02/17/2020] [Indexed: 12/22/2022]
Abstract
Rhizobia supply legumes with fixed nitrogen using a set of symbiosis genes. These can cross rhizobium species boundaries, but it is unclear how many other genes show similar mobility. Here, we investigate inter-species introgression using de novo assembly of 196 Rhizobium leguminosarum sv. trifolii genomes. The 196 strains constituted a five-species complex, and we calculated introgression scores based on gene-tree traversal to identify 171 genes that frequently cross species boundaries. Rather than relying on the gene order of a single reference strain, we clustered the introgressing genes into four blocks based on population structure-corrected linkage disequilibrium patterns. The two largest blocks comprised 125 genes and included the symbiosis genes, a smaller block contained 43 mainly chromosomal genes, and the last block consisted of three genes with variable genomic location. All introgression events were likely mediated by conjugation, but only the genes in the symbiosis linkage blocks displayed overrepresentation of distinct, high-frequency haplotypes. The three genes in the last block were core genes essential for symbiosis that had, in some cases, been mobilized on symbiosis plasmids. Inter-species introgression is thus not limited to symbiosis genes and plasmids, but other cases are infrequent and show distinct selection signatures.
Affiliation(s)
- Maria Izabel A. Cavassim
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Sara Moeskjær
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Camous Moslemi
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
- Asger Bachmann
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Stig U. Andersen
  - Department of Molecular Biology and Genetics, Aarhus University, Aarhus, Denmark
8
Yasuhara-Bell J, Arif M, Busot GY, Mann R, Rodoni B, Stack JP. Comparative Genomic Analysis Confirms Five Genetic Populations of the Select Agent, Rathayibacter toxicus. Microorganisms 2020; 8:E366. [PMID: 32150860 PMCID: PMC7143919 DOI: 10.3390/microorganisms8030366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/14/2020] [Revised: 02/24/2020] [Accepted: 03/03/2020] [Indexed: 02/01/2023]
Abstract
Rathayibacter toxicus is a Gram-positive, nematode-vectored bacterium that infects several grass species in the family Poaceae. Unique in its genus, R. toxicus has the smallest genome, possesses a complete CRISPR-Cas system and a vancomycin-resistance cassette, produces tunicamycin, a corynetoxin responsible for livestock deaths in Australia, and is designated a Select Agent in the United States. In-depth, genome-wide analyses performed in this study support the previously designated five genetic populations, with a core genome comprising approximately 80% of the genome for all populations. Results varied as a function of the type of analysis and when using different bioinformatics tools for the same analysis; e.g., some programs failed to identify specific genomic regions that were actually present. The software variance highlights the need to verify bioinformatics results by additional methods (e.g., PCR, mapping genes to genomes, use of multiple algorithms). These analyses suggest the following relationships among populations: RT-IV ↔ RT-I ↔ RT-II ↔ RT-III ↔ RT-V, with RT-IV and RT-V being the most unrelated. This is the most comprehensive analysis of R. toxicus that included populations RT-I and RT-V. Future studies require underrepresented populations and more recent isolates from varied hosts and geographic locations.
Affiliation(s)
- Jarred Yasuhara-Bell
  - Department of Plant Pathology, Kansas State University, 1712 Claflin Road, 4024 Throckmorton Plant Science Center, Manhattan, KS 66506, USA
  - Plant Biosecurity Cooperative Research Centre, CRC for National Plant Biosecurity, Level 2, Building 22, Innovation Centre, University Drive, University of Canberra, Bruce, Australian Capital Territory, Canberra 2617, Australia
- Mohammad Arif
  - Plant Biosecurity Cooperative Research Centre, CRC for National Plant Biosecurity, Level 2, Building 22, Innovation Centre, University Drive, University of Canberra, Bruce, Australian Capital Territory, Canberra 2617, Australia
  - Department of Plant and Environmental Protection Sciences, University of Hawai`i at Mānoa, Honolulu, HI 96822, USA
- Grethel Y. Busot
  - Department of Plant Pathology, Kansas State University, 1712 Claflin Road, 4024 Throckmorton Plant Science Center, Manhattan, KS 66506, USA
  - Plant Biosecurity Cooperative Research Centre, CRC for National Plant Biosecurity, Level 2, Building 22, Innovation Centre, University Drive, University of Canberra, Bruce, Australian Capital Territory, Canberra 2617, Australia
- Rachel Mann
  - Plant Biosecurity Cooperative Research Centre, CRC for National Plant Biosecurity, Level 2, Building 22, Innovation Centre, University Drive, University of Canberra, Bruce, Australian Capital Territory, Canberra 2617, Australia
  - Department of Jobs, Precincts and Regions, Microbial Sciences, Pests & Diseases, Agriculture Victoria, AgriBio Centre, La Trobe University, 5 Ring Rd, Bundoora, Victoria 3083, Australia
- Brendan Rodoni
  - Plant Biosecurity Cooperative Research Centre, CRC for National Plant Biosecurity, Level 2, Building 22, Innovation Centre, University Drive, University of Canberra, Bruce, Australian Capital Territory, Canberra 2617, Australia
  - Department of Jobs, Precincts and Regions, Microbial Sciences, Pests & Diseases, Agriculture Victoria, AgriBio Centre, La Trobe University, 5 Ring Rd, Bundoora, Victoria 3083, Australia
- James P. Stack
  - Department of Plant Pathology, Kansas State University, 1712 Claflin Road, 4024 Throckmorton Plant Science Center, Manhattan, KS 66506, USA
  - Plant Biosecurity Cooperative Research Centre, CRC for National Plant Biosecurity, Level 2, Building 22, Innovation Centre, University Drive, University of Canberra, Bruce, Australian Capital Territory, Canberra 2617, Australia
9
Zhou Y, Zhang W, Wu H, Huang K, Jin J. A high-resolution genomic composition-based method with the ability to distinguish similar bacterial organisms. BMC Genomics 2019; 20:754. [PMID: 31638897 PMCID: PMC6805505 DOI: 10.1186/s12864-019-6119-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/26/2019] [Accepted: 09/20/2019] [Indexed: 12/03/2022]
Abstract
Background: Genomic composition has been found to be species specific and is used to differentiate bacterial species. To date, almost no published composition-based approaches are able to distinguish between most closely related organisms, including intra-genus species and intra-species strains. Thus, it is necessary to develop a novel approach to address this problem. Results: Here, we initially determine that the “tetranucleotide-derived z-value Pearson correlation coefficient” (TETRA) approach is representative of other published statistical methods. Then, we devise a novel method called “Tetranucleotide-derived Z-value Manhattan Distance” (TZMD) and compare it with the TETRA approach. Our results show that TZMD reflects the maximal genome difference, while TETRA does not in most conditions, demonstrating in theory that TZMD provides improved resolution. Additionally, our analysis of real data shows that TZMD improves species differentiation and clearly differentiates similar organisms, including similar species belonging to the same genospecies, subspecies and intraspecific strains, most of which cannot be distinguished by TETRA. Furthermore, TZMD is able to determine clonal strains with the TZMD = 0 criterion, which intrinsically encompasses identical composition, high average nucleotide identity and high percentage of shared genomes. Conclusions: Our extensive assessment demonstrates that TZMD has high resolution. This study is the first to propose a composition-based method for differentiating bacteria at the strain level and to demonstrate that composition is also strain specific. TZMD is a powerful tool and the first easy-to-use approach for differentiating clonal and non-clonal strains. Therefore, as the first composition-based algorithm for strain typing, TZMD will facilitate bacterial studies in the future.
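The contrast between a correlation-based and a distance-based comparison of z-value vectors can be sketched in a few lines of Python. This is an illustration only: it assumes the tetranucleotide z-values have already been computed, it treats the Manhattan distance as a plain sum of absolute differences (the abstract does not give the exact TZMD definition or any normalization), and the four-element toy vectors stand in for the 256 tetranucleotide z-values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def manhattan(x, y):
    """Sum of absolute element-wise differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Toy z-value vectors (assumption): z2 is z1 with every bias doubled.
z1 = [1.0, -2.0, 0.5, 3.0]
z2 = [2 * v for v in z1]

print(pearson(z1, z2))    # correlation ignores the change in magnitude
print(manhattan(z1, z2))  # distance grows with the change in magnitude
print(manhattan(z1, z1))  # identical vectors give a distance of zero
```

Correlation is unchanged by rescaling a vector, so two genomes whose compositional biases differ only in strength look identical to it, whereas a distance measure still separates them. The zero distance for identical vectors corresponds to the TZMD = 0 criterion mentioned in the abstract.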
Affiliation(s)
- Yizhuang Zhou
  - Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - Peking-Tsinghua Center for Life Science, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, People's Republic of China
- Wenting Zhang
  - Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Huixian Wu
  - China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Kai Huang
  - Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
- Junfei Jin
  - Laboratory of Hepatobiliary and Pancreatic Surgery, The Affiliated Hospital of Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - China-USA Lipids in Health and Disease Research Center, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
  - Guangxi Key Laboratory of Molecular Medicine in Liver Injury and Repair, Guilin Medical University, Guilin, Guangxi, 541001, People's Republic of China
10
Bertelli C, Brinkman FSL. Improved genomic island predictions with IslandPath-DIMOB. Bioinformatics 2019; 34:2161-2167. [PMID: 29905770 PMCID: PMC6022643 DOI: 10.1093/bioinformatics/bty095] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/23/2022]
Abstract
Motivation: Genomic islands (GIs) are clusters of genes of probable horizontal origin that play a major role in bacterial and archaeal genome evolution and microbial adaptability. They are of high medical and industrial interest, due to their enrichment in virulence factors, some antimicrobial resistance genes and adaptive metabolic pathways. The development of more sensitive but precise prediction tools, using either sequence composition-based methods or comparative genomics, is needed as large-scale analyses of microbial genomes increase. Results: IslandPath-DIMOB, a leading GI prediction tool in the IslandViewer webserver, has now been significantly improved by modifying both the decision algorithm to determine sequence composition biases, and the underlying database of HMM profiles for associated mobility genes. The accuracy of IslandPath-DIMOB and other major software has been assessed using a reference GI dataset predicted by comparative genomics, plus a manually curated dataset from literature review. Compared to the previous version (v0.2.0), this IslandPath-DIMOB v1.0.0 achieves 11.7% and 5.3% increase in recall and precision, respectively. IslandPath-DIMOB has the highest Matthews correlation coefficient among individual prediction methods tested, combining one of the highest recall measures (46.9%) at high precision (87.4%). The only method with higher recall had notably lower precision (55.1%). This new IslandPath-DIMOB v1.0.0 will facilitate more accurate studies of GIs, including their key roles in microbial adaptability of medical, environmental and industrial interest. Availability and implementation: IslandPath-DIMOB v1.0.0 is freely available through the IslandViewer webserver (http://www.pathogenomics.sfu.ca/islandviewer/) and as standalone software (https://github.com/brinkmanlab/islandpath/) under the GNU-GPLv3. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claire Bertelli
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
- Fiona S L Brinkman
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, Canada
11
Bertelli C, Laird MR, Williams KP, Lau BY, Hoad G, Winsor GL, Brinkman FSL. IslandViewer 4: expanded prediction of genomic islands for larger-scale datasets. Nucleic Acids Res 2019; 45:W30-W35. [PMID: 28472413 PMCID: PMC5570257 DOI: 10.1093/nar/gkx343] [Citation(s) in RCA: 932] [Impact Index Per Article: 186.4] [Received: 02/24/2017] [Accepted: 04/18/2017] [Indexed: 11/14/2022]
Abstract
IslandViewer (http://www.pathogenomics.sfu.ca/islandviewer/) is a widely-used webserver for the prediction and interactive visualization of genomic islands (GIs, regions of probable horizontal origin) in bacterial and archaeal genomes. GIs disproportionately encode factors that enhance the adaptability and competitiveness of the microbe within a niche, including virulence factors and other medically or environmentally important adaptations. We report here the release of IslandViewer 4, with novel features to accommodate the needs of larger-scale microbial genomics analysis, while expanding GI predictions and improving its flexible visualization interface. A user management web interface as well as an HTTP API for batch analyses are now provided with a secured authentication to facilitate the submission of larger numbers of genomes and the retrieval of results. In addition, IslandViewer's integrated GI predictions from multiple methods have been improved and expanded by integrating the precise Islander method for pre-computed genomes, as well as an updated IslandPath-DIMOB for both pre-computed and user-supplied custom genome analysis. Finally, pre-computed predictions including virulence factors and antimicrobial resistance are now available for 6193 complete bacterial and archaeal strains publicly available in RefSeq. IslandViewer 4 provides key enhancements to facilitate the analysis of GIs and better understand their role in the evolution of successful environmental microbes and pathogens.
Affiliation(s)
- Claire Bertelli
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Matthew R Laird
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kelly P Williams
  - Systems Biology Department, Sandia National Laboratories, Livermore, CA 94551, USA
- Britney Y Lau
  - Systems Biology Department, Sandia National Laboratories, Livermore, CA 94551, USA
- Gemma Hoad
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Geoffrey L Winsor
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Fiona S L Brinkman
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
Collapse
|
12
|
Dai Q, Bao C, Hai Y, Ma S, Zhou T, Wang C, Wang Y, Huo W, Liu X, Yao Y, Xuan Z, Chen M, Zhang MQ. MTGIpick allows robust identification of genomic islands from a single genome. Brief Bioinform 2019; 19:361-373. [PMID: 28025178 DOI: 10.1093/bib/bbw118] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022]
Abstract
Genomic islands (GIs) that are associated with microbial adaptations and carry sequence patterns different from that of the host are sporadically distributed among closely related species. This bias can dominate the signal of interest in GI detection. However, variations still exist among the segments of the host, although no uniform standard exists regarding the best methods of discriminating GIs from the rest of the genome in terms of compositional bias. In the present work, we proposed a robust software, MTGIpick, which used regions with pattern bias showing multiscale difference levels to identify GIs from the host. MTGIpick can identify GIs from a single genome without annotated information of genomes or prior knowledge from other data sets. When real biological data were used, MTGIpick demonstrated better performance than existing methods, as well as revealed potential GIs with accurate sizes missed by existing methods because of a uniform standard. Software and supplementary are freely available at http://bioinfo.zstu.edu.cn/MTGI or https://github.com/bioinfo0706/MTGIpick.
Affiliation(s)
- Qi Dai: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China; Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Chaohui Bao: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yabing Hai: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Sheng Ma: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Tao Zhou: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Cong Wang: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yunfei Wang: Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Wenwen Huo: Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Xiaoqing Liu: College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
- Yuhua Yao: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Zhenyu Xuan: Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Min Chen: Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Michael Q Zhang: Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA; Division of Bioinformatics, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China

13
Tao J, Liu X, Yang S, Bao C, He P, Dai Q. An efficient genomic signature ranking method for genomic island prediction from a single genome. J Theor Biol 2019; 467:142-149. [PMID: 30768974 DOI: 10.1016/j.jtbi.2019.02.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/18/2018] [Revised: 02/07/2019] [Accepted: 02/11/2019] [Indexed: 01/13/2023]
Abstract
Genomic islands are associated with microbial adaptations and carry genomic signatures different from those of the host; thus, many methods have been proposed to select informative genomic signatures from a range of organisms and to discriminate genomic islands from the rest of the genome in terms of these signature biases. However, they are of limited use when closely related genomes are unavailable. In the present work, we proposed a kurtosis-based ranking method to select the informative genomic signatures from a single genome. In simulations with alien fragments from artificial and real genomes, the proposed kurtosis-based ranking method efficiently selected the informative genomic signatures from a single genome, without annotated information of genomes or prior knowledge from other datasets. This understanding can be useful for designing more powerful methods for genomic island detection.
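The abstract does not spell out the ranking procedure, so the following is only a minimal sketch of one way a kurtosis-based ranking could look. It assumes that signatures are k-mer frequencies measured in non-overlapping windows and that a higher excess kurtosis across windows marks a more informative signature; the function names, the window size, and the ranking direction are all illustrative assumptions, not the authors' implementation.

```python
from itertools import product


def window_kmer_freqs(seq, k, window):
    """Relative k-mer frequencies in consecutive non-overlapping windows."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    rows = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        counts = dict.fromkeys(kmers, 0)
        for i in range(len(chunk) - k + 1):
            word = chunk[i:i + k]
            if word in counts:  # skip words with N or other symbols
                counts[word] += 1
        total = sum(counts.values()) or 1
        rows.append({w: c / total for w, c in counts.items()})
    return kmers, rows


def excess_kurtosis(values):
    """Population excess kurtosis; 0.0 when the values do not vary."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


def rank_signatures_by_kurtosis(seq, k=2, window=1000):
    """Rank k-mers by the kurtosis of their per-window frequencies (assumed: high first)."""
    kmers, rows = window_kmer_freqs(seq, k, window)
    scores = {w: excess_kurtosis([r[w] for r in rows]) for w in kmers}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The intuition behind this sketch is that a k-mer whose frequency is stable in most windows but extreme in a few produces a heavy-tailed distribution, which is the pattern a compositionally distinct insert would leave in a single genome.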
Affiliation(s)
- Jin Tao: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Xiaoqing Liu: College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Siqian Yang: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Chaohui Bao: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Pingan He: College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Qi Dai: College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China; Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, TX 75080, USA

14
Bush EC, Clark AE, DeRanek CA, Eng A, Forman J, Heath K, Lee AB, Stoebel DM, Wang Z, Wilber M, Wu H. xenoGI: reconstructing the history of genomic island insertions in clades of closely related bacteria. BMC Bioinformatics 2018; 19:32. [PMID: 29402213 PMCID: PMC5799925 DOI: 10.1186/s12859-018-2038-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/10/2017] [Accepted: 01/23/2018] [Indexed: 12/13/2022]
Abstract
Background Genomic islands play an important role in microbial genome evolution, providing a mechanism for strains to adapt to new ecological conditions. A variety of computational methods, both genome-composition based and comparative, have been developed to identify them. Some of these methods are explicitly designed to work in single strains, while others make use of multiple strains. In general, existing methods do not identify islands in the context of the phylogeny in which they evolved. Even multiple strain approaches are best suited to identifying genomic islands that are present in one strain but absent in others. They do not automatically recognize islands which are shared between some strains in the clade or determine the branch on which these islands inserted within the phylogenetic tree. Results We have developed a software package, xenoGI, that identifies genomic islands and maps their origin within a clade of closely related bacteria, determining which branch they inserted on. It takes as input a set of sequenced genomes and a tree specifying their phylogenetic relationships. Making heavy use of synteny information, the package builds gene families in a species-tree-aware way, and then attempts to combine into islands those families whose members are adjacent and whose most recent common ancestor is shared. The package provides a variety of text-based analysis functions, as well as the ability to export genomic islands into formats suitable for viewing in a genome browser. We demonstrate the capabilities of the package with several examples from enteric bacteria, including an examination of the evolution of the acid fitness island in the genus Escherichia. In addition we use output from simulations and a set of known genomic islands from the literature to show that xenoGI can accurately identify genomic islands and place them on a phylogenetic tree. 
Conclusions xenoGI is an effective tool for studying the history of genomic island insertions in a clade of microbes. It identifies genomic islands, and determines which branch they inserted on within the phylogenetic tree for the clade. Such information is valuable because it helps us understand the adaptive path that has produced living species.
Affiliation(s)
- Eliot C Bush: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Anne E Clark: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA; Current address: Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, 98195-5065, WA, USA
- Carissa A DeRanek: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Alexander Eng: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA; Current address: Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, 98195-5065, WA, USA
- Juliet Forman: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Kevin Heath: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA; Current address: Department of Biology and Biotechnology, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, 01609, MA, USA
- Alexander B Lee: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA; Current address: Quantitative Biosciences Program, Georgia Institute of Technology, 837 State Street, Atlanta, 30332-0430, GA, USA
- Daniel M Stoebel: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Zunyan Wang: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Matthew Wilber: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA
- Helen Wu: Department of Biology, Harvey Mudd College, 301 Platt Blvd., Claremont, 91711, CA, USA

15
Oliveira Alvarenga D, Moreira LM, Chandler M, Varani AM. A Practical Guide for Comparative Genomics of Mobile Genetic Elements in Prokaryotic Genomes. Methods Mol Biol 2018; 1704:213-242. [PMID: 29277867 DOI: 10.1007/978-1-4939-7463-4_7] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
Abstract
Mobile genetic elements (MGEs) are an important feature of prokaryote genomes but are seldom well annotated and, consequently, are often underestimated. MGEs include transposons (Tn), insertion sequences (ISs), prophages, genomic islands (GEIs), integrons, and integrative and conjugative elements (ICEs). They are intimately involved in genome evolution and promote phenomena such as genomic expansion and rearrangement, emergence of virulence and pathogenicity, and symbiosis. In spite of the annotation bottleneck, there are so far at least 75 different programs and databases dedicated to prokaryotic MGE analysis and annotation, and this number is rapidly growing. Here, we present a practical guide to explore, compare, and visualize prokaryote MGEs using a combination of available software and databases tailored to small scale genome analyses. This protocol can be coupled with expert MGE annotation and exploited for evolutionary and comparative genomic analyses.
Affiliation(s)
- Danillo Oliveira Alvarenga: Departamento de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista "Júlio de Mesquita Filho"-UNESP, Jaboticabal, SP, Brazil
- Leandro M Moreira: Departamento de Ciências Biológicas-Núcleo de Pesquisas em Ciências Biológicas-NUPEB, Universidade Federal de Ouro Preto, Ouro Preto, Minas Gerais, Brazil
- Mick Chandler: Laboratoire de Microbiologie et Génétique Moléculaires, CNRS 118, Route de Narbonne, 31062, Toulouse Cedex, France
- Alessandro M Varani: Departamento de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista "Júlio de Mesquita Filho"-UNESP, Jaboticabal, SP, Brazil

16
Comparative genome-scale modelling of Staphylococcus aureus strains identifies strain-specific metabolic capabilities linked to pathogenicity. Proc Natl Acad Sci U S A 2016; 113:E3801-9. [PMID: 27286824 DOI: 10.1073/pnas.1523199113] [Citation(s) in RCA: 160] [Impact Index Per Article: 20.0] [Indexed: 02/06/2023]
Abstract
Staphylococcus aureus is a preeminent bacterial pathogen capable of colonizing diverse ecological niches within its human host. We describe here the pangenome of S. aureus based on analysis of genome sequences from 64 strains of S. aureus spanning a range of ecological niches, host types, and antibiotic resistance profiles. Based on this set, S. aureus is expected to have an open pangenome composed of 7,411 genes and a core genome composed of 1,441 genes. Metabolism was highly conserved in this core genome; however, differences were identified in amino acid and nucleotide biosynthesis pathways between the strains. Genome-scale models (GEMs) of metabolism were constructed for the 64 strains of S. aureus. These GEMs enabled a systems approach to characterizing the core metabolic and panmetabolic capabilities of the S. aureus species. All models were predicted to be auxotrophic for the vitamins niacin (vitamin B3) and thiamin (vitamin B1), whereas strain-specific auxotrophies were predicted for riboflavin (vitamin B2), guanosine, leucine, methionine, and cysteine, among others. GEMs were used to systematically analyze growth capabilities in more than 300 different growth-supporting environments. The results identified metabolic capabilities linked to pathogenic traits and virulence acquisitions. Such traits can be used to differentiate strains responsible for mild vs. severe infections and preference for hosts (e.g., animals vs. humans). Genome-scale analysis of multiple strains of a species can thus be used to identify metabolic determinants of virulence and increase our understanding of why certain strains of this deadly pathogen have spread rapidly throughout the world.
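The pangenome and core genome sizes quoted above come down to set arithmetic once genes have been grouped into families. The sketch below illustrates only that final step and assumes the grouping (ortholog clustering) has already been done upstream; the strain names, family identifiers, and function names are hypothetical and are not taken from the study.

```python
def pan_and_core(gene_sets):
    """Return (pangenome, core genome) for a mapping strain -> set of gene-family IDs."""
    if not gene_sets:
        raise ValueError("need at least one strain")
    families = [set(genes) for genes in gene_sets.values()]
    pan = set().union(*families)        # families seen in any strain
    core = set.intersection(*families)  # families seen in every strain
    return pan, core


def strain_specific(gene_sets):
    """Gene families found in exactly one strain, keyed by that strain."""
    result = {}
    for strain, genes in gene_sets.items():
        others = set().union(
            *(set(g) for s, g in gene_sets.items() if s != strain)
        )
        result[strain] = set(genes) - others
    return result


if __name__ == "__main__":
    # Toy input: three hypothetical strains and five hypothetical gene families.
    toy = {
        "strain_A": {"fam1", "fam2", "fam3"},
        "strain_B": {"fam1", "fam2", "fam4"},
        "strain_C": {"fam1", "fam5"},
    }
    pan, core = pan_and_core(toy)
    print(len(pan), len(core))
    print(strain_specific(toy))
```

A pangenome is usually called open when the union keeps growing as strains are added; checking that would mean recomputing `pan_and_core` on increasing subsets of strains.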

17
Lu B, Leong HW. Computational methods for predicting genomic islands in microbial genomes. Comput Struct Biotechnol J 2016; 14:200-6. [PMID: 27293536 PMCID: PMC4887561 DOI: 10.1016/j.csbj.2016.05.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 01/31/2016] [Revised: 05/01/2016] [Accepted: 05/03/2016] [Indexed: 11/02/2022]
Abstract
Clusters of genes acquired by lateral gene transfer in microbial genomes are broadly referred to as genomic islands (GIs). GIs often carry genes important for genome evolution and adaptation to niches, such as genes involved in pathogenesis and antibiotic resistance. Therefore, GI prediction has gradually become an important part of microbial genome analysis. Despite inherent difficulties in identifying GIs, many computational methods have been developed and show good performance. In this mini-review, we first summarize the general challenges in predicting GIs. Then we group existing GI detection methods by their input, briefly describe representative methods in each group, and discuss their advantages as well as limitations. Finally, we look into the potential improvements for better GI prediction.
Affiliation(s)
- Bingxin Lu: Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore
- Hon Wai Leong: Department of Computer Science, National University of Singapore, 13 Computing Drive, Singapore 117417, Republic of Singapore

18
Wegmann U, MacKenzie DA, Zheng J, Goesmann A, Roos S, Swarbreck D, Walter J, Crossman LC, Juge N. The pan-genome of Lactobacillus reuteri strains originating from the pig gastrointestinal tract. BMC Genomics 2015; 16:1023. [PMID: 26626322 PMCID: PMC4667477 DOI: 10.1186/s12864-015-2216-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 09/30/2015] [Accepted: 11/16/2015] [Indexed: 02/06/2023]
Abstract
BACKGROUND Lactobacillus reuteri is a gut symbiont of a wide variety of vertebrate species that has diversified into distinct phylogenetic clades which are to a large degree host-specific. Previous work demonstrated host specificity in mice and began to determine the mechanisms by which gut colonisation and host restriction are achieved. However, how L. reuteri strains colonise the gastrointestinal (GI) tract of pigs is unknown. RESULTS To gain insight into the ecology of L. reuteri in the pig gut, the genome sequence of the porcine small intestinal isolate L. reuteri ATCC 53608 was completed and consisted of a chromosome of 1.94 Mbp and two plasmids of 138.5 kbp and 9.09 kbp, respectively. Furthermore, we generated draft genomes of four additional L. reuteri strains isolated from pig faeces or lower GI tract, lp167-67, pg-3b, 20-2 and 3c6, and subjected all five genomes to a comparative genomic analysis together with the previously completed genome of strain I5007. A phylogenetic analysis based on whole genomes showed that porcine L. reuteri strains fall into two distinct clades, as previously suggested by multi-locus sequence analysis. These six pig L. reuteri genomes contained a core set of 1364 orthologous gene clusters, as determined by OrthoMCL analysis, that contributed to a pan-genome totalling 3373 gene clusters. Genome comparisons of the six pig L. reuteri strains with 14 L. reuteri strains from other host origins gave a total pan-genome of 5225 gene clusters that included a core genome of 851 gene clusters but revealed that there were no pig-specific genes per se. However, genes specific for and conserved among strains of the two pig phylogenetic lineages were detected, some of which encoded cell surface proteins that could contribute to the diversification of the two lineages and their observed host specificity. CONCLUSIONS This study extends the phylogenetic analysis of L. reuteri strains at a genome-wide level, pointing to distinct evolutionary trajectories of porcine L. reuteri lineages, and providing new insights into the genomic events in L. reuteri that occurred during specialisation to their hosts. The occurrence of two distinct pig-derived clades may reflect differences in host genotype, environmental factors such as dietary components, or evolution from ancestral strains of human and rodent origin following contact with pig populations.
Affiliation(s)
- Udo Wegmann: The Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, NR4 7UA, UK
- Donald A MacKenzie: The Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, NR4 7UA, UK
- Jinshui Zheng: State Key Lab of Agricultural Microbiology, Huazhong Agricultural University, Wuhan, China
- Alexander Goesmann: Bioinformatics and Systems Biology, Justus-Liebig-Universität, Gießen, 35392, Germany
- Stefan Roos: Department of Microbiology, Swedish University of Agricultural Sciences, Uppsala, S-750 07, Sweden
- David Swarbreck: The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Jens Walter: Department of Agricultural, Food, and Nutritional Science, University of Alberta, Edmonton, AB, T6G 2R3, Canada; Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E1, Canada
- Lisa C Crossman: School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ, UK; SequenceAnalysis.co.uk, NRP Innovation Centre, Norwich, NR4 7UG, UK
- Nathalie Juge: The Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, NR4 7UA, UK

19
A Computational Framework for Tracing the Origins of Genomic Islands in Prokaryotes. International Scholarly Research Notices 2014; 2014:732857. [PMID: 27433520 PMCID: PMC4897231 DOI: 10.1155/2014/732857] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/06/2014] [Revised: 07/27/2014] [Accepted: 07/30/2014] [Indexed: 11/18/2022]
Abstract
Genomic islands (GIs) are genomic fragments acquired from nongenealogical organisms through horizontal gene transfer (HGT). Current research on donor-recipient relationships for HGT is limited to the gene level. As more GIs have been identified and verified, donor-recipient relationships can be better modeled by using GIs rather than individual genes. In this paper, we report the development of a computational framework for detecting origins of GIs. The main idea of our computational framework is to identify GIs in a query genome, search candidate genomes that contain genomic regions similar to those GIs in the query genome by BLAST search, and then filter out some candidate genomes if those similar genomic regions are also alien (detected by GI detection tools). We have applied our framework in finding the GI origins for Mycobacterium tuberculosis H37Rv, Herminiimonas arsenicoxydans, and three Thermoanaerobacter species. The predicted results were used to establish the donor-recipient network relationships and visualized by Gephi. Our studies have shown that donor genomes detected by our computational approach were mainly consistent with previous studies. Our framework was implemented with Perl and executed on Windows operating system.
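The filtering step of the framework described above can be sketched as an interval-overlap check. The original was implemented in Perl; the Python below is only an illustration under stated assumptions: BLAST hits are assumed to be already parsed into (genome, start, end) tuples, GI predictions for each candidate genome are assumed to be available as coordinate pairs, and a candidate is kept as soon as at least one of its matching regions is not predicted to be alien. All names and data are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def candidate_donors(hits, candidate_gis):
    """Keep candidate genomes whose matching region is not itself a predicted GI.

    hits: list of (genome, start, end) regions similar to a query GI
          (assumed to come from a BLAST search parsed elsewhere).
    candidate_gis: dict genome -> list of (start, end) predicted GIs.
    """
    donors = []
    for genome, start, end in hits:
        alien = any(
            overlaps(start, end, gi_start, gi_end)
            for gi_start, gi_end in candidate_gis.get(genome, [])
        )
        # A region that is native in the candidate suggests a possible donor.
        if not alien and genome not in donors:
            donors.append(genome)
    return donors


if __name__ == "__main__":
    hits = [("genome_A", 100, 200), ("genome_B", 50, 80), ("genome_C", 10, 90)]
    predicted = {"genome_A": [(150, 300)], "genome_B": [(500, 900)]}
    print(candidate_donors(hits, predicted))
```

The rationale for the filter is that a region which is also foreign in the candidate genome was probably acquired by both genomes from a third source, so that candidate is a poor donor hypothesis.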

20
Rebets Y, Tokovenko B, Lushchyk I, Rückert C, Zaburannyi N, Bechthold A, Kalinowski J, Luzhetskyy A. Complete genome sequence of producer of the glycopeptide antibiotic Aculeximycin Kutzneria albida DSM 43870T, a representative of minor genus of Pseudonocardiaceae. BMC Genomics 2014; 15:885. [PMID: 25301375 PMCID: PMC4210621 DOI: 10.1186/1471-2164-15-885] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 06/17/2014] [Accepted: 10/03/2014] [Indexed: 12/02/2022]
Abstract
BACKGROUND Kutzneria is a representative of a rarely observed genus of the family Pseudonocardiaceae. Kutzneria species were initially placed in the Streptosporangiaceae genus and later reconsidered to be an independent genus of the Pseudonocardiaceae. Kutzneria albida is one of the eight known members of the genus. This strain is a unique producer of the glycosylated polyole macrolide aculeximycin which is active against both bacteria and fungi. Kutzneria albida genome sequencing and analysis allow a deeper understanding of evolution of this genus of Pseudonocardiaceae, provide new insight in the phylogeny of the genus, as well as decipher the hidden secondary metabolic potential of these rare actinobacteria. RESULTS To explore the biosynthetic potential of Kutzneria albida to its full extent, the complete genome was sequenced. With a size of 9,874,926 bp, coding for 8,822 genes, it stands alongside other Pseudonocardiaceae with large circular genomes. Genome analysis revealed 46 gene clusters potentially encoding secondary metabolite biosynthesis pathways. Two large genomic islands were identified, containing regions most enriched with secondary metabolism gene clusters. Large parts of this secondary metabolism "clustome" are dedicated to siderophores production. CONCLUSIONS Kutzneria albida is the first species of the genus Kutzneria with a completely sequenced genome. Genome sequencing allowed identifying the gene cluster responsible for the biosynthesis of aculeximycin, one of the largest known oligosaccharide-macrolide antibiotics. Moreover, the genome revealed 45 additional putative secondary metabolite gene clusters, suggesting a huge biosynthetic potential, which makes Kutzneria albida a very rich source of natural products. Comparison of the Kutzneria albida genome to genomes of other actinobacteria clearly shows its close relations with Pseudonocardiaceae in line with the taxonomic position of the genus.
Affiliation(s)
- Yuriy Rebets: Helmholtz-Institute for Pharmaceutical Research Saarland, Saarland University Campus, Building C2.3, 66123 Saarbrücken, Germany
- Bogdan Tokovenko: Helmholtz-Institute for Pharmaceutical Research Saarland, Saarland University Campus, Building C2.3, 66123 Saarbrücken, Germany
- Igor Lushchyk: Helmholtz-Institute for Pharmaceutical Research Saarland, Saarland University Campus, Building C2.3, 66123 Saarbrücken, Germany
- Christian Rückert: Center for Biotechnology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany
- Nestor Zaburannyi: Helmholtz-Institute for Pharmaceutical Research Saarland, Saarland University Campus, Building C2.3, 66123 Saarbrücken, Germany
- Andreas Bechthold: Institut für Pharmazeutische Biologie und Biotechnologie, Albert-Ludwigs Universität, Stefan-Meier-Strasse 19, 79104 Freiburg, Germany
- Jörn Kalinowski: Center for Biotechnology, Bielefeld University, Universitätsstraße 27, 33615 Bielefeld, Germany
- Andriy Luzhetskyy: Helmholtz-Institute for Pharmaceutical Research Saarland, Saarland University Campus, Building C2.3, 66123 Saarbrücken, Germany

21
Identifying pathogenicity islands in bacterial pathogenomics using computational approaches. Pathogens 2014; 3:36-56. [PMID: 25437607 PMCID: PMC4235732 DOI: 10.3390/pathogens3010036] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 11/30/2013] [Revised: 12/30/2013] [Accepted: 01/07/2014] [Indexed: 12/22/2022]
Abstract
High-throughput sequencing technologies have made it possible to study bacteria through analyzing their genome sequences. For instance, comparative genome sequence analyses can reveal phenomena such as gene loss, gene gain, or gene exchange in a genome. By analyzing pathogenic bacterial genomes, we can discover that pathogenic genomic regions in many pathogenic bacteria are horizontally transferred from other bacteria; these regions are known as pathogenicity islands (PAIs). PAIs have some detectable properties, such as having genomic signatures different from those of the rest of the host genome, and containing mobility genes so that they can be integrated into the host genome. In this review, we will discuss various pathogenicity island-associated features and current computational approaches for the identification of PAIs. Existing pathogenicity island databases and related computational resources will also be discussed, so that researchers may find them useful for studies of bacterial evolution and pathogenicity mechanisms.

22
Wegmann U, Louis P, Goesmann A, Henrissat B, Duncan SH, Flint HJ. Complete genome of a new Firmicutes species belonging to the dominant human colonic microbiota ('Ruminococcus bicirculans') reveals two chromosomes and a selective capacity to utilize plant glucans. Environ Microbiol 2013; 16:2879-90. [PMID: 23919528 DOI: 10.1111/1462-2920.12217] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 11/14/2012] [Revised: 06/28/2013] [Accepted: 07/14/2013] [Indexed: 01/22/2023]
Abstract
The recently isolated bacterial strain 80/3 represents one of the most abundant 16S rRNA phylotypes detected in the healthy human large intestine and belongs to the Ruminococcaceae family of Firmicutes. The completed genome sequence reported here is the first for a member of this important family of bacteria from the human colon. The genome comprises two large chromosomes of 2.24 and 0.73 Mbp, leading us to propose the name Ruminococcus bicirculans for this new species. Analysis of the carbohydrate active enzyme complement suggests an ability to utilize certain hemicelluloses, especially β-glucans and xyloglucan, for growth that was confirmed experimentally. The enzymatic machinery enabling the degradation of cellulose and xylan by related cellulolytic ruminococci is however lacking in this species. While the genome indicated the capacity to synthesize purines, pyrimidines and all 20 amino acids, only genes for the synthesis of nicotinate, NAD+, NADP+ and coenzyme A were detected among the essential vitamins and co-factors, resulting in multiple growth requirements. In vivo, these growth factors must be supplied from the diet, host or other gut microorganisms. Other features of ecological interest include two type IV pilins, multiple extracytoplasmic function-sigma factors, a urease and a bile salt hydrolase.
Affiliation(s)
- Udo Wegmann: Gut Health and Food Safety Programme, Institute of Food Research, Norwich Research Park, Norwich, NR4 7UA, UK

23
Lee CC, Chen YPP, Yao TJ, Ma CY, Lo WC, Lyu PC, Tang CY. GI-POP: A combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects. Gene 2013; 518:114-23. [DOI: 10.1016/j.gene.2012.11.063] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 10/19/2012] [Accepted: 11/27/2012] [Indexed: 10/27/2022]

24
Hain T, Ghai R, Billion A, Kuenne CT, Steinweg C, Izar B, Mohamed W, Mraheil MA, Domann E, Schaffrath S, Kärst U, Goesmann A, Oehm S, Pühler A, Merkl R, Vorwerk S, Glaser P, Garrido P, Rusniok C, Buchrieser C, Goebel W, Chakraborty T. Comparative genomics and transcriptomics of lineages I, II, and III strains of Listeria monocytogenes. BMC Genomics 2012; 13:144. [PMID: 22530965 PMCID: PMC3464598 DOI: 10.1186/1471-2164-13-144] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Received: 08/25/2011] [Accepted: 04/12/2012] [Indexed: 12/13/2022]
Abstract
Background Listeria monocytogenes is a food-borne pathogen that causes infections with a high mortality rate and has served as an invaluable model for intracellular parasitism. Here, we report complete genome sequences for two L. monocytogenes strains belonging to serotype 4a (L99) and 4b (CLIP80459), and transcriptomes of representative strains from lineages I, II, and III, thereby permitting in-depth comparison of genome- and transcriptome-based data from three lineages of L. monocytogenes. Lineage III, represented by the 4a L99 genome, is known to contain strains less virulent for humans. Results The genome analysis of the weakly pathogenic L99 serotype 4a provides extensive evidence of virulence gene decay, including loss of several important surface proteins. The 4b CLIP80459 genome, unlike the previously sequenced 4b F2365 genome, harbours an intact inlB invasion gene. These lineage I strains are characterized by the lack of prophage genes, as they share only a single prophage locus with other L. monocytogenes genomes 1/2a EGD-e and 4a L99. Comparative transcriptome analysis during intracellular growth uncovered adaptive expression level differences in lineages I, II and III of Listeria, notable amongst which was a strong intracellular induction of flagellar genes in strain 4a L99 compared to the other lineages. Furthermore, extensive differences between strains are manifest at levels of metabolic flux control and phosphorylated sugar uptake. Intriguingly, prophage gene expression was found to be a hallmark of intracellular gene expression. Deletion mutants in the single shared prophage locus of lineage II strain EGD-e 1/2a, the lma operon, revealed severe attenuation of virulence in a murine infection model. Conclusion Comparative genomics and transcriptome analysis of L. monocytogenes strains from three lineages implicate prophage genes in intracellular adaptation and indicate that gene loss and decay may have led to the emergence of attenuated lineages.
Affiliation(s)
- Torsten Hain
- Institute of Medical Microbiology, Justus-Liebig-University, Schubertstrasse 81, Giessen, D-35392, Germany
25
Soares SC, Abreu VAC, Ramos RTJ, Cerdeira L, Silva A, Baumbach J, Trost E, Tauch A, Hirata R, Mattos-Guaraldi AL, Miyoshi A, Azevedo V. PIPS: pathogenicity island prediction software. PLoS One 2012; 7:e30848. [PMID: 22355329 PMCID: PMC3280268 DOI: 10.1371/journal.pone.0030848] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 07/12/2011] [Accepted: 12/22/2011] [Indexed: 01/08/2023] Open
Abstract
The adaptability of pathogenic bacteria to hosts is influenced by the genomic plasticity of the bacteria, which can be increased by such mechanisms as horizontal gene transfer. Pathogenicity islands play a major role in this type of gene transfer because they are large, horizontally acquired regions that harbor clusters of virulence genes that mediate the adhesion, colonization, invasion, immune system evasion, and toxigenic properties of the acceptor organism. Currently, pathogenicity islands are mainly identified in silico based on various characteristic features: (1) deviations in codon usage, G+C content or dinucleotide frequency and (2) insertion sequences and/or tRNA genetic flanking regions together with transposase coding genes. Several computational techniques for identifying pathogenicity islands exist. However, most of these techniques are only directed at the detection of horizontally transferred genes and/or the absence of certain genomic regions of the pathogenic bacterium in closely related non-pathogenic species. Here, we present a novel software suite designed for the prediction of pathogenicity islands (pathogenicity island prediction software, or PIPS). In contrast to other existing tools, our approach is capable of utilizing multiple features for pathogenicity island detection in an integrative manner. We show that PIPS provides better accuracy than other available software packages. As an example, we used PIPS to study the veterinary pathogen Corynebacterium pseudotuberculosis, in which we identified seven putative pathogenicity islands.
Affiliation(s)
- Siomar C. Soares
- Department of General Biology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Vinícius A. C. Abreu
- Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Louise Cerdeira
- Department of Genetics, Federal University of Pará, Belém, Pará, Brazil
- Artur Silva
- Department of Genetics, Federal University of Pará, Belém, Pará, Brazil
- Jan Baumbach
- Department of Computer Science, Max-Planck-Institut für Informatik, Saarbrücken, Saarland, Germany
- Eva Trost
- Center for Biotechnology, Bielefeld University, Bielefeld, Nordrhein-Westfalen, Germany
- Andreas Tauch
- Center for Biotechnology, Bielefeld University, Bielefeld, Nordrhein-Westfalen, Germany
- Raphael Hirata
- Microbiology and Immunology Discipline, Medical Sciences Faculty, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Ana L. Mattos-Guaraldi
- Microbiology and Immunology Discipline, Medical Sciences Faculty, State University of Rio de Janeiro, Rio de Janeiro, Brazil
- Anderson Miyoshi
- Department of General Biology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Vasco Azevedo
- Department of General Biology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Department of Biochemistry and Immunology, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
26
Abstract
Methods for identifying alien genes in genomes fall into two general classes. Phylogenetic methods examine the distribution of a gene's homologues among genomes to find those with relationships not consistent with vertical inheritance. These approaches include identifying orphan genes which lack homologues in closely related genomes and genes with unduly high levels of similarity to genes in otherwise unrelated genomes. Rigorous statistical tests are available to place confidence intervals for predicted alien genes. Parametric methods examine the compositional properties of genes within a genome to find those with atypical properties, likely indicating the directional mutational pressures of a donor genome. These methods may compare the properties of genes to genomic averages, properties of genes to each other, or properties of large, multigene regions of the chromosome. Here, we discuss the strengths and weaknesses of each approach.
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
27
Zhu B, Zhou S, Lou M, Zhu J, Li B, Xie G, Jin G, De Mot R. Characterization and inference of gene gain/loss along Burkholderia evolutionary history. Evol Bioinform Online 2011; 7:191-200. [PMID: 22084562 PMCID: PMC3210638 DOI: 10.4137/ebo.s7510] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023] Open
Abstract
A comparative analysis of 60 complete Burkholderia genomes was conducted to obtain insight into the evolutionary history behind the diversity and pathogenicity at species level. A concatenated multiprotein phyletic pattern and a dataset with Burkholderia clusters of orthologous genes (BuCOGs) were constructed. The extent of horizontal gene transfer (HGT) was assessed using a Markov-based probabilistic method. A reconstruction of the history of gene gains and losses shows that more than half of the Burkholderia gene families are inferred to have experienced HGT at least once during their evolution. Further analysis revealed that the number of gene gains and losses was correlated with the branch length. Genomic island (GEI) analysis based on evolutionary history reconstruction not only revealed that most genes in ancient GEIs were gained but also suggested that the fraction of the genome located in GEIs is higher in the small chromosomes than in the large chromosomes in Burkholderia. The mapping of coexpressed genes onto biological pathway schemes revealed that pathogenicity of Burkholderia strains is probably mainly determined by the genes gained in their ancestor. Taken together, our results strongly support that gene gain and loss, especially in ancient evolutionary history, play an important role in strain divergence, pathogenicity determinants of Burkholderia and GEI formation.
Affiliation(s)
- Bo Zhu
- State Key Laboratory of Rice Biology and Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Ministry of Agriculture, Institute of Biotechnology, Zhejiang University, Hangzhou 310029, China
- Shengli Zhou
- Environmental Monitoring Center of Zhejiang Province, Hangzhou 310015, China
- Miaomiao Lou
- State Key Laboratory of Rice Biology and Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Ministry of Agriculture, Institute of Biotechnology, Zhejiang University, Hangzhou 310029, China
- Jun Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310029, China
- Bin Li
- State Key Laboratory of Rice Biology and Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Ministry of Agriculture, Institute of Biotechnology, Zhejiang University, Hangzhou 310029, China
- Guanlin Xie
- State Key Laboratory of Rice Biology and Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Ministry of Agriculture, Institute of Biotechnology, Zhejiang University, Hangzhou 310029, China
- GuLei Jin
- State Key Laboratory of Rice Biology and Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Ministry of Agriculture, Institute of Biotechnology, Zhejiang University, Hangzhou 310029, China
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310029, China
- René De Mot
- Centre of Microbial and Plant Genetics, Katholieke Universiteit Leuven, 3001 Heverlee-Leuven 3001, Belgium
28
Roos TE, van Passel MWJ. A quantitative account of genomic island acquisitions in prokaryotes. BMC Genomics 2011; 12:427. [PMID: 21864345 PMCID: PMC3176501 DOI: 10.1186/1471-2164-12-427] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/12/2011] [Accepted: 08/24/2011] [Indexed: 12/15/2022] Open
Abstract
Background: Microbial genomes do not merely evolve through the slow accumulation of mutations, but also, and often more dramatically, by taking up new DNA in a process called horizontal gene transfer. These innovation leaps in the acquisition of new traits can take place via the introgression of single genes, but also through the acquisition of large gene clusters, which are termed Genomic Islands. Since only a small proportion of all the DNA diversity has been sequenced, it can be hard to find the appropriate donors for acquired genes via sequence alignments from databases. In contrast, relative oligonucleotide frequencies represent a remarkably stable genomic signature in prokaryotes, which facilitates compositional comparisons as an alignment-free alternative for phylogenetic relatedness. In this project, we test whether Genomic Islands identified in individual bacterial genomes have a similar genomic signature, in terms of relative dinucleotide frequencies, and can therefore be expected to originate from a common donor species. Results: When multiple Genomic Islands are present within a single genome, we find that up to 28% of these are compositionally very similar to each other, indicative of frequent recurring acquisitions from the same donor to the same acceptor. Conclusions: This represents the first quantitative assessment of common directional transfer events in prokaryotic evolutionary history. We suggest that many of the resident Genomic Islands per prokaryotic genome originated from the same source, which may have implications with respect to their regulatory interactions, and for the elucidation of the common origins of these acquired gene clusters.
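To make the idea of a dinucleotide genomic signature concrete, the following is a minimal Python sketch. It assumes the common odds-ratio form of the signature, rho_xy = f_xy / (f_x * f_y), and a mean absolute difference as the distance between two signatures. The abstract does not state the exact formula or distance used by the authors, so both choices, and the function names, are illustrative assumptions. The sketch also works on a single strand only.

```python
from collections import Counter
from itertools import product


def dinucleotide_signature(seq):
    """Relative dinucleotide abundances rho_xy = f_xy / (f_x * f_y).

    Assumption: single-strand odds-ratio signature; not necessarily
    the exact definition used in the cited study.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    signature = {}
    for x, y in product("ACGT", repeat=2):
        fx = mono[x] / n
        fy = mono[y] / n
        fxy = di[x + y] / (n - 1)
        # Undefined ratios (a base is absent) are reported as 0.0.
        signature[x + y] = fxy / (fx * fy) if fx and fy else 0.0
    return signature


def signature_distance(seq_a, seq_b):
    """Mean absolute difference between two dinucleotide signatures."""
    sig_a = dinucleotide_signature(seq_a)
    sig_b = dinucleotide_signature(seq_b)
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / len(sig_a)
```

Under these assumptions, two islands with a small `signature_distance` would be candidates for a shared donor; any cutoff for "very similar" would have to be calibrated on real data.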
Affiliation(s)
- Tom E Roos
- Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
29
Rödelsperger C, Sommer RJ. Computational archaeology of the Pristionchus pacificus genome reveals evidence of horizontal gene transfers from insects. BMC Evol Biol 2011; 11:239. [PMID: 21843315 PMCID: PMC3175473 DOI: 10.1186/1471-2148-11-239] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 06/14/2011] [Accepted: 08/15/2011] [Indexed: 11/10/2022] Open
Abstract
Background: The recent sequencing of nematode genomes has laid the basis for comparative genomics approaches to study the impact of horizontal gene transfer (HGT) on the adaptation to new environments and the evolution of parasitism. In the beetle-associated nematode Pristionchus pacificus, HGT events were found to involve cellulase genes of microbial origin and Diapausin genes that are known from beetles, but not from other nematodes. The insect-to-nematode horizontal transfer is of special interest given that P. pacificus shows a tight association with insects. Results: In this study, we utilized the observation that horizontally transferred genes often exhibit codon usage patterns more similar to that of the donor than that of the acceptor genome. We introduced GC-normalized relative codon frequencies as a measure to detect characteristic features of P. pacificus orphan genes that show no homology to other nematode genes. We found that atypical codon usage is particularly prevalent in P. pacificus orphans. By comparing codon usage profiles of 71 species, we detected the most significant enrichment in insect-like codon usage profiles. In cross-species comparisons, we identified 509 HGT candidates that show a significantly higher similarity to insect-like profiles than genes with nematode homologs. The most abundant gene family among these genes is non-LTR retrotransposons. Speculating that retrotransposons might have served as carriers of foreign genetic material, we found a significant local clustering tendency of orphan genes in the vicinity of retrotransposons. Conclusions: Our study combined codon usage bias, phylogenetic analysis, and genomic colocalization into a general picture of the computational archaeology of the P. pacificus genome and suggests that a substantial fraction of the gene repertoire is of insect origin. We propose that the Pristionchus-beetle association has facilitated HGT and discuss potential vectors of these events.
Affiliation(s)
- Christian Rödelsperger
- Department for Evolutionary Biology, Max-Planck Institute for Developmental Biology, Spemannstrasse 37, 72076 Tübingen, Germany
30
Abstract
Because the properties of horizontally-transferred genes will reflect the mutational proclivities of their donor genomes, they often show atypical compositional properties relative to native genes. Parametric methods use these discrepancies to identify bacterial genes recently acquired by horizontal transfer. However, compositional patterns of native genes vary stochastically, leaving no clear boundary between typical and atypical genes. As a result, while strongly atypical genes are readily identified as alien, genes of ambiguous character are poorly classified when a single threshold separates typical and atypical genes. This limitation affects all parametric methods that examine genes independently, and escaping it requires the use of additional genomic information. We propose that the performance of all parametric methods can be improved by using a multiple-threshold approach. First, strongly atypical alien genes and strongly typical native genes would be identified using conservative thresholds. Genes with ambiguous compositional features would then be classified by examining gene context, including the class (native or alien) of flanking genes. By including additional genomic information in a multiple-threshold framework, we observed a remarkable improvement in the performance of several popular, but algorithmically distinct, methods for alien gene detection.
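The multiple-threshold idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each gene is given a single atypicality score (higher means more atypical), the two cutoffs `low` and `high` are hypothetical parameters, and ambiguous genes are resolved from the nearest confidently classified gene on each side. The abstract says that gene context, including the class of flanking genes, is examined, but it does not give the exact rule, so the tie-break used here (falling back to the midpoint of the two thresholds) is an assumption.

```python
def classify_genes(scores, low, high):
    """Two-threshold classification of genes in chromosomal order.

    scores: atypicality score per gene (hypothetical scale).
    low, high: conservative cutoffs for "native" and "alien".
    """
    # Pass 1: confident calls only; everything in between is ambiguous.
    labels = []
    for score in scores:
        if score >= high:
            labels.append("alien")
        elif score <= low:
            labels.append("native")
        else:
            labels.append("ambiguous")

    # Pass 2: resolve ambiguous genes from their flanking context.
    resolved = list(labels)
    for i, label in enumerate(labels):
        if label != "ambiguous":
            continue
        left = next((labels[j] for j in range(i - 1, -1, -1)
                     if labels[j] != "ambiguous"), None)
        right = next((labels[j] for j in range(i + 1, len(labels))
                      if labels[j] != "ambiguous"), None)
        if left is not None and left == right:
            resolved[i] = left
        else:
            # Assumed fallback: a single midpoint threshold.
            midpoint = (low + high) / 2
            resolved[i] = "alien" if scores[i] >= midpoint else "native"
    return resolved
```

For example, with `low=0.3` and `high=0.7`, a gene scoring 0.5 that sits between two confidently alien genes is labelled alien, while the same score between two native genes is labelled native.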
Affiliation(s)
- Rajeev K Azad
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, USA
31
von Mandach C, Merkl R. Genes optimized by evolution for accurate and fast translation encode in Archaea and Bacteria a broad and characteristic spectrum of protein functions. BMC Genomics 2010; 11:617. [PMID: 21050470 PMCID: PMC3091758 DOI: 10.1186/1471-2164-11-617] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/10/2010] [Accepted: 11/04/2010] [Indexed: 11/13/2022] Open
Abstract
Background: In many microbial genomes, a strong preference for a small number of codons can be observed in genes whose products are needed by the cell in large quantities. This codon usage bias (CUB) improves translational accuracy and speed and is one of several factors optimizing cell growth. Whereas CUB and the overrepresentation of individual proteins have been studied in detail, it is still unclear which high-level metabolic categories are subject to translational optimization in different habitats. Results: In a systematic study of 388 microbial species, we have identified for each genome a specific subset of genes characterized by a marked CUB, which we named the effectome. As expected, gene products related to protein synthesis are abundant in both archaeal and bacterial effectomes. In addition, enzymes contributing to energy production and gene products involved in protein folding and stabilization are overrepresented. The comparison of genomes from eleven habitats shows that the environment has only a minor effect on the composition of the effectomes. As a paradigmatic example, we detailed the effectome content of 37 bacterial genomes that are most likely exposed to strongest selective pressure towards translational optimization. These effectomes accommodate a broad range of protein functions like enzymes related to glycolysis/gluconeogenesis and the TCA cycle, ATP synthases, aminoacyl-tRNA synthetases, chaperones, proteases that degrade misfolded proteins, protectants against oxidative damage, as well as cold shock and outer membrane proteins. Conclusions: We made clear that effectomes consist of specific subsets of the proteome being involved in several cellular functions. As expected, some functions are related to cell growth and affect speed and quality of protein synthesis. Additionally, the effectomes contain enzymes of central metabolic pathways and cellular functions sustaining microbial life under stress situations. These findings indicate that cell growth is an important but not the only factor modulating translational accuracy and speed by means of CUB.
32
33
Becq J, Churlaud C, Deschavanne P. A benchmark of parametric methods for horizontal transfers detection. PLoS One 2010; 5:e9989. [PMID: 20376325 PMCID: PMC2848678 DOI: 10.1371/journal.pone.0009989] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 11/10/2009] [Accepted: 03/10/2010] [Indexed: 11/23/2022] Open
Abstract
Horizontal gene transfer (HGT) appears to be important for the evolution of prokaryotic species. As a consequence, numerous parametric methods, using only the information embedded in the genomes, have been designed to detect HGTs. Numerous reports have been published of incongruencies in the results of different methods applied to the same genomes. The use of artificial genomes in which all HGT parameters are controlled allows different methods to be tested under the same conditions. The results of this benchmark of 16 representative parametric methods showed a great variety of efficiencies. Some methods work very poorly whatever the type of HGT, and some depend on the conditions or on the metrics used. The best methods in terms of total errors were those using tetranucleotides as the criterion for window methods, or those using codon usage for gene-based methods together with the Kullback-Leibler divergence metric. Window methods are very sensitive but less specific and detect lone isolated genes poorly. On the other hand, gene-based methods are often very specific but lack sensitivity. We propose using two methods in combination to get the best of each category: a gene-based one for specificity and a window-based one for sensitivity.
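As an illustration of a window method of the kind benchmarked above, the following Python sketch scores sliding windows by the Kullback-Leibler divergence between the window's tetranucleotide frequencies and those of the whole genome. It is a sketch under stated assumptions, not a reimplementation of any of the 16 benchmarked methods: the window size, step, pseudocount, and function names are all hypothetical choices, and the abstract pairs the Kullback-Leibler metric with codon usage in gene-based methods, so combining it with tetranucleotide windows here is a simplification.

```python
import math
from collections import Counter
from itertools import product

# All 256 tetranucleotides over the DNA alphabet.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def kmer_freqs(seq, pseudocount=1.0):
    """Tetranucleotide frequencies with a pseudocount (assumed smoothing)."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS) + pseudocount * len(KMERS)
    return {k: (counts[k] + pseudocount) / total for k in KMERS}


def kl_divergence(p, q):
    """D_KL(p || q) in nats over the tetranucleotide alphabet."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in KMERS)


def window_scores(genome, window=5000, step=2500):
    """Return (start, divergence) for each sliding window.

    Windows with a high divergence from the genome-wide profile are
    candidate atypical regions; any cutoff is left to the caller.
    """
    genome = genome.upper()
    reference = kmer_freqs(genome)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        profile = kmer_freqs(genome[start:start + window])
        scores.append((start, kl_divergence(profile, reference)))
    return scores
```

In a combined scheme like the one the authors propose, windows flagged this way would supply sensitivity, and a separate gene-based codon usage test would be applied for specificity.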
Affiliation(s)
- Jennifer Becq
- Dynamique des Structures et Interactions des Macromolécules Biologiques, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 665, Université Paris Diderot, Institut National de la Transfusion Sanguine, Paris, France
- Cécile Churlaud
- Molécules Thérapeutiques in silico, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 973, Université Paris Diderot, Paris, France
- Patrick Deschavanne
- Molécules Thérapeutiques in silico, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 973, Université Paris Diderot, Paris, France
34
Mallet LV, Becq J, Deschavanne P. Whole genome evaluation of horizontal transfers in the pathogenic fungus Aspergillus fumigatus. BMC Genomics 2010; 11:171. [PMID: 20226043 PMCID: PMC2848249 DOI: 10.1186/1471-2164-11-171] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 01/12/2010] [Accepted: 03/12/2010] [Indexed: 12/14/2022] Open
Abstract
Background: Numerous cases of horizontal transfers (HTs) have been described for eukaryote genomes, but in contrast to prokaryote genomes, no whole genome evaluation of HTs has been carried out. This is mainly due to a lack of parametric methods specially designed to take the intrinsic heterogeneity of eukaryote genomes into account. We applied a simple and tested method based on local variations of genomic signatures to analyze the genome of the pathogenic fungus Aspergillus fumigatus. Results: We detected 189 atypical regions containing 214 genes, accounting for about 1 Mb of DNA sequences. However, the fraction of atypical DNA detected was smaller than the average amount detected under the same conditions in prokaryote genomes (3.1% vs 5.6%). About one third of these regions contained no annotated genes, a proportion far greater than in prokaryote genomes. When analyzing the origin of these HTs by comparing their signatures to a home-made database of species signatures, 3 groups of donor species emerged: bacteria (40%), fungi (25%), and viruses (22%). Notably, although inter-domain exchanges are confirmed, we found evidence of only very few exchanges between eukaryotic kingdoms. Conclusions: We demonstrated that HTs are not negligible in eukaryote genomes, though less extensive than in prokaryote genomes, bearing in mind that under our stringent conditions this amount is a floor value. The biological mechanisms underlying those transfers remain to be elucidated, as well as the biological functions of the transferred genes.
Affiliation(s)
- Ludovic V Mallet
- Molécules thérapeutiques in silico (MTI), INSERM UMR-M 973, Université Paris Diderot-Paris 7, Bât Lamarck, 35 rue Hélène Brion, 75205 Paris Cedex 13, France
35
Gao J, Chen LL. Theoretical methods for identifying important functional genes in bacterial genomes. Res Microbiol 2009; 161:1-8. [PMID: 19900539 DOI: 10.1016/j.resmic.2009.10.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/22/2009] [Revised: 10/05/2009] [Accepted: 10/21/2009] [Indexed: 12/30/2022]
Abstract
Some functional genes, such as essential genes, highly expressed genes and horizontally transferred genes, play important roles in the survival and pathogenicity of bacteria. This review attempts to summarize current computational methods in identifying the above functional genes from bacterial genomes, which is of significant importance in exploring the bacterial genomes.
Affiliation(s)
- Junxiang Gao
- School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, PR China
36
Barbe V, Cruveiller S, Kunst F, Lenoble P, Meurice G, Sekowska A, Vallenet D, Wang T, Moszer I, Médigue C, Danchin A. From a consortium sequence to a unified sequence: the Bacillus subtilis 168 reference genome a decade later. Microbiology (Reading) 2009; 155:1758-1775. [PMID: 19383706 PMCID: PMC2885750 DOI: 10.1099/mic.0.027839-0] [Citation(s) in RCA: 257] [Impact Index Per Article: 17.1] [Received: 01/26/2009] [Revised: 02/25/2009] [Accepted: 02/25/2009] [Indexed: 11/18/2022]
Abstract
Comparative genomics is the cornerstone of identification of gene functions. The immense number of living organisms precludes experimental identification of functions except in a handful of model organisms. The bacterial domain is split into large branches, among which the Firmicutes occupy a considerable space. Bacillus subtilis has been the model of Firmicutes for decades and its genome has been a reference for more than 10 years. Sequencing the genome involved more than 30 laboratories, with different expertise, in an attempt to make the most of the experimental information that could be associated with the sequence. This had the expected drawback that the sequencing expertise was quite varied among the groups involved, especially at a time when sequencing genomes was extremely hard work. The recent development of very efficient, fast and accurate sequencing techniques, in parallel with the development of high-level annotation platforms, motivated the present resequencing work. The updated sequence has been reannotated in agreement with the UniProt protein knowledge base, keeping in perspective the split between the paleome (genes necessary for sustaining and perpetuating life) and the cenome (genes required for occupation of a niche, suggesting here that B. subtilis is an epiphyte). This should permit investigators to make reliable inferences to prepare validation experiments in a variety of domains of bacterial growth and development as well as build up accurate phylogenies.
Affiliation(s)
- Valérie Barbe
- CEA, Institut de Génomique, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Stéphane Cruveiller
- CEA, Institut de Génomique, Laboratoire de Génomique Comparative/CNRS UMR8030, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Frank Kunst
- CEA, Institut de Génomique, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Patricia Lenoble
- CEA, Institut de Génomique, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Guillaume Meurice
- Institut Pasteur, Intégration et Analyse Génomiques, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Agnieszka Sekowska
- Institut Pasteur, Génétique des Génomes Bactériens/CNRS URA2171, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- David Vallenet
- CEA, Institut de Génomique, Laboratoire de Génomique Comparative/CNRS UMR8030, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Tingzhang Wang
- Institut Pasteur, Génétique des Génomes Bactériens/CNRS URA2171, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Ivan Moszer
- Institut Pasteur, Intégration et Analyse Génomiques, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Claudine Médigue
- CEA, Institut de Génomique, Laboratoire de Génomique Comparative/CNRS UMR8030, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
- Antoine Danchin
- Institut Pasteur, Génétique des Génomes Bactériens/CNRS URA2171, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
37
Merkl R, Wiezer A. GO4genome: a prokaryotic phylogeny based on genome organization. J Mol Evol 2009; 68:550-62. [PMID: 19436929 PMCID: PMC3085772 DOI: 10.1007/s00239-009-9233-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/27/2008] [Revised: 03/10/2009] [Accepted: 04/03/2009] [Indexed: 11/24/2022]
Abstract
Determining the phylogeny of closely related prokaryotes may fail in an analysis of rRNA or a small set of sequences. Whole-genome phylogeny utilizes the maximally available sample space. For a precise determination of genome similarity, two aspects have to be considered when developing an algorithm of whole-genome phylogeny: (1) gene order conservation is a more precise signal than gene content; and (2) when using sequence similarity, failures in identifying orthologues or the in situ replacement of genes via horizontal gene transfer may give misleading results. GO4genome is a new paradigm, which is based on a detailed analysis of gene function and the location of the respective genes. For characterization of genes, the algorithm uses gene ontology enabling a comparison of function independent of evolutionary relationship. After the identification of locally optimal series of gene functions, their length distribution is utilized to compute a phylogenetic distance. The outcome is a classification of genomes based on metabolic capabilities and their organization. Thus, the impact of effects on genome organization that are not covered by methods of molecular phylogeny can be studied. Genomes of strains belonging to Escherichia coli, Shigella, Streptococcus, Methanosarcina, and Yersinia were analyzed. Differences from the findings of classical methods are discussed.
Affiliation(s)
- Rainer Merkl
- Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, 93040, Regensburg, Germany.
38
Strittmatter AW, Liesegang H, Rabus R, Decker I, Amann J, Andres S, Henne A, Fricke WF, Martinez-Arias R, Bartels D, Goesmann A, Krause L, Pühler A, Klenk HP, Richter M, Schüler M, Glöckner FO, Meyerdierks A, Gottschalk G, Amann R. Genome sequence of Desulfobacterium autotrophicum HRM2, a marine sulfate reducer oxidizing organic carbon completely to carbon dioxide. Environ Microbiol 2009; 11:1038-55. [PMID: 19187283 PMCID: PMC2702500 DOI: 10.1111/j.1462-2920.2008.01825.x] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Received: 07/09/2008] [Accepted: 10/25/2008] [Indexed: 01/23/2023]
Abstract
Sulfate-reducing bacteria (SRB) belonging to the metabolically versatile Desulfobacteriaceae are abundant in marine sediments and contribute to the global carbon cycle by complete oxidation of organic compounds. Desulfobacterium autotrophicum HRM2 is the first member of this ecophysiologically important group with a now available genome sequence. With 5.6 megabasepairs (Mbp) the genome of Db. autotrophicum HRM2 is about 2 Mbp larger than the sequenced genomes of other sulfate reducers (SRB). A high number of genome plasticity elements (> 100 transposon-related genes), several regions of GC discontinuity and a high number of repetitive elements (132 paralogous genes per Mbp) point to a different genome evolution when comparing with Desulfovibrio spp. The metabolic versatility of Db. autotrophicum HRM2 is reflected in the presence of genes for the degradation of a variety of organic compounds including long-chain fatty acids and for the Wood-Ljungdahl pathway, which enables the organism to completely oxidize acetyl-CoA to CO2 but also to grow chemolithoautotrophically. The presence of more than 250 proteins of the sensory/regulatory protein families should enable Db. autotrophicum HRM2 to efficiently adapt to changing environmental conditions. Genes encoding periplasmic or cytoplasmic hydrogenases and formate dehydrogenases have been detected as well as genes for the transmembrane TpII-c3, Hme and Rnf complexes. Genes for subunits A, B, C and D as well as for the proposed novel subunits L and F of the heterodisulfide reductases are present. This enzyme is involved in energy conservation in methanoarchaea and it is speculated that it exhibits a similar function in the process of dissimilatory sulfate reduction in Db. autotrophicum HRM2.
Affiliation(s)
- Axel W Strittmatter
  - Göttingen Genomics Laboratory, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
- Heiko Liesegang
  - Göttingen Genomics Laboratory, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
- Ralf Rabus
  - Max Planck Institute for Marine Microbiology, Celsiusstr. 1, D-28359 Bremen, Germany
  - Institute for Chemistry and Biology of the Marine Environment (ICBM), Carl von Ossietzky University Oldenburg, Carl-von-Ossietzky Str. 9-11, D-26111 Oldenburg, Germany
- Iwona Decker
  - Göttingen Genomics Laboratory, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
- Judith Amann
  - Max Planck Institute for Marine Microbiology, Celsiusstr. 1, D-28359 Bremen, Germany
- Sönke Andres
  - Göttingen Genomics Laboratory, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
- Anke Henne
  - Göttingen Genomics Laboratory, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
- Wolfgang Florian Fricke
  - Göttingen Genomics Laboratory, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
- Rosa Martinez-Arias
  - Göttingen Genomics Laboratory, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
- Daniela Bartels
  - Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstr. 37, D-33615 Bielefeld, Germany
- Alexander Goesmann
  - Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstr. 37, D-33615 Bielefeld, Germany
- Lutz Krause
  - Center for Biotechnology (CeBiTec), Bielefeld University, Universitätsstr. 37, D-33615 Bielefeld, Germany
- Alfred Pühler
  - Lehrstuhl für Genetik, Fakultät für Biologie, Universität Bielefeld, D-33594 Bielefeld, Germany
- Hans-Peter Klenk
  - DSMZ – Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Inhoffenstraße 7 B, D-38124 Braunschweig, Germany
- Michael Richter
  - Max Planck Institute for Marine Microbiology, Celsiusstr. 1, D-28359 Bremen, Germany
- Margarete Schüler
  - Max Planck Institute for Marine Microbiology, Celsiusstr. 1, D-28359 Bremen, Germany
- Anke Meyerdierks
  - Max Planck Institute for Marine Microbiology, Celsiusstr. 1, D-28359 Bremen, Germany
- Gerhard Gottschalk
  - Göttingen Genomics Laboratory, Georg-August-University, Grisebachstr. 8, D-37077 Göttingen, Germany
- Rudolf Amann
  - Max Planck Institute for Marine Microbiology, Celsiusstr. 1, D-28359 Bremen, Germany
39
Abstract
Advances in DNA sequencing technologies have promoted the use of genome information as a key component in most biological studies. In the case of biomining microorganisms, partial and complete genome information has provided critical clues for unraveling their physiology. This information has also provided genetic material for the generation of functional and biodiversity-directed markers. In this work, we present a compilation of the most relevant findings based on genomic analysis of the model organism Acidithiobacillus ferrooxidans ATCC 23270, extended and compared to the recently sequenced genomes of Acidithiobacillus thiooxidans and Acidithiobacillus caldus. The phylogenetic relatedness of these three microorganisms has permitted the identification of a shared genomic core that encodes the common metabolic and regulatory functions critical for survival and proliferation in extremely acidic environments. We also identified microorganism-specific genomic components that are predicted to be responsible for the metabolic speciation of these microorganisms. Finally, we evaluated the impact of lateral gene transfer on these genomes in order to determine the functional contribution of this phenomenon to the fitness of these microbial representatives. The information gathered by genomic analyses in the Acidithiobacillus genus will be presented in conjunction with other biomining genomic and metagenomic information in order to generate a more comprehensive picture of the biodiversity, metabolism and ecophysiology of the bioleaching niche.
40
Langille MGI, Brinkman FSL. Bioinformatic detection of horizontally transferred DNA in bacterial genomes. F1000 Biol Rep 2009; 1:25. [PMID: 20948661 PMCID: PMC2920674 DOI: 10.3410/b1-25] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022]
Abstract
We highlight a selection of recent research on computational methods and associated challenges surrounding the prediction of bacterial horizontal gene transfer. This research area continues to face controversy, but is becoming more critical as the importance of horizontal gene transfer in medically and ecologically important prokaryotic evolution is further appreciated.
Affiliation(s)
- Morgan G I Langille
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada V5A 1S6
41
Pavlović-Lažetić GM, Mitić NS, Beljanski MV. n-Gram characterization of genomic islands in bacterial genomes. Comput Methods Programs Biomed 2009; 93:241-56. [PMID: 19101056 PMCID: PMC7185697 DOI: 10.1016/j.cmpb.2008.10.014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/20/2008] [Revised: 09/10/2008] [Accepted: 10/21/2008] [Indexed: 05/27/2023]
Abstract
The paper presents a novel, n-gram-based method for the analysis of bacterial genome segments known as genomic islands (GIs). Identification of GIs in bacterial genomes is an important task, since many of them represent inserts that may contribute to bacterial evolution and pathogenesis. To characterize GIs and distinguish them from the rest of the genome, a binary classification of islands based on n-gram frequency distributions has been performed. It consists of testing the agreement of the islands' n-gram frequency distributions with those of the complete genome and of the backbone sequence. In addition, a statistic based on the maximal order Markov model is used to identify significantly overrepresented and underrepresented n-grams in islands. The results may be used as a basis for Zipf-like analysis suggesting that some of the n-grams are overrepresented in a subset of islands and underrepresented in the backbone, or vice versa, thus complementing the binary classification. The method is applied to strain-specific regions in the Escherichia coli O157:H7 EDL933 genome (O-islands), resulting in two groups of O-islands with different n-gram characteristics. It refines a characterization based on other compositional features such as G+C content and codon usage, and may help in the identification of GIs, and also in the research and development of adequate drugs targeting the virulence genes they contain.
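The idea of comparing an island's n-gram frequency distribution with that of the backbone can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not name the agreement test, so the total variation distance used here is an assumption chosen for simplicity, and the function names `ngram_freqs` and `total_variation` and the toy sequences are hypothetical.

```python
from collections import Counter


def ngram_freqs(seq, n):
    """Relative frequencies of overlapping n-grams in a DNA string."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def total_variation(p, q):
    """Half the L1 distance between two frequency tables (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data (assumption): a periodic "backbone" and an AT-rich candidate island.
backbone = "ACGT" * 50
island = "AATTAATTTTAA" * 10
n = 3

# A piece of the backbone should agree with the backbone; the island should not.
d_self = total_variation(ngram_freqs(backbone, n), ngram_freqs(backbone[:100], n))
d_island = total_variation(ngram_freqs(backbone, n), ngram_freqs(island, n))
print(d_self, d_island)
```

In practice a formal goodness-of-fit test and real genome segments would replace the toy distance and sequences; the sketch only shows how a segment whose n-gram profile departs from the backbone stands out.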
42
Mrazek J. Phylogenetic Signals in DNA Composition: Limitations and Prospects. Mol Biol Evol 2009; 26:1163-9. [DOI: 10.1093/molbev/msp032] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 01/11/2023]
43
Mitić NS, Pavlović-Lažetić GM, Beljanski MV. Could n-gram analysis contribute to genomic island determination? J Biomed Inform 2008; 41:936-43. [DOI: 10.1016/j.jbi.2008.03.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/17/2007] [Revised: 03/13/2008] [Accepted: 03/13/2008] [Indexed: 11/28/2022]
44
Osorio H, Martínez V, Nieto PA, Holmes DS, Quatrini R. Microbial iron management mechanisms in extremely acidic environments: comparative genomics evidence for diversity and versatility. BMC Microbiol 2008; 8:203. [PMID: 19025650 PMCID: PMC2631029 DOI: 10.1186/1471-2180-8-203] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 05/28/2008] [Accepted: 11/24/2008] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Iron is an essential nutrient but can be toxic at high intracellular concentrations, and organisms have evolved tightly regulated mechanisms for iron uptake and homeostasis. Information on iron management mechanisms is available for organisms living at circumneutral pH. However, very little is known about how acidophilic bacteria, especially those used for industrial copper bioleaching, cope with environmental iron loads that can be 10^18 times the concentration found in pH neutral environments. This study was motivated by the need to fill this lacuna in knowledge. An understanding of how microorganisms thrive in acidic ecosystems with high iron loads requires a comprehensive investigation of the strategies to acquire iron and to coordinate this acquisition with utilization, storage and oxidation of iron through metal responsive regulation. In silico prediction of iron management genes and Fur regulation was carried out for three Acidithiobacilli, Acidithiobacillus ferrooxidans (iron and sulfur oxidizer), A. thiooxidans and A. caldus (sulfur oxidizers), which can live between pH 1 and pH 5, and for three strict iron oxidizers of the Leptospirillum genus that live at pH 1 or below.
RESULTS: Acidithiobacilli have predicted FeoB-like Fe(II) and Nramp-like Fe(II)-Mn(II) transporters. They also have 14 different TonB dependent ferri-siderophore transporters of diverse siderophore affinity, although they do not produce classical siderophores. Instead they have predicted novel mechanisms for dicitrate synthesis and possibly also for phosphate-chelation mediated iron uptake. It is hypothesized that the unexpectedly large number and diversity of Fe(III)-uptake systems confers versatility to this group of acidophiles, especially in higher pH environments (pH 4-5) where soluble iron may not be abundant. In contrast, Leptospirilla have only a FtrI-Fet3P-like permease and three TonB dependent ferri-dicitrate siderophore systems. This paucity of iron uptake systems could reflect their obligatory occupation of extremely low pH environments where high concentrations of soluble iron may always be available and where oxidized sulfur species might not compromise iron speciation dynamics. The presence of bacterioferritin in the Acidithiobacilli, together with polyphosphate accumulation functions and variants of FieF-like diffusion facilitators in both Acidithiobacilli and Leptospirilla, indicates that they may remove or store iron under conditions of variable availability. In addition, the Fe(II)-oxidizing capacity of both A. ferrooxidans and Leptospirilla could itself be a way to evade iron stress imposed by readily available Fe(II) ions at low pH. Fur regulatory sites have been predicted for a number of gene clusters including iron related and non-iron related functions in both the Acidithiobacilli and Leptospirilla, laying the foundation for the future discovery of iron regulated and iron-phosphate coordinated regulatory control circuits.
CONCLUSION: In silico analyses of the genomes of acidophilic bacteria are beginning to tease apart the mechanisms that mediate iron uptake and homeostasis in low pH environments. Initial models pinpoint significant differences in abundance and diversity of iron management mechanisms between Leptospirilla and Acidithiobacilli, and begin to reveal how these two groups respond to iron cycling and iron fluctuations in naturally acidic environments and in industrial operations. Niche partitions and ecological successions between acidophilic microorganisms may be partially explained by these observed differences. Models derived from these analyses pave the way for improved hypothesis testing and well directed experimental investigation. In addition, aspects of these models should challenge investigators to evaluate alternative iron management strategies in non-acidophilic model organisms.
Affiliation(s)
- Héctor Osorio
  - Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
  - Depto. de Ciencias Biologicas, Facultad de Ciencias de la Salud, Universidad Andres Bello, Santiago, Chile
- Verónica Martínez
  - Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
- Pamela A Nieto
  - Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
- David S Holmes
  - Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
  - Depto. de Ciencias Biologicas, Facultad de Ciencias de la Salud, Universidad Andres Bello, Santiago, Chile
- Raquel Quatrini
  - Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
45
Langille MGI, Hsiao WWL, Brinkman FSL. Evaluation of genomic island predictors using a comparative genomics approach. BMC Bioinformatics 2008; 9:329. [PMID: 18680607 PMCID: PMC2518932 DOI: 10.1186/1471-2105-9-329] [Citation(s) in RCA: 200] [Impact Index Per Article: 12.5] [Received: 03/29/2008] [Accepted: 08/05/2008] [Indexed: 01/08/2023]
Abstract
Background: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches.
Results: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset) and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools.
Conclusion: Our results indicate that AlienHunter has the highest recall, but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable accuracy, the highest overall among the composition-based tools. Our comparative genomics approach, IslandPick, was the most accurate, compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation, using diverse and independent datasets that were not artificially constructed, of the accuracy of several sequence composition-based GI predictors. The caveats associated with this analysis and proposals for optimal island prediction are discussed.
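Scoring a predictor against a positive (island) and a negative (non-island) dataset, as described in this abstract, can be sketched as follows. The per-position counting, the function names `positions` and `evaluate`, and the toy coordinates are assumptions for illustration; the abstract does not specify how overlaps were counted.

```python
def positions(intervals):
    """Expand inclusive (start, end) intervals into a set of genome positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def evaluate(predicted, positive, negative):
    """Return (precision, recall, accuracy), counted per position.

    Positions outside both reference datasets are ignored, since their
    true status is unknown.
    """
    pred, pos, neg = positions(predicted), positions(positive), positions(negative)
    tp = len(pred & pos)   # predicted and in a known island
    fp = len(pred & neg)   # predicted but in a conserved region
    fn = len(pos - pred)   # known island positions that were missed
    tn = len(neg - pred)   # conserved positions correctly left unpredicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Toy coordinates (assumption): one known island, one conserved region.
positive = [(100, 199)]
negative = [(300, 499)]
predicted = [(150, 199), (300, 324)]
precision, recall, accuracy = evaluate(predicted, positive, negative)
print(precision, recall, accuracy)
```

A tool that predicts many regions tends to gain recall at the cost of precision, which is the trade-off the abstract reports between AlienHunter and SIGI-HMM.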
Affiliation(s)
- Morgan G I Langille
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada.
46
Chatterjee R, Chaudhuri K, Chaudhuri P. On detection and assessment of statistical significance of Genomic Islands. BMC Genomics 2008; 9:150. [PMID: 18380895 PMCID: PMC2362129 DOI: 10.1186/1471-2164-9-150] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 10/22/2007] [Accepted: 04/01/2008] [Indexed: 12/05/2022]
Abstract
BACKGROUND: Many of the available methods for detecting Genomic Islands (GIs) in prokaryotic genomes use markers such as transposons, proximal tRNAs, flanking repeats etc., or they use other supervised techniques requiring training datasets. Most of these methods are primarily based on the biases in GC content or codon and amino acid usage of the islands. However, these methods either do not use any formal statistical test of significance or use statistical tests for which the critical values and the P-values are not adequately justified. We propose a method that is unsupervised in nature and uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory, and consequently, the resulting P-values are quite reliable for making the decision.
RESULTS: Our algorithm (named Design-Island, an acronym for Detection of Statistically Significant Genomic Island) runs in two phases. Some 'putative GIs' are identified in the first phase, and those are refined into smaller segments containing horizontally acquired genes in the refinement phase. This method is applied to the Salmonella typhi CT18 genome, leading to the discovery of several new pathogenicity, antibiotic resistance and metabolic islands that were missed by earlier methods. Many of these islands contain mobile genetic elements like phage-mediated genes, transposons, integrase and IS elements, confirming their horizontal acquisition.
CONCLUSION: The proposed method is based on statistical tests supported by precise distribution theory and reliable P-values, along with a technique for visualizing statistically significant islands. Our method outperforms many other well-known methods in sensitivity and accuracy, and is comparable to them in specificity.
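A Monte-Carlo test based on randomly selected chromosome segments can be sketched in a few lines of Python. This is a simplified illustration, not Design-Island itself: the test statistic (absolute deviation of a window's GC content from the genome-wide GC content), the function names, the sample count and the toy genome are all assumptions.

```python
import random


def gc_content(seq):
    """Fraction of G and C bases in an uppercase DNA string."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def monte_carlo_p_value(genome, start, length, n_samples=999, seed=0):
    """Empirical P-value for the candidate window genome[start:start + length].

    The candidate's GC deviation is compared with the deviations of
    n_samples windows of the same length drawn uniformly at random.
    """
    rng = random.Random(seed)
    genome_gc = gc_content(genome)
    observed = abs(gc_content(genome[start:start + length]) - genome_gc)
    extreme = 0
    for _ in range(n_samples):
        s = rng.randrange(len(genome) - length + 1)
        if abs(gc_content(genome[s:s + length]) - genome_gc) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_samples + 1)


# Toy genome (assumption): balanced background with a 500 bp AT-only insert.
genome = "ACGT" * 2500 + "AT" * 250 + "ACGT" * 2500
p_island = monte_carlo_p_value(genome, start=10000, length=500)
p_background = monte_carlo_p_value(genome, start=2000, length=500)
print(p_island, p_background)
```

The inserted segment receives a small P-value because almost no random window deviates as strongly from the genome-wide GC content, whereas a background window is unremarkable.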
Affiliation(s)
- Raghunath Chatterjee
  - Molecular & Human Genetics Division, Indian Institute of Chemical Biology, Jadavpur, Kolkata – 700 032, India
- Keya Chaudhuri
  - Molecular & Human Genetics Division, Indian Institute of Chemical Biology, Jadavpur, Kolkata – 700 032, India
- Probal Chaudhuri
  - Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata – 700 108, India
47
Abstract
Recent years have brought a tremendous increase in the amount of sequence data from various bacterial genome sequencing projects, an increase that is projected to accelerate over the coming years. Comparative genomics of microbial strains has provided us with unprecedented information to describe a bacterial species and examine microbial diversity. This has allowed us to define core genomes based on genes commonly present in all strains of a species or genus and to identify dispensable regions in the genome harboring genus-, species-, and even strain-specific genes. Nevertheless, the task of organizing and summarizing the data to extract the most informative features remains a challenging yet critical endeavor. Visualization is an effective way of structuring and presenting such information in a concise and eloquent fashion. The large-scale views unveil commonalities and differences between the genomes that may shed light on their evolutionary relationships and define characteristics that are typical of pathogenicity or other ecological adaptations. We describe GenomeViz, a tool for comparative visualization of bacterial genomes that allows the user to actively create, modify and query a genome plot in a visually compact, user-friendly, and interactive manner.
48
The genome of Clostridium kluyveri, a strict anaerobe with unique metabolic features. Proc Natl Acad Sci U S A 2008; 105:2128-33. [PMID: 18218779 DOI: 10.1073/pnas.0711093105] [Citation(s) in RCA: 302] [Impact Index Per Article: 18.9] [Indexed: 11/18/2022]
Abstract
Clostridium kluyveri is unique among the clostridia; it grows anaerobically on ethanol and acetate as sole energy sources. Fermentation products are butyrate, caproate, and H2. We report here the genome sequence of C. kluyveri, which revealed new insights into the metabolic capabilities of this well-studied organism. A membrane-bound energy-converting NADH:ferredoxin oxidoreductase (RnfCDGEAB) and a cytoplasmic butyryl-CoA dehydrogenase complex (Bcd/EtfAB) coupling the reduction of crotonyl-CoA to butyryl-CoA with the reduction of ferredoxin represent a new energy-conserving module in anaerobes. The genes for NAD-dependent ethanol dehydrogenase and NAD(P)-dependent acetaldehyde dehydrogenase are located next to genes for microcompartment proteins, suggesting that the two enzymes, which are isolated together in a macromolecular complex, form a carboxysome-like structure. Uniquely for a strict anaerobe, C. kluyveri harbors three sets of genes predicted to encode polyketide/nonribosomal peptide synthetase hybrids and one set for a nonribosomal peptide synthetase. The latter is predicted to catalyze the synthesis of a new siderophore, which is formed under iron-deficient growth conditions.
49
Guy L. Identification and characterization of pathogenicity and other genomic islands using base composition analyses. Future Microbiol 2007; 1:309-16. [PMID: 17661643 DOI: 10.2217/17460913.1.3.309] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Pathogenicity islands (PAIs) are major factors contributing to the pathogenicity of bacteria and to their resistance to antibiotics. In general, genomic islands (GIs), of which PAIs are a subset, increase the fitness of their hosts by providing new functions. With the number of available whole genome sequences growing exponentially, in silico methods have been developed to detect putative PAIs and GIs within them. Compositional methods rely on G+C content differences, codon usage and oligonucleotide biases. Other methods detect the presence of functional elements such as tRNA and mobility genes. The future availability of fast, high-throughput, inexpensive genome sequencing emphasizes the need for user-friendly applications able to detect, characterize and analyze putative GIs and PAIs. Such applications may uncover new aspects of pathogenicity and provide a better understanding of the evolution of pathogenic bacteria. These methods will be in high demand once whole genome sequencing technologies are used by physicians for personal diagnosis.
Affiliation(s)
- Lionel Guy
  - Département de Microbiologie Fondamentale, Faculté de Biologie et Médecine, Université de Lausanne, Switzerland.
50
Merkl R. Modelling the evolution of the archeal tryptophan synthase. BMC Evol Biol 2007; 7:59. [PMID: 17425797 PMCID: PMC1854888 DOI: 10.1186/1471-2148-7-59] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 02/13/2007] [Accepted: 04/10/2007] [Indexed: 11/16/2022]
Abstract
Background: Microorganisms and plants are able to produce tryptophan. Enzymes catalysing the last seven steps of tryptophan biosynthesis are encoded in the canonical trp operon. The trp genes found most frequently are trpA and trpB, which code for the alpha and beta subunits of tryptophan synthase. In several prokaryotic genomes, two variants of trpB (named trpB1 and trpB2) occur in different combinations. The evolutionary history of these trpB genes is under debate.
Results: In order to study the evolution of trp genes, completely sequenced archeal and bacterial genomes containing trpB were analysed. Phylogenetic trees indicated that TrpB sequences constitute four distinct groups; their composition is in agreement with the location of respective genes. The first group consisted exclusively of trpB1 genes, most of which belonged to trp operons. Groups two to four contained trpB2 genes. The largest group (trpB2_o) contained trpB2 genes all located outside of operons. Most of these genes originated from species that also possess an operon-based trpB1. Groups three and four pertain to trpB2 genes of those genomes containing exclusively one or two trpB2 genes, but no trpB1. One group (trpB2_i) consisted of trpB2 genes located inside, the other (trpB2_a) of trpB2 genes located outside the trp operon. TrpA and TrpB form a heterodimer and cooperate biochemically. In order to characterise trpB variants and stages of TrpA/TrpB cooperation in silico, several approaches were combined. Phylogenetic trees were constructed for all trp genes; their structure was assessed via bootstrapping. Alternative models of trpB evolution were evaluated with parsimony arguments. The four groups of trpB variants were correlated with archeal speciation. Several stages of TrpA/TrpB cooperation were identified and trpB variants were characterised. Most plausibly, trpB2 represents the predecessor of the modern trpB gene, and trpB1 evolved in an ancestral bacterium.
Conclusion: In archeal genomes, several stages of trpB evolution, TrpA/TrpB cooperation, and operon formation can be observed. Thus, archeal trp genes may serve as a model system for studying the evolution of protein-protein interactions and operon formation.
Affiliation(s)
- Rainer Merkl
  - Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, Regensburg, Germany.