1. Guo X, Guo Y, Chen H, Liu X, He P, Li W, Zhang MQ, Dai Q. Systematic comparison of genome information processing and boundary recognition tools used for genomic island detection. Comput Biol Med 2023; 166:107550. [PMID: 37826950 DOI: 10.1016/j.compbiomed.2023.107550] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2023] [Revised: 09/12/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023]
Abstract
Genomic islands are fragments of foreign DNA that are found in bacterial and archaeal genomes, and are typically associated with symbiosis or pathogenesis. While numerous genomic island detection methods have been proposed, there has been limited evaluation of the efficiency of the genome information processing and boundary recognition tools. In this study, we reviewed the statistical methods used in the genomic signature, host signature extraction, informative signature selection, divergence measure, and boundary detection steps of genomic island prediction. We compared the performance of these methods in simulated experiments using alien fragments obtained from both artificial and real genomes. Our results indicate that among the nine genomic signatures evaluated, genomic signature frequency and full probability performed the best. However, their performance declined when they were normalized to their expectations and variances, as in the Z-score and composition vector. Based on our experiments on the E. coli genome, we found that the confidence intervals of the window variances achieved the best performance in the signature extraction of the host, with the best confidence interval being 1.5-2 times the standard error. Ordered kurtosis was most effective in selecting informative signatures from a single genome, without requiring prior knowledge from other datasets. Among the three divergence measures evaluated, the two-sample t-test was the most successful, and a non-overlapping window with a small eye window (size 2) was best suited for identifying compositionally distinct regions. Finally, the maximum of the Markovian Jensen-Shannon divergence score, in terms of GC-content bias, was found to make boundary detection faster while maintaining a similar error rate.
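As a rough illustration of the window-based workflow this abstract describes, the following Python sketch computes k-mer frequency signatures over non-overlapping windows and scores each window against the whole-genome signature. It uses the plain Jensen-Shannon divergence, not the Markovian variant or the two-sample t-test evaluated in the paper. The function names, the window size, and the toy sequence are assumptions made for this example only.

```python
from collections import Counter
from math import log2


def kmer_freqs(seq, k=2):
    """Relative frequency of each overlapping k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Plain Jensen-Shannon divergence (base 2) between two frequency dicts."""
    score = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            score += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            score += 0.5 * qi * log2(qi / mi)
    return score


def window_scores(genome, window=1000, k=2):
    """Score non-overlapping windows against the whole-genome signature."""
    host = kmer_freqs(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, window):
        win = kmer_freqs(genome[start:start + window], k)
        scores.append((start, js_divergence(win, host)))
    return scores


# Toy example (hypothetical data): an AT-rich host with a GC-rich
# insert that fills the third window.
toy = "AT" * 1000 + "GC" * 500 + "AT" * 1000
scores = window_scores(toy, window=1000, k=2)
top_start = max(scores, key=lambda item: item[1])[0]
```

In this toy sequence the highest-scoring window is the one holding the GC-rich insert. Real genomes would need a threshold or statistical test on the scores, which this sketch leaves out.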
Affiliation(s)
- Xiangting Guo
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Yichu Guo
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Hu Chen
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Xiaoqing Liu
  - College of Sciences, Hangzhou Dianzi University, Hangzhou, 310018, China
- Pingan He
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Wenshu Li
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China
- Michael Q Zhang
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA; Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China
- Qi Dai
  - Zhejiang Sci-Tech University, Hangzhou, 310018, China; Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA
2. de la Fuente R, Díaz-Villanueva W, Arnau V, Moya A. Genomic Signature in Evolutionary Biology: A Review. Biology (Basel) 2023; 12:biology12020322. [PMID: 36829597 PMCID: PMC9953303 DOI: 10.3390/biology12020322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023]
Abstract
Organisms are unique physical entities in which information is stored and continuously processed. The digital nature of DNA sequences enables the construction of a dynamic information reservoir. However, the distinction between the hardware and software components in the information flow is crucial to identify the mechanisms generating specific genomic signatures. In this work, we perform a bibliometric analysis to identify the different purposes of looking for particular patterns in DNA sequences associated with a given phenotype. This study has enabled us to make a conceptual breakdown of the genomic signature and differentiate the leading applications. On the one hand, it refers to gene expression profiling associated with a biological function, which may be shared across taxa. This signature is the focus of study in precision medicine. On the other hand, it also refers to characteristic patterns in species-specific DNA sequences. This interpretation plays a key role in comparative genomics, identifying evolutionary relationships. Looking at the relevant studies in our bibliographic database, we highlight the main factors causing heterogeneities in genome composition and how they can be quantified. All these findings lead us to reformulate some questions relevant to evolutionary biology.
Affiliation(s)
- Rebeca de la Fuente
  - Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
  - Correspondence:
- Wladimiro Díaz-Villanueva
  - Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Vicente Arnau
  - Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
- Andrés Moya
  - Institute of Integrative Systems Biology (I2Sysbio), University of Valencia and Spanish Research Council (CSIC), 46980 Valencia, Spain
  - Foundation for the Promotion of Sanitary and Biomedical Research of the Valencian Community (FISABIO), 46020 Valencia, Spain
  - CIBER in Epidemiology and Public Health (CIBEResp), 28029 Madrid, Spain
3. Chakraborty J, Roy RP, Chatterjee R, Chaudhuri P. Performance assessment of genomic island prediction tools with an improved version of Design-Island. Comput Biol Chem 2022; 98:107698. [PMID: 35597186 DOI: 10.1016/j.compbiolchem.2022.107698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/09/2021] [Revised: 04/01/2022] [Accepted: 05/11/2022] [Indexed: 11/03/2022]
Abstract
Genomic Islands (GIs) play an important role in the evolution and adaptation of prokaryotes. The origin and extent of ecological diversity of prokaryotes can be analyzed by comparing GIs across closely or distantly related prokaryotes. To understand the importance of GIs and to study bacterial evolution, several GI prediction tools have been developed. An unsupervised method, Design-Island, was developed to identify GIs using a Monte-Carlo statistical test on randomly selected segments of a chromosome. In the present study, Design-Island was modified by incorporating majority voting and multiple hypothesis testing correction. The performance of the modified version, Design-Island-II, was tested and compared with the existing GI prediction tools. Performance assessment and benchmarking of GI prediction tools require an experimentally validated dataset, which is lacking. Therefore, different datasets, either generated or taken from the literature, were used to compare the sensitivity (SN), specificity (SP), precision (PPV) and accuracy (AC) of Design-Island-II. It showed substantial enhancement in terms of SN, SP, PPV and AC, and significantly reduced the computation time of the algorithm. The performance of Design-Island-II was also compared with several GI prediction tools using a curated dataset of putative horizontally transferred genes. Design-Island-II showed the highest sensitivity and F1 score, and comparable specificity, precision and accuracy, in comparison to the other available methods. IslandViewer4 and Islander outperformed all the available methods in terms of AC and PPV, respectively. Our study suggested that Design-Island-II, IslandViewer4 and GIHunter are among the top-performing GI prediction tools when both the sensitivity and specificity of the methods are considered.
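For readers unfamiliar with the metrics named in this abstract, the short Python sketch below computes SN, SP, PPV, AC and the F1 score from confusion-matrix counts using their standard definitions. The function name and the example counts are hypothetical and are not taken from the study.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard benchmark metrics from true/false positive/negative counts."""
    sn = _ratio(tp, tp + fn)                   # sensitivity (recall)
    sp = _ratio(tn, tn + fp)                   # specificity
    ppv = _ratio(tp, tp + fp)                  # precision
    ac = _ratio(tp + tn, tp + fp + tn + fn)    # accuracy
    f1 = _ratio(2 * ppv * sn, ppv + sn)        # harmonic mean of PPV and SN
    return {"SN": sn, "SP": sp, "PPV": ppv, "AC": ac, "F1": f1}


# Hypothetical counts, e.g. genes classified as inside or outside a GI.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
```

Because island genes are usually a small minority of a genome, accuracy alone can look high even for a weak predictor, which is one reason benchmarks of this kind report sensitivity, precision and F1 alongside it.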
Affiliation(s)
- Joyeeta Chakraborty
  - Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India
- Rudra Prasad Roy
  - Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India
- Raghunath Chatterjee
  - Human Genetics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India
- Probal Chaudhuri
  - Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, 203 B T Road, Kolkata 700 108, India
4. Genomic Island Prediction via Chi-Square Test and Random Forest Algorithm. Comput Math Methods Med 2021; 2021:9969751. [PMID: 34122622 PMCID: PMC8169257 DOI: 10.1155/2021/9969751] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 03/29/2021] [Accepted: 05/14/2021] [Indexed: 12/02/2022]
Abstract
Genomic islands are related to microbial adaptation and carry genomic characteristics different from those of the host. Therefore, many methods have been proposed to distinguish genomic islands from the rest of the genome by evaluating sequence composition. Many sequence features have been proposed, but many of them have not been applied to the identification of genomic islands. In this paper, we present a scheme to predict genomic islands using the chi-square test and the random forest algorithm. We extract seven kinds of sequence features and select the important features with the chi-square test. All the selected features are then input into the random forest to predict the genomic islands. Three experiments and comparisons show that the proposed method achieves the best performance. This understanding can be useful for designing more powerful methods for genomic island prediction.
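The feature selection step mentioned in this abstract can be pictured with a minimal Python sketch that scores binary features against island/host labels using the Pearson chi-square statistic for a 2x2 table and keeps the top-scoring ones. This is only an illustration under simplifying assumptions: the features are binarized, the feature names and data are hypothetical, and the random forest stage is omitted. It is not the authors' implementation.

```python
def chi_square_2x2(feature, labels):
    """Pearson chi-square statistic for a binary feature vs. binary labels."""
    pairs = list(zip(feature, labels))
    a = sum(1 for f, y in pairs if f and y)          # feature on, island
    b = sum(1 for f, y in pairs if f and not y)      # feature on, host
    c = sum(1 for f, y in pairs if not f and y)      # feature off, island
    d = sum(1 for f, y in pairs if not f and not y)  # feature off, host
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(features, labels, top_n):
    """Return the names of the top_n features ranked by chi-square score."""
    scores = {name: chi_square_2x2(vals, labels) for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical toy data: 8 genomic windows, label 1 = island, 0 = host.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "gc_high":   [1, 1, 1, 1, 0, 0, 0, 0],  # tracks the label perfectly
    "kmer_bias": [1, 1, 0, 0, 1, 1, 0, 0],  # independent of the label
}
selected = select_features(features, labels, top_n=1)
```

The selected feature columns would then be passed to a classifier of choice; the abstract names a random forest for that stage.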
5. Bertelli C, Tilley KE, Brinkman FSL. Microbial genomic island discovery, visualization and analysis. Brief Bioinform 2020; 20:1685-1698. [PMID: 29868902 PMCID: PMC6917214 DOI: 10.1093/bib/bby042] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 02/23/2018] [Revised: 04/30/2018] [Indexed: 12/27/2022]
Abstract
Horizontal gene transfer (also called lateral gene transfer) is a major mechanism for microbial genome evolution, enabling rapid adaptation and survival in specific niches. Genomic islands (GIs), commonly defined as clusters of bacterial or archaeal genes of probable horizontal origin, are of particular medical, environmental and/or industrial interest, as they disproportionately encode virulence factors and some antimicrobial resistance genes and may harbor entire metabolic pathways that confer a specific adaptation (solvent resistance, symbiosis properties, etc.). As large-scale analyses of microbial genomes increase, such as for genomic epidemiology investigations of infectious disease outbreaks in public health, there is increased appreciation of the need to accurately predict and track GIs. Over the past decade, numerous computational tools have been developed to tackle the challenges inherent in accurate GI prediction. We review here the main types of GI prediction methods and discuss their advantages and limitations for routine analysis of microbial genomes in this era of rapid whole-genome sequencing. An assessment is provided of 20 GI prediction software methods that use sequence-composition bias to identify GIs, using a reference GI data set from 104 genomes obtained using an independent comparative genomics approach. Finally, we present guidelines to assist researchers in effectively identifying these key genomic regions.
Affiliation(s)
- Claire Bertelli
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Keith E Tilley
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Fiona S L Brinkman
  - Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
6. Mageeney CM, Lau BY, Wagner JM, Hudson CM, Schoeniger JS, Krishnakumar R, Williams KP. New candidates for regulated gene integrity revealed through precise mapping of integrative genetic elements. Nucleic Acids Res 2020; 48:4052-4065. [PMID: 32182341 PMCID: PMC7192596 DOI: 10.1093/nar/gkaa156] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/27/2020] [Revised: 02/26/2020] [Accepted: 02/28/2020] [Indexed: 12/12/2022]
Abstract
Integrative genetic elements (IGEs) are mobile multigene DNA units that integrate into and excise from host bacterial genomes. Each IGE usually targets a specific site within a conserved host gene, integrating in a manner that preserves target gene function. However, a small number of bacterial genes are known to be inactivated upon IGE integration and reactivated upon excision, regulating phenotypes of virulence, mutation rate, and terminal differentiation in multicellular bacteria. The list of regulated gene integrity (RGI) cases has been slow-growing because IGEs have been challenging to precisely and comprehensively locate in genomes. We present software (TIGER) that maps IGEs with unprecedented precision and without attB site bias. TIGER uses a comparative genomic, ping-pong BLAST approach, based on the principle that the IGE integration module (i.e. its int-attP region) is cohesive. The resultant IGEs from 2168 genomes, along with integrase phylogenetic analysis and gene inactivation tests, revealed 19 new cases of genes whose integrity is regulated by IGEs (including dut, eccCa1, gntT, hrpB, merA, ompN, prkA, tqsA, traG, yifB, yfaT and ynfE), as well as recovering previously known cases (in sigK, spsM, comK, mlrA and hlb genes). It also recovered known clades of site-promiscuous integrases and identified possible new ones.
Affiliation(s)
- Catherine M Mageeney
  - Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Britney Y Lau
  - Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Julian M Wagner
  - Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Corey M Hudson
  - Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Joseph S Schoeniger
  - Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Raga Krishnakumar
  - Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
- Kelly P Williams
  - Sandia National Laboratories, Systems Biology Department, Livermore, CA 94551-0969, USA
7. 2SigFinder: the combined use of small-scale and large-scale statistical testing for genomic island detection from a single genome. BMC Bioinformatics 2020; 21:159. [PMID: 32349677 PMCID: PMC7191778 DOI: 10.1186/s12859-020-3501-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 01/11/2020] [Accepted: 04/16/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Genomic islands are associated with microbial adaptations and carry genomic signatures different from those of the host. Some methods perform an overall test to identify genomic islands based on their local features. However, regions of different scales display different genomic features. RESULTS: We propose a novel method, "2SigFinder", the first combined use of small-scale and large-scale statistical testing for genomic island detection. The proposed method was tested on genomic island boundary detection and on the identification of genomic islands or functional features in real biological data. We also compared the proposed method with comparative genomics and composition-based approaches. The results indicate that 2SigFinder is more efficient in identifying genomic islands. CONCLUSIONS: On real biological data, 2SigFinder identified genomic islands from a single genome and reported robust results across different experiments, without annotated information of genomes or prior knowledge from other datasets. In S. enterica Typhi CT18, 2SigFinder identified 25 of 27 pathogenicity, 1 of 1 tRNA, 2 of 2 virulence and 2 of 2 repeat features, and detected 101 of 130 phage and 28 of 36 HEG features, which shows that it is more efficient in detecting functional features associated with GIs.
8. Ribeiro CL, Conde D, Balmant KM, Dervinis C, Johnson MG, McGrath AP, Szewczyk P, Unda F, Finegan CA, Schmidt HW, Miles B, Drost DR, Novaes E, Gonzalez-Benecke CA, Peter GF, Burleigh JG, Martin TA, Mansfield SD, Chang G, Wickett NJ, Kirst M. The uncharacterized gene EVE contributes to vessel element dimensions in Populus. Proc Natl Acad Sci U S A 2020; 117:5059-5066. [PMID: 32041869 PMCID: PMC7060721 DOI: 10.1073/pnas.1912434117] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/25/2022]
Abstract
The radiation of angiosperms led to the emergence of the vast majority of today's plant species and all our major food crops. Their extraordinary diversification occurred in conjunction with the evolution of a more efficient vascular system for the transport of water, composed of vessel elements. The physical dimensions of these water-conducting specialized cells have played a critical role in angiosperm evolution; they determine resistance to water flow, influence photosynthesis rate, and contribute to plant stature. However, the genetic factors that determine their dimensions are unclear. Here we show that a previously uncharacterized gene, ENLARGED VESSEL ELEMENT (EVE), contributes to the dimensions of vessel elements in Populus, impacting hydraulic conductivity. Our data suggest that EVE is localized in the plasma membrane and is involved in potassium uptake of differentiating xylem cells during vessel development. In plants, EVE first emerged in streptophyte algae, but expanded dramatically among vessel-containing angiosperms. The phylogeny, structure and composition of EVE indicate that it may have been involved in an ancient horizontal gene-transfer event.
Affiliation(s)
- Cíntia L Ribeiro
  - Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, FL 32611
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
- Daniel Conde
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
- Kelly M Balmant
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
- Christopher Dervinis
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
- Aaron P McGrath
  - Skaggs School of Pharmacy and Pharmaceutical Science, University of California San Diego, La Jolla, CA 92093
- Paul Szewczyk
  - Skaggs School of Pharmacy and Pharmaceutical Science, University of California San Diego, La Jolla, CA 92093
- Faride Unda
  - Department of Wood Science, Faculty of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Christina A Finegan
  - Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, FL 32611
- Henry W Schmidt
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
- Brianna Miles
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
- Derek R Drost
  - Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, FL 32611
- Evandro Novaes
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
- Gary F Peter
  - Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, FL 32611
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
  - Genetics Institute, University of Florida, Gainesville, FL 32611
- J Gordon Burleigh
  - Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, FL 32611
  - Department of Biology, University of Florida, Gainesville, FL 32611
- Timothy A Martin
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
- Shawn D Mansfield
  - Department of Wood Science, Faculty of Forestry, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
- Geoffrey Chang
  - Skaggs School of Pharmacy and Pharmaceutical Science, University of California San Diego, La Jolla, CA 92093
  - Department of Pharmacology, School of Medicine, University of California San Diego, La Jolla, CA 92093
- Norman J Wickett
  - Plant Science and Conservation, Chicago Botanic Garden, Glencoe, IL 60622
  - Plant Biology and Conservation, Northwestern University, Evanston, IL 60208
- Matias Kirst
  - Plant Molecular and Cellular Biology Graduate Program, University of Florida, Gainesville, FL 32611
  - School of Forest Resources and Conservation, University of Florida, Gainesville, FL 32611
  - Genetics Institute, University of Florida, Gainesville, FL 32611
9. Seiler E, Trappe K, Renard BY. Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation. PLoS Comput Biol 2019; 15:e1007208. [PMID: 31335917 PMCID: PMC6677323 DOI: 10.1371/journal.pcbi.1007208] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/11/2019] [Revised: 08/02/2019] [Accepted: 06/24/2019] [Indexed: 12/22/2022]
Abstract
Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, bacteria in particular are able to take a shortcut via HGT that enables them to pass on genes from one individual to another, even across species boundaries. The tool Daisy offers the first HGT detection approach based on read mapping, providing evidence complementary to that of existing methods. However, Daisy relies on the acceptor and donor organisms involved in the HGT being known. We introduce DaisyGPS, a mapping-based pipeline that is able to identify acceptor and donor reference candidates of an HGT event based on sequencing reads. Acceptor and donor identification is akin to species identification in metagenomic samples based on sequencing reads, a problem addressed by metagenomic profiling tools. However, acceptor and donor references have certain properties such that these methods cannot be directly applied. DaisyGPS uses MicrobeGPS, a metagenomic profiling tool tailored towards estimating the genomic distance between organisms in the sample and the reference database. We enhance the underlying scoring system of MicrobeGPS to account for the sequence patterns, in terms of mapping coverage, of an acceptor and donor involved in an HGT event, and report a ranked list of reference candidates. These candidates can then be further evaluated by tools like Daisy to establish HGT regions. We successfully validated our approach on both simulated and real data, and show its benefits in an investigation of an outbreak involving methicillin-resistant Staphylococcus aureus.
Affiliation(s)
- Enrico Seiler
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
  - Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, and Algorithmic Bioinformatics, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
- Kathrin Trappe
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Bernhard Y. Renard
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
10. Dai Q, Bao C, Hai Y, Ma S, Zhou T, Wang C, Wang Y, Huo W, Liu X, Yao Y, Xuan Z, Chen M, Zhang MQ. MTGIpick allows robust identification of genomic islands from a single genome. Brief Bioinform 2019; 19:361-373. [PMID: 28025178 DOI: 10.1093/bib/bbw118] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022]
Abstract
Genomic islands (GIs) that are associated with microbial adaptations and carry sequence patterns different from those of the host are sporadically distributed among closely related species. This bias can dominate the signal of interest in GI detection. However, variations still exist among the segments of the host, and no uniform standard exists regarding the best methods of discriminating GIs from the rest of the genome in terms of compositional bias. In the present work, we propose a robust software tool, MTGIpick, which uses regions with pattern bias showing multiscale difference levels to identify GIs from the host. MTGIpick can identify GIs from a single genome without annotated information of genomes or prior knowledge from other data sets. When real biological data were used, MTGIpick demonstrated better performance than existing methods and revealed potential GIs with accurate sizes that existing methods missed because of a uniform standard. Software and supplementary material are freely available at http://bioinfo.zstu.edu.cn/MTGI or https://github.com/bioinfo0706/MTGIpick.
Affiliation(s)
- Qi Dai
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China; Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Chaohui Bao
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yabing Hai
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Sheng Ma
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Tao Zhou
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Cong Wang
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yunfei Wang
  - Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Wenwen Huo
  - Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Xiaoqing Liu
  - College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, China
- Yuhua Yao
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Zhenyu Xuan
  - Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Min Chen
  - Department of Mathematical Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Michael Q Zhang
  - Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA; Division of Bioinformatics, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing, 100084, China
11. Tao J, Liu X, Yang S, Bao C, He P, Dai Q. An efficient genomic signature ranking method for genomic island prediction from a single genome. J Theor Biol 2019; 467:142-149. [PMID: 30768974 DOI: 10.1016/j.jtbi.2019.02.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/18/2018] [Revised: 02/07/2019] [Accepted: 02/11/2019] [Indexed: 01/13/2023]
Abstract
Genomic islands are associated with microbial adaptations and carry genomic signatures different from those of the host. Many methods have therefore been proposed to select informative genomic signatures from a range of organisms and to discriminate genomic islands from the rest of the genome in terms of these signature biases. However, these methods are of limited use when closely related genomes are unavailable. In the present work, we propose a kurtosis-based ranking method to select informative genomic signatures from a single genome. In simulations with alien fragments from artificial and real genomes, the proposed kurtosis-based ranking method efficiently selected the informative genomic signatures from a single genome, without annotated information of genomes or prior knowledge from other datasets. This understanding can be useful for designing more powerful methods for genomic island detection.
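To make the idea of kurtosis-based ranking concrete, the Python sketch below computes, for every k-mer, the excess kurtosis of its frequency across non-overlapping windows of one genome and sorts the k-mers by that value. The exact statistic and ordering used in the paper are not given in the abstract, so several points here are assumptions: the population (biased) kurtosis estimator, the descending order (heavy tails caused by a few atypical windows rank first), the window size, and the toy sequence.

```python
from collections import Counter


def window_kmer_freqs(genome, window=500, k=2):
    """Return one k-mer relative-frequency dict per non-overlapping window."""
    profiles = []
    for start in range(0, len(genome) - window + 1, window):
        seq = genome[start:start + window]
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        total = sum(counts.values())
        profiles.append({kmer: n / total for kmer, n in counts.items()})
    return profiles


def excess_kurtosis(values):
    """Population excess kurtosis; returns 0.0 when the variance is zero."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2) - 3.0


def rank_kmers_by_kurtosis(genome, window=500, k=2):
    """Rank k-mers by the kurtosis of their per-window frequencies (assumed order)."""
    profiles = window_kmer_freqs(genome, window, k)
    kmers = sorted({kmer for profile in profiles for kmer in profile})
    scored = [
        (kmer, excess_kurtosis([p.get(kmer, 0.0) for p in profiles]))
        for kmer in kmers
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy genome: a periodic host with one all-A window inserted.
toy = "ACGT" * 1000 + "AAAA" * 125 + "ACGT" * 1000
ranking = rank_kmers_by_kurtosis(toy, window=500, k=2)
```

The appeal of a single-genome statistic like this is that it needs no reference genomes or annotation: a k-mer whose frequency is stable in most windows but extreme in a few produces a heavy-tailed distribution, which kurtosis picks up.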
Affiliation(s)
- Jin Tao
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Xiaoqing Liu
  - College of Sciences, Hangzhou Dianzi University, Hangzhou 310018, People's Republic of China
- Siqian Yang
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Chaohui Bao
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Pingan He
  - College of Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China
- Qi Dai
  - College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, People's Republic of China; Department of Molecular and Cell Biology, University of Texas at Dallas, Richardson, TX 75080, USA
12. da Silva Filho AC, Raittz RT, Guizelini D, De Pierri CR, Augusto DW, Dos Santos-Weiss ICR, Marchaukoski JN. Comparative Analysis of Genomic Island Prediction Tools. Front Genet 2018; 9:619. [PMID: 30631340 PMCID: PMC6315130 DOI: 10.3389/fgene.2018.00619] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/15/2018] [Accepted: 11/23/2018] [Indexed: 12/11/2022]
Abstract
Tools for genomic island prediction use strategies for genomic comparison analysis and sequence composition analysis. The goal of comparative analysis is to identify unique regions in the genomes of related organisms, whereas sequence composition analysis evaluates and relates the composition of specific regions with other regions in the genome. The goal of this study was to qualitatively and quantitatively evaluate extant genomic island predictors. We chose tools reported to produce significant results using sequence composition prediction, comparative genomics, and hybrid genomics methods. To maintain diversity, the tools were applied to eight complete genomes of organisms with distinct characteristics and belonging to different families. Escherichia coli CFT073 was used as a control and considered the gold standard because its islands were previously curated in vitro. The results of predictions with the gold standard were manually curated, and the content and characteristics of each predicted island were analyzed. For the other organisms, we created GenBank (GBK) files using Artemis software for each predicted island. We copied only the amino acid sequences from the coding sequences and constructed a multi-FASTA file for each predictor. We used BLASTp to compare all results and generate hits to evaluate similarities and differences among the predictions. Comparison of the results with the gold standard revealed that GIPSy produced the best results, covering ~91% of the composition and regions of the islands, followed by Alien Hunter (81%), IslandViewer (47.8%), Predict Bias (31%), GI Hunter (17%), and Zisland Explorer (16%). The tools with the best results in the analyses of the set of organisms were the same ones that performed best in the tests with the gold standard.
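One simple way to picture a "percent of gold-standard islands covered" figure is as base-pair overlap between predicted and reference intervals. The Python sketch below does that for half-open coordinate pairs. It is a simplification and an assumption on my part: the study itself compared predictions through manual curation and BLASTp hits on protein sequences, not through this calculation, and the coordinates used here are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_fraction(reference, predicted):
    """Fraction of reference base pairs that fall inside any predicted interval."""
    ref = merge_intervals(reference)
    pred = merge_intervals(predicted)
    total = sum(end - start for start, end in ref)
    overlap = 0
    for ref_start, ref_end in ref:
        for pred_start, pred_end in pred:
            overlap += max(0, min(ref_end, pred_end) - max(ref_start, pred_start))
    return overlap / total if total else 0.0


# Hypothetical coordinates: two reference islands and three predictions.
reference_islands = [(100, 200), (500, 700)]
predicted_islands = [(150, 250), (480, 600), (590, 650)]
coverage = covered_fraction(reference_islands, predicted_islands)
```

Merging the predictions first prevents overlapping calls from being counted twice. A complete comparison would also report how much predicted sequence lies outside the reference islands.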
Affiliation(s)
- Antonio Camilo da Silva Filho
  - Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
- Roberto Tadeu Raittz
  - Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
- Dieval Guizelini
  - Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
- Diônata Willian Augusto
  - Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
- Jeroniza Nunes Marchaukoski
  - Department of Bioinformatics, Professional and Technical Education Sector, Federal University of Parana, Curitiba, Brazil
|
13
|
Clasen FJ, Pierneef RE, Slippers B, Reva O. EuGI: a novel resource for studying genomic islands to facilitate horizontal gene transfer detection in eukaryotes. BMC Genomics 2018; 19:323. [PMID: 29724163 PMCID: PMC5934851 DOI: 10.1186/s12864-018-4724-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/23/2017] [Accepted: 04/25/2018] [Indexed: 11/17/2022] Open
Abstract
Background: Genomic islands (GIs) are inserts of foreign DNA that have potentially arisen through horizontal gene transfer (HGT). There is evidence that GIs can contribute significantly to the evolution of prokaryotes. The acquisition of GIs through HGT in eukaryotes has, however, been largely unexplored. In this study, the previously developed GI prediction tool, SeqWord Gene Island Sniffer (SWGIS), is modified to predict GIs in eukaryotic chromosomes. Artificial simulations are used to estimate the rates of false positive and false negative GI predictions by inserting GIs into different test chromosomes and running the SWGIS v2.0 algorithm. Using SWGIS v2.0, GIs are then identified in 36 fungal, 22 protozoan and 8 invertebrate genomes. Results: SWGIS v2.0 predicts GIs in large eukaryotic chromosomes based on the atypical nucleotide composition of these regions. The average false negative and false positive rates were 20.1% and 11.01%, respectively. A total of 10,550 GIs were identified in 66 eukaryotic species, with 5299 of these GIs coding for at least one functional protein. The EuGI web-resource, freely accessible at http://eugi.bi.up.ac.za, was developed to allow browsing of the database created from identified GIs and genes within GIs through an interactive and visual interface. Conclusions: SWGIS v2.0, along with the EuGI database, which houses GIs identified in 66 different eukaryotic species, and the EuGI web-resource, provides the first comprehensive resource for studying HGT in eukaryotes.
Affiliation(s)
- Frederick Johannes Clasen
  - Centre for Bioinformatics and Computational Biology; Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0002, Private Bag X20, Hatfield, 0028, South Africa
  - Forestry and Agricultural Biotechnology Institute; Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0002, South Africa
- Rian Ewald Pierneef
  - Centre for Bioinformatics and Computational Biology; Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0002, Private Bag X20, Hatfield, 0028, South Africa
- Bernard Slippers
  - Forestry and Agricultural Biotechnology Institute; Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, 0002, South Africa
- Oleg Reva
  - Centre for Bioinformatics and Computational Biology; Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0002, Private Bag X20, Hatfield, 0028, South Africa
|
14
|
Dupont PY, Cox MP. Genomic Data Quality Impacts Automated Detection of Lateral Gene Transfer in Fungi. G3 (BETHESDA, MD.) 2017; 7:1301-1314. [PMID: 28235827 PMCID: PMC5386878 DOI: 10.1534/g3.116.038448] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 12/13/2016] [Accepted: 02/17/2017] [Indexed: 12/26/2022]
Abstract
Lateral gene transfer (LGT, also known as horizontal gene transfer), an atypical mechanism of transferring genes between species, has almost become the default explanation for genes that display an unexpected composition or phylogeny. Numerous methods of detecting LGT events all rely on two fundamental strategies: primary structure composition or gene tree/species tree comparisons. Discouragingly, the results of these different approaches rarely coincide. With the wealth of genome data now available, detection of laterally transferred genes is increasingly being attempted in large uncurated eukaryotic datasets. However, detection methods depend greatly on the quality of the underlying genomic data, which are typically complex for eukaryotes. Furthermore, given the automated nature of genomic data collection, it is typically impractical to manually verify all protein or gene models, orthology predictions, and multiple sequence alignments, requiring researchers to accept a substantial margin of error in their datasets. Using a test case comprising plant-associated genomes across the fungal kingdom, this study reveals that composition- and phylogeny-based methods have little statistical power to detect laterally transferred genes. In particular, phylogenetic methods reveal extreme levels of topological variation in fungal gene trees, the vast majority of which show departures from the canonical species tree. Therefore, it is inherently challenging to detect LGT events in typical eukaryotic genomes. This finding is in striking contrast to the large number of claims for laterally transferred genes in eukaryotic species that routinely appear in the literature, and questions how many of these proposed examples are statistically well supported.
Affiliation(s)
- Pierre-Yves Dupont
  - Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, Palmerston North 4442, New Zealand
  - the Bio-Protection Research Centre, Massey University, Palmerston North 4442, New Zealand
- Murray P Cox
  - Statistics and Bioinformatics Group, Institute of Fundamental Sciences, Massey University, Palmerston North 4442, New Zealand
  - the Bio-Protection Research Centre, Massey University, Palmerston North 4442, New Zealand
|
15
|
Trappe K, Marschall T, Renard BY. Detecting horizontal gene transfer by mapping sequencing reads across species boundaries. Bioinformatics 2016; 32:i595-i604. [DOI: 10.1093/bioinformatics/btw423] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023] Open
|