1
Yu JF, Chen QL, Ren J, Yang YL, Wang JH, Sun X. Analysis of the multi-copied genes and the impact of the redundant protein coding sequences on gene annotation in prokaryotic genomes. J Theor Biol 2015; 376:8-14. [PMID: 25865522 DOI: 10.1016/j.jtbi.2015.04.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/13/2014] [Revised: 03/09/2015] [Accepted: 04/01/2015] [Indexed: 10/23/2022]
Abstract
The important roles of duplicated genes in the evolutionary process have been recognized in bacteria, archaebacteria and eukaryotes, but multi-copied protein coding genes that share 100% sequence identity have received very little study. In this paper, the multi-copied protein coding genes in a number of prokaryotic genomes are first comprehensively analyzed. The results show that 0-15.93% of the protein coding genes in each genome are multi-copied genes and 0-16.49% of the protein coding genes in each genome are highly similar, with sequence identity ≥ 80%. Function and COG (Clusters of Orthologous Groups of proteins) analysis shows that 64.64% of multi-copied genes are concentrated in the transposase function and 86.28% of the COG-assigned multi-copied genes are concentrated in COG code 'L'. Furthermore, the impact of redundant protein coding sequences on gene prediction results is studied. The results show that the problem of protein coding sequence redundancy cannot be ignored, and that the consistency of the gene annotation results before and after excluding the redundant sequences is negatively correlated with the degree of sequence redundancy among the protein coding sequences in the training set.
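To make the notion of "multi-copied" genes concrete, the following is a minimal Python sketch, not the authors' pipeline. It assumes that "100% sequence identity" can be approximated by an exact, case-insensitive match of full-length coding sequences; the function names `multi_copy_groups` and `multi_copy_fraction` and the toy gene identifiers are hypothetical.

```python
from collections import defaultdict


def multi_copy_groups(cds):
    """Group gene IDs whose coding sequences are exactly identical.

    `cds` maps a gene ID to its coding sequence (assumption: identity is
    judged on the full-length sequence, ignoring letter case).
    """
    by_sequence = defaultdict(list)
    for gene_id, sequence in cds.items():
        by_sequence[sequence.upper()].append(gene_id)
    # Only sequences that occur more than once count as multi-copied.
    return [ids for ids in by_sequence.values() if len(ids) > 1]


def multi_copy_fraction(cds):
    """Fraction of genes in the genome that belong to a multi-copy group."""
    if not cds:
        return 0.0
    n_multi = sum(len(group) for group in multi_copy_groups(cds))
    return n_multi / len(cds)


# Toy genome: g1, g2 and g4 share one sequence; g3 is unique.
toy_genome = {
    "g1": "ATGAAATAA",
    "g2": "ATGAAATAA",
    "g3": "ATGCCCTAA",
    "g4": "atgaaataa",
}
groups = multi_copy_groups(toy_genome)
fraction = multi_copy_fraction(toy_genome)
```

Removing all but one member of each group from a training set is one simple way to exclude redundant sequences before training a gene finder; whether the paper does exactly this is not stated in the abstract.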
Affiliation(s)
- Jia-Feng Yu: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Qing-Li Chen: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; College of Life Science, Shandong Normal University, Jinan 250358, China
- Jing Ren: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Yan-Ling Yang: School of Physics and Electronic Information, Dezhou University, Dezhou 253023, China
- Ji-Hua Wang: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; School of Physics and Electronic Information, Dezhou University, Dezhou 253023, China
- Xiao Sun: State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
2
Yu JF, Dou XH, Wang HB, Sun X, Zhao HY, Wang JH. A Novel Cylindrical Representation for Characterizing Intrinsic Properties of Protein Sequences. J Chem Inf Model 2015; 55:1261-70. [PMID: 25945398 DOI: 10.1021/ci500577m] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
The composition and sequence order of amino acid residues are the two most important characteristics to describe a protein sequence. Graphical representations facilitate visualization of biological sequences and produce biologically useful numerical descriptors. In this paper, we propose a novel cylindrical representation by placing the 20 amino acid residue types in a circle and sequence positions along the z axis. This representation allows visualization of the composition and sequence order of amino acids at the same time. Ten numerical descriptors and one weighted numerical descriptor have been developed to quantitatively describe intrinsic properties of protein sequences on the basis of the cylindrical model. Their applications to similarity/dissimilarity analysis of nine ND5 proteins indicated that these numerical descriptors are more effective than several classical numerical matrices. Thus, the cylindrical representation obtained here provides a new useful tool for visualizing and characterizing protein sequences. An online server is available at http://biophy.dzu.edu.cn:8080/CNumD/input.jsp .
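The geometric idea in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published method: the abstract does not give the order of residues around the circle or the radius, so the sketch assumes alphabetical one-letter order and a unit radius, and the name `cylindrical_coords` is hypothetical. The paper's numerical descriptors are not reproduced.

```python
import math

# Assumption: residue types are placed around the circle in alphabetical
# order of their one-letter codes; the paper may use a different order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cylindrical_coords(sequence):
    """Map each residue to a point (x, y, z) on a unit-radius cylinder.

    The angle encodes the residue type (20 evenly spaced positions) and
    z encodes the 1-based position in the sequence.
    """
    points = []
    for position, residue in enumerate(sequence.upper(), start=1):
        # str.index raises ValueError for non-standard residue letters.
        k = AMINO_ACIDS.index(residue)
        theta = 2.0 * math.pi * k / len(AMINO_ACIDS)
        points.append((math.cos(theta), math.sin(theta), float(position)))
    return points


coords = cylindrical_coords("AAG")
```

Because identical residues share an angle, repeated residues line up vertically on the cylinder, while the z coordinate preserves sequence order; this is how composition and order become visible at the same time.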
Affiliation(s)
- Jia-Feng Yu: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Xiang-Hua Dou: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Hong-Bo Wang: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China
- Xiao Sun: State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Hui-Ying Zhao: Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4000, Australia
- Ji-Hua Wang: Shandong Provincial Key Laboratory of Functional Macromolecular Biophysics, Institute of Biophysics, Dezhou University, Dezhou 253023, China; College of Physics and Electronic Information, Dezhou University, Dezhou 253023, China
3
Yu JF, Guo J, Liu QB, Hou Y, Xiao K, Chen QL, Wang JH, Sun X. A hybrid strategy for comprehensive annotation of the protein coding genes in prokaryotic genome. Genes Genomics 2015. [DOI: 10.1007/s13258-014-0263-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
4
Guo FB, Lin Y, Chen LL. Recognition of Protein-coding Genes Based on Z-curve Algorithms. Curr Genomics 2014; 15:95-103. [PMID: 24822027 PMCID: PMC4009845 DOI: 10.2174/1389202915999140328162724] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 09/15/2013] [Revised: 11/19/2013] [Accepted: 11/20/2013] [Indexed: 01/18/2023]
Abstract
Recognition of protein-coding genes, a classical bioinformatics problem, is an essential step in annotating newly sequenced genomes. The Z-curve algorithm, one of the most effective methods for this problem, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. These four tools can be used for genome annotation or re-annotation, either independently or combined with other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation.
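The abstract does not spell out the Z-curve transform itself. As background, the standard cumulative Z-curve maps a DNA sequence to three coordinates built from running base counts: purine versus pyrimidine, amino versus keto, and weak versus strong hydrogen bonding. The Python sketch below shows only this basic transform; the function name `z_curve` is hypothetical, and the phase-specific variables and classifiers used by ZCURVE and related tools are not reproduced here.

```python
def z_curve(sequence):
    """Return the cumulative Z-curve points (x_n, y_n, z_n) of a DNA sequence.

    With A_n, C_n, G_n, T_n the base counts in the first n positions:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs. pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs. keto
      z_n = (A_n + T_n) - (C_n + G_n)   weak vs. strong hydrogen bonding
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in sequence.upper():
        if base not in counts:
            raise ValueError(f"unexpected base: {base!r}")
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)))
    return points


curve = z_curve("ATGC")
```

A sequence with equal numbers of all four bases returns to the origin, as the last point of the example shows.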
Affiliation(s)
- Feng-Biao Guo: Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yan Lin: Department of Physics, Tianjin University, Tianjin 300072, China
- Ling-Ling Chen: College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
5
Guo FB, Xiong L, Teng JLL, Yuen KY, Lau SKP, Woo PCY. Re-annotation of protein-coding genes in 10 complete genomes of Neisseriaceae family by combining similarity-based and composition-based methods. DNA Res 2013; 20:273-86. [PMID: 23571676 PMCID: PMC3686433 DOI: 10.1093/dnares/dst009] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/31/2022]
Abstract
In this paper, we performed a comprehensive re-annotation of protein-coding genes in 10 complete bacterial genomes of the family Neisseriaceae, using a systematic method that combines composition- and similarity-based approaches. First, 418 hypothetical genes were predicted as non-coding using the composition-based method and 413 were eliminated from the gene list. Both the scatter plot and cluster of orthologous groups (COG) fraction analyses supported the result. Second, from 20 to 400 hypothetical proteins were assigned functions in each of the 10 strains based on the homology search. Among the newly assigned functions, 397 are detailed enough to have definite gene names. Third, 106 genes missed by the original annotations were picked up by an ab initio gene finder combined with similarity alignment. Transcriptional experiments validated the effectiveness of this method in Laribacter hongkongensis and Chromobacterium violaceum. Among the 106 newly found genes, some deserve particular interest. For example, 27 transposases were newly found in Neisseria meningitidis alpha14. In Neisseria gonorrhoeae NCCP11945, four new genes with putative functions and definite names (nusG, rpsN, rpmD and infA) were found, and their homologues are usually essential for survival in bacteria. The updated annotations for the 10 Neisseriaceae genomes provide a more accurate prediction of protein-coding genes and more detailed functional information on hypothetical proteins. They will benefit research into the lifestyle, metabolism, environmental adaptation and pathogenicity of the Neisseriaceae species. The re-annotation procedure could be used directly, or after adaptation of the detailed methods, for checking annotations of any other bacterial or archaeal genomes.
Affiliation(s)
- Feng-Biao Guo: Department of Microbiology, The University of Hong Kong, Special Administrative Region, Hong Kong, People's Republic of China
6
Systematic characterization of hypothetical proteins in Synechocystis sp. PCC 6803 reveals proteins functionally relevant to stress responses. Gene 2012; 512:6-15. [PMID: 23063937 DOI: 10.1016/j.gene.2012.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 07/08/2012] [Revised: 10/03/2012] [Accepted: 10/05/2012] [Indexed: 11/22/2022]
Abstract
We described here a global detection and functional inference of hypothetical proteins involved in stress response in Synechocystis sp. PCC 6803. In the study, we first applied an iTRAQ-LC-MS/MS based quantitative proteomics to the Synechocystis cells grown under five stress conditions. The analysis detected a total of 807 hypothetical proteins with high confidence. Among them, 480 were differentially regulated. We then applied a Weighted Gene Co-expression Network Analysis approach to construct transcriptional networks for Synechocystis under nutrient limitation and osmotic stress conditions using transcriptome datasets. The analysis showed that 305 and 467 coding genes of hypothetical proteins were functionally relevant to nutrient limitation and osmotic stress, respectively. A comparison of responsive hypothetical proteins to all stress conditions allowed identification of 22 hypothetical proteins commonly responsive to all stresses, suggesting they may be part of the core stress responses in Synechocystis. Finally, functional inference of these core stress responsive proteins using both sequence similarity and non-similarity approaches was conducted. The study provided new insights into the stress response networks in Synechocystis, and also demonstrated that a combination of experimental "OMICS" and bioinformatics methodologies could improve functional annotation for hypothetical proteins.
7
Reddy JS, Kumar R, Watt JM, Lawrence ML, Burgess SC, Nanduri B. Transcriptome profile of a bovine respiratory disease pathogen: Mannheimia haemolytica PHL213. BMC Bioinformatics 2012; 13 Suppl 15:S4. [PMID: 23046475 PMCID: PMC3439734 DOI: 10.1186/1471-2105-13-s15-s4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/24/2023]
Abstract
Background: Computational methods for structural gene annotation have propelled gene discovery but face certain drawbacks with regards to prokaryotic genome annotation. Identification of transcriptional start sites, demarcating overlapping gene boundaries, and identifying regulatory elements such as small RNA are not accurate using these approaches. In this study, we re-visit the structural annotation of Mannheimia haemolytica PHL213, a bovine respiratory disease pathogen. M. haemolytica is one of the causative agents of bovine respiratory disease that results in about $3 billion annual losses to the cattle industry. We used RNA-Seq and analyzed the data using freely-available computational methods and resources. The aim was to identify previously unannotated regions of the genome using RNA-Seq based expression profile to complement the existing annotation of this pathogen. Results: Using the Illumina Genome Analyzer, we generated 9,055,826 reads (average length ~76 bp) and aligned them to the reference genome using Bowtie. The transcribed regions were analyzed using SAMTOOLS and custom Perl scripts in conjunction with BLAST searches and available gene annotation information. The single nucleotide resolution map enabled the identification of 14 novel protein coding regions as well as 44 potential novel sRNA. The basal transcription profile revealed that 2,506 of the 2,837 annotated regions were expressed in vitro, at 95.25% coverage, representing all broad functional gene categories in the genome. The expression profile also helped identify 518 potential operon structures involving 1,086 co-expressed pairs. We also identified 11 proteins with mutated/alternate start codons. Conclusions: The application of RNA-Seq based transcriptome profiling to structural gene annotation helped correct existing annotation errors and identify potential novel protein coding regions and sRNA. We used computational tools to predict regulatory elements such as promoters and terminators associated with the novel expressed regions for further characterization of these novel functional elements. Our study complements the existing structural annotation of Mannheimia haemolytica PHL213 based on experimental evidence. Given the role of sRNA in virulence gene regulation and stress response, potential novel sRNA described in this study can form the framework for future studies to determine the role of sRNA, if any, in M. haemolytica pathogenesis.
Affiliation(s)
- Joseph S Reddy: College of Veterinary Medicine, Mississippi State University, Mississippi State, MS 39762, USA
8
Kim H, Webster C, Roberts JKM, Kositsawat J, Hung LW, Terwilliger TC, Kim CY. Enhancement of crystallization with nucleotide ligands identified by dye-ligand affinity chromatography. J Struct Funct Genomics 2012; 13:71-9. [PMID: 22286688 PMCID: PMC3375012 DOI: 10.1007/s10969-012-9124-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/21/2011] [Accepted: 01/06/2012] [Indexed: 11/24/2022]
Abstract
Ligands interacting with Mycobacterium tuberculosis recombinant proteins were identified through use of the ability of Cibacron Blue F3GA dye to interact with nucleoside/nucleotide binding proteins, and the effects of these ligands on crystallization were examined. Co-crystallization with ligands enhanced crystallization and enabled X-ray diffraction data to be collected to a resolution of at least 2.7 Å for 5 of 10 proteins tested. Additionally, clues about individual proteins’ functions were obtained from their interactions with each of a panel of ligands.
Affiliation(s)
- Heungbok Kim: Bioscience Division, Los Alamos National Laboratory, MS M888, Los Alamos, NM 87545 USA
- Cecelia Webster: Department of Biochemistry, University of California, Riverside, CA 92521 USA
- Li-Wei Hung: Physics Division, Los Alamos National Laboratory, MS D454, Los Alamos, NM 87545 USA
- Thomas C. Terwilliger: Bioscience Division, Los Alamos National Laboratory, MS M888, Los Alamos, NM 87545 USA
- Chang-Yub Kim: Bioscience Division, Los Alamos National Laboratory, MS M888, Los Alamos, NM 87545 USA
9
Du MZ, Guo FB, Chen YY. Gene re-annotation in genome of the extremophile Pyrobaculum aerophilum by using bioinformatics methods. J Biomol Struct Dyn 2012; 29:391-401. [PMID: 21875157 DOI: 10.1080/07391102.2011.10507393] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]
Abstract
In this paper, we re-annotated the genome of Pyrobaculum aerophilum str. IM2, particularly its hypothetical ORFs. The annotation process includes three parts. First and most importantly, 23 new genes that were missed in the original annotation are found by combining similarity search and ab initio gene-finding approaches. Among these new genes, five have significant similarities with genes of known function and the rest have significant similarities with hypothetical ORFs contained in other genomes. Second, the coding potentials of the 1645 hypothetical ORFs are re-predicted by using 33 Z curve variables combined with the Fisher linear discrimination method. With an accuracy of 99.68%, 25 originally annotated hypothetical ORFs are recognized as non-coding by our method. Third, 80 hypothetical ORFs are assigned potential functions by similarity search with the BLAST program. Re-annotation of the genome will benefit related research on this hyperthermophilic crenarchaeon. The re-annotation procedure could also be taken as a reference for other archaeal genomes. Details of the revised annotation are freely available at http://cobi.uestc.edu.cn/resource/paero/
Affiliation(s)
- Meng-Ze Du: Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
10
Yu JF, Xiao K, Jiang DK, Guo J, Wang JH, Sun X. An integrative method for identifying the over-annotated protein-coding genes in microbial genomes. DNA Res 2011; 18:435-49. [PMID: 21903723 PMCID: PMC3223076 DOI: 10.1093/dnares/dsr030] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
Falsely annotated protein-coding genes have been deemed one of the major causes of annotation errors in public databases. Although many filtering approaches have been designed for over-annotated protein-coding genes, some are questionable because of the resulting increase in false negatives. Furthermore, there is no web server or software specifically devised for the problem of over-annotation. In this study, we propose an integrative algorithm for detecting over-annotated protein-coding genes in microorganisms. Overall, an average accuracy of 99.94% is achieved over 61 microbial genomes. The extremely high accuracy indicates that the presented algorithm efficiently differentiates protein-coding genes from non-coding open reading frames. Extensive analyses show that the prediction results are reliable and the integrative algorithm is robust and convenient. Our analysis also indicates that over-annotated protein-coding genes can cause false positives in the detection of horizontal gene transfers. The web server of the proposed algorithm is freely accessible at www.cbi.seu.edu.cn/RPGM.
Affiliation(s)
- Jia-Feng Yu: State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
11
Li M, Shen X, Yan J, Han H, Zheng B, Liu D, Cheng H, Zhao Y, Rao X, Wang C, Tang J, Hu F, Gao GF. GI-type T4SS-mediated horizontal transfer of the 89K pathogenicity island in epidemic Streptococcus suis serotype 2. Mol Microbiol 2011; 79:1670-83. [PMID: 21244532 PMCID: PMC3132442 DOI: 10.1111/j.1365-2958.2011.07553.x] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Indexed: 11/27/2022]
Abstract
Pathogenicity islands (PAIs), a distinct type of genomic island (GI), play important roles in the rapid adaptation and increased virulence of pathogens. 89K is a newly identified PAI in epidemic Streptococcus suis isolates that are related to the two recent large-scale outbreaks of human infection in China. However, its mechanism of evolution and contribution to the epidemic spread of S. suis 2 remain unknown. In this study, the potential for mobilization of 89K was evaluated, and its putative transfer mechanism was investigated. We report that 89K can spontaneously excise to form an extrachromosomal circular product. The precise excision is mediated by an 89K-borne integrase through site-specific recombination, with help from an excisionase. The 89K excision intermediate acts as a substrate for lateral transfer to non-89K S. suis 2 recipients, where it reintegrates site-specifically into the target site. The conjugal transfer of 89K occurred via a GI type IV secretion system (T4SS) encoded in 89K, at a frequency of 10^-6 transconjugants per donor. This is the first demonstration of horizontal transfer of a Gram-positive PAI mediated by a GI-type T4SS. We propose that these genetic events are important in the emergence, pathogenesis and persistence of epidemic S. suis 2 strains.
Affiliation(s)
- Ming Li: CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
12
Re-annotation is an essential step in systems biology modeling of functional genomics data. PLoS One 2010; 5:e10642. [PMID: 20498845 PMCID: PMC2871057 DOI: 10.1371/journal.pone.0010642] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 12/03/2009] [Accepted: 04/14/2010] [Indexed: 11/19/2022]
Abstract
One motivation of systems biology research is to understand gene functions and interactions from functional genomics data such as that derived from microarrays. Up-to-date structural and functional annotations of genes are an essential foundation of systems biology modeling. We propose that the first essential step in any systems biology modeling of functional genomics data, especially for species with recently sequenced genomes, is gene structural and functional re-annotation. To demonstrate the impact of such re-annotation, we structurally and functionally re-annotated a microarray developed, and previously used, as a tool for disease research. We quantified the impact of this re-annotation on the array based on the total numbers of structural- and functional-annotations, the Gene Annotation Quality (GAQ) score, and canonical pathway coverage. We next quantified the impact of re-annotation on systems biology modeling using a previously published experiment that used this microarray. We show that re-annotation improves the quantity and quality of structural- and functional-annotations, allows a more comprehensive Gene Ontology based modeling, and improves pathway coverage for both the whole array and a differentially expressed mRNA subset. Our results also demonstrate that re-annotation can result in a different knowledge outcome derived from previous published research findings. We propose that, because of this, re-annotation should be considered to be an essential first step for deriving value from functional genomics data.
13
Luo C, Hu GQ, Zhu H. Genome reannotation of Escherichia coli CFT073 with new insights into virulence. BMC Genomics 2009; 10:552. [PMID: 19930606 PMCID: PMC2785843 DOI: 10.1186/1471-2164-10-552] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Received: 03/25/2009] [Accepted: 11/22/2009] [Indexed: 11/30/2022]
Abstract
Background: The genome of uropathogenic Escherichia coli (UPEC) strain CFT073, a human pathogen, was sequenced and published in 2002, a significant step in pathogenic bacterial genomics research. However, the current RefSeq annotation of this pathogen is now outdated to some degree, because some essential genes associated with its virulence are missing or misannotated. We carried out a systematic reannotation by combining automated annotation tools with manual efforts to provide a comprehensive understanding of virulence for the CFT073 genome. Results: The reannotation excluded 608 coding sequences from the RefSeq annotation. Meanwhile, a total of 299 coding sequences were newly added; about one third of them are found in genomic island (GI) regions, while more than one fifth are located in virulence-related pathogenicity islands (PAIs). Furthermore, the translational initiation sites (TISs) of 341 genes were relocated, which resulted in high-quality gene start annotation. In addition, 94 pseudogenes annotated in RefSeq were thoroughly inspected and updated. The number of miscellaneous genes (sRNAs) has been updated from 6 in RefSeq to 46 in the reannotation. Based on the adjustments in the reannotation, subsequent analyses were conducted through both general and case studies on new virulence factors or new virulence-associated genes that are crucial during the urinary tract infection (UTI) process, including invasion, colonization, nutrient uptake and population density control. Furthermore, miscellaneous RNAs collected in the reannotation are believed to contribute to the virulence of strain CFT073. The reannotation, including the nucleotide data, the original RefSeq annotation, and all reannotated results, is freely available via http://mech.ctb.pku.edu.cn/CFT073/. Conclusion: As a result, the reannotation presents a more comprehensive picture of the mechanisms of uropathogenicity of UPEC strain CFT073. The new genes change the view of its uropathogenicity in many respects, particularly through new genes in GI regions and new virulence-associated factors. The reannotation thus functions as an important source by providing new information about genomic structure and organization, and gene function. Moreover, we expect that the detailed analysis will facilitate studies exploring novel virulence mechanisms and help guide experimental design.
Affiliation(s)
- Chengwei Luo: State Key Laboratory for Turbulence and Complex Systems, and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing 100871, China
14
Gao J, Chen LL. Theoretical methods for identifying important functional genes in bacterial genomes. Res Microbiol 2009; 161:1-8. [PMID: 19900539 DOI: 10.1016/j.resmic.2009.10.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/22/2009] [Revised: 10/05/2009] [Accepted: 10/21/2009] [Indexed: 12/30/2022]
Abstract
Some functional genes, such as essential genes, highly expressed genes and horizontally transferred genes, play important roles in the survival and pathogenicity of bacteria. This review attempts to summarize current computational methods in identifying the above functional genes from bacterial genomes, which is of significant importance in exploring the bacterial genomes.
Affiliation(s)
- Junxiang Gao: School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, PR China
15
Juhas M, Crook DW, Hood DW. Type IV secretion systems: tools of bacterial horizontal gene transfer and virulence. Cell Microbiol 2008; 10:2377-86. [PMID: 18549454 PMCID: PMC2688673 DOI: 10.1111/j.1462-5822.2008.01187.x] [Citation(s) in RCA: 178] [Impact Index Per Article: 11.1] [Indexed: 12/22/2022]
Abstract
Type IV secretion systems (T4SSs) are multisubunit cell-envelope-spanning structures, ancestrally related to bacterial conjugation machines, which transfer proteins and nucleoprotein complexes across membranes. T4SSs mediate horizontal gene transfer, thus contributing to genome plasticity and the evolution of pathogens through dissemination of antibiotic resistance and virulence genes. Moreover, T4SSs are also used for the delivery of bacterial effector proteins across the bacterial membrane and the plasmatic membrane of eukaryotic host cell, thus contributing directly to pathogenicity. T4SSs are usually encoded by multiple genes organized into a single functional unit. Based on a number of features, the organization of genetic determinants, shared homologies and evolutionary relationships, T4SSs have been divided into several groups. Type F and P (type IVA) T4SSs resembling the archetypal VirB/VirD4 system of Agrobacterium tumefaciens are considered to be the paradigm of type IV secretion, while type I (type IVB) T4SSs are found in intracellular bacterial pathogens, Legionella pneumophila and Coxiella burnetii. Several novel T4SSs have been identified recently and their functions await investigation. The most recently described GI type T4SSs play a key role in the horizontal transfer of a wide variety of genomic islands derived from a broad spectrum of bacterial strains.
Affiliation(s)
- Mario Juhas: Clinical Microbiology and Infectious Diseases, Nuffield Department of Clinical Laboratory Sciences, University of Oxford, Oxford OX3 9DU, UK