For: Robison K, Gilbert W, Church GM. Large scale bacterial gene discovery by similarity search. Nat Genet 1994;7:205-14. [PMID: 7920643 DOI: 10.1038/ng0694-205] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.2] [Indexed: 01/27/2023]

Cited by Other Article(s)

Proteogenomic Analysis Provides Novel Insight into Genome Annotation and Nitrogen Metabolism in Nostoc sp. PCC 7120. Microbiol Spectr 2021;9:e0049021. [PMID: 34523988 PMCID: PMC8557916 DOI: 10.1128/spectrum.00490-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/08/2023] [Open Access]

Abstract

Cyanobacteria, capable of oxygenic photosynthesis, play a vital role in nitrogen and carbon cycles. Nostoc sp. PCC 7120 (Nostoc 7120) is a model cyanobacterium commonly used to study cell differentiation and nitrogen metabolism. Although its genome was released in 2002, a high-quality genome annotation remains unavailable for this model cyanobacterium. Therefore, in this study, we performed an in-depth proteogenomic analysis based on high-resolution mass spectrometry (MS) data to refine the genome annotation of Nostoc 7120. We unambiguously identified 5,519 predicted protein-coding genes and revealed 26 novel genes, 75 revised genes, and 27 different kinds of posttranslational modifications in Nostoc 7120. A subset of these novel proteins was further validated at both the mRNA and peptide levels. Functional analysis suggested that many newly annotated proteins may participate in nitrogen or cadmium/mercury metabolism in Nostoc 7120. Moreover, we constructed an updated Nostoc 7120 database based on our proteogenomic results and presented examples of how the updated database could be used to improve the annotation of proteomic data. Our study provides the most comprehensive annotation of the Nostoc 7120 genome thus far and will serve as a valuable resource for the study of nitrogen metabolism in Nostoc 7120.

IMPORTANCE Cyanobacteria are a large group of prokaryotes capable of oxygenic photosynthesis and play a vital role in nitrogen and carbon cycles on Earth. Nostoc 7120 is a commonly used model cyanobacterium for studying cell differentiation and nitrogen metabolism. In this study, we presented the first comprehensive draft map of the Nostoc 7120 proteome and a wide range of posttranslational modifications. In addition, we constructed an updated database of Nostoc 7120 based on our proteogenomic results and presented examples of how the updated database could be used for system-level studies of Nostoc 7120. Our study provides the most comprehensive annotation of the Nostoc 7120 genome and a valuable resource for the study of nitrogen metabolism in this model cyanobacterium.

Fickett JW. The gene identification problem: an overview for developers. Comput Chem 1996;20:103-18. [PMID: 16749184 DOI: 10.1016/s0097-8485(96)80012-x] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.2] [Indexed: 11/26/2022]

Claesson MJ, van Sinderen D. BlastXtract--a new way of exploring translated searches. Bioinformatics 2005;21:3667-8. [PMID: 16046492 DOI: 10.1093/bioinformatics/bti598] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] [Open Access]

Genomic data modeling. Inform Syst 2003. [DOI: 10.1016/s0306-4379(02)00071-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/20/2022]

Shibuya T, Rigoutsos I. Dictionary-driven prokaryotic gene finding. Nucleic Acids Res 2002;30:2710-25. [PMID: 12060689 PMCID: PMC117281 DOI: 10.1093/nar/gkf338] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] [Open Access]

Bocs S, Danchin A, Médigue C. Re-annotation of genome microbial coding-sequences: finding new genes and inaccurately annotated genes. BMC Bioinformatics 2002;3:5. [PMID: 11879526 PMCID: PMC77393 DOI: 10.1186/1471-2105-3-5] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.5] [Received: 09/18/2001] [Accepted: 02/05/2002] [Indexed: 11/21/2022] [Open Access]

Abstract

BACKGROUND

Analysis of any newly sequenced bacterial genome starts with the identification of protein-coding genes. Despite the accumulation of multiple complete genome sequences, which provide useful comparisons with close relatives among other organisms during the annotation process, accurate gene prediction remains quite difficult. A major reason for this situation is that genes are tightly packed in prokaryotes, resulting in frequent overlap. Thus, detection of translation initiation sites and/or selection of the correct coding regions remain difficult unless appropriate biological knowledge (about the structure of a gene) is imbedded in the approach.

RESULTS

We have developed a new program that automatically identifies biologically significant candidate genes in a bacterial genome. Twenty-six complete prokaryotic genomes were analyzed using this tool, and the accuracy of gene finding was assessed by comparison with existing annotations. This analysis revealed that, despite the enormous effort of genome program annotators, a small but not negligible number of genes annotated within the framework of sequencing projects are likely to be partially inaccurate or plainly wrong. Moreover, the analysis of several putative new genes shows that, as expected, many short genes have escaped annotation. In most cases, these new genes revealed frameshifts that could be either artifacts or genuine frameshifts. Some entirely unexpected new genes have also been identified. This allowed us to get a more complete picture of prokaryotic genomes. The results of this procedure are progressively integrated into the SWISS-PROT reference databank.

CONCLUSIONS

The results described in the present study show that our procedure is very satisfactory in terms of gene finding accuracy. Except in a few cases, discrepancies between our results and annotations provided by individual authors can be accounted for by the nature of each annotation process or by specific characteristics of some genomes. This stresses that close cooperation between scientists, together with regular updating and curation of the findings in databases, is clearly required to reduce the level of errors in genome annotation (and hence to reduce the unfortunate spread of errors through centralized data libraries).

Raghavan S, Ouzounis CA. Novel coding regions in four complete archaeal genomes. Nucleic Acids Res 1999;27:4405-8. [PMID: 10536149 PMCID: PMC148723 DOI: 10.1093/nar/27.22.4405] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022] [Open Access]

Hayes WS, Borodovsky M. How to interpret an anonymous bacterial genome: machine learning approach to gene identification. Genome Res 1998;8:1154-71. [PMID: 9847079 DOI: 10.1101/gr.8.11.1154] [Citation(s) in RCA: 92] [Impact Index Per Article: 3.5] [Indexed: 11/24/2022]

Fickett JW. Predictive methods using nucleotide sequences. Methods Biochem Anal 1998;39:231-45. [PMID: 9707933 DOI: 10.1002/9780470110607.ch10] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 02/08/2023]

Frishman D, Mironov A, Mewes HW, Gelfand M. Combining diverse evidence for gene recognition in completely sequenced bacterial genomes. Nucleic Acids Res 1998;26:2941-7. [PMID: 9611239 PMCID: PMC147632 DOI: 10.1093/nar/26.12.2941] [Citation(s) in RCA: 141] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022] [Open Access]

Cellulomonas sp. Purine Nucleoside Phosphorylase (PNP). Adv Exp Med Biol 1998. [DOI: 10.1007/978-1-4615-5381-6_51] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3]

Smith DR, Richterich P, Rubenfield M, Rice PW, Butler C, Lee HM, Kirst S, Gundersen K, Abendschan K, Xu Q, Chung M, Deloughery C, Aldredge T, Maher J, Lundstrom R, Tulig C, Falls K, Imrich J, Torrey D, Engelstein M, Breton G, Madan D, Nietupski R, Seitz B, Connelly S, McDougall S, Safer H, Gibson R, Doucette-Stamm L, Eiglmeier K, Bergh S, Cole ST, Robison K, Richterich L, Johnson J, Church GM, Mao JI. Multiplex sequencing of 1.5 Mb of the Mycobacterium leprae genome. Genome Res 1997;7:802-19. [PMID: 9267804 DOI: 10.1101/gr.7.8.802] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]

Bhatia U, Robison K, Gilbert W. Dealing with database explosion: a cautionary note. Science 1997;276:1724-5. [PMID: 9206831 DOI: 10.1126/science.276.5319.1724] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]

Nölling J, Reeve JN. Growth- and substrate-dependent transcription of the formate dehydrogenase (fdhCAB) operon in Methanobacterium thermoformicicum Z-245. J Bacteriol 1997;179:899-908. [PMID: 9006048 PMCID: PMC178775 DOI: 10.1128/jb.179.3.899-908.1997] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 02/03/2023] [Open Access]

Abstract

The formate dehydrogenase-encoding fdhCAB operon and flanking genes have been cloned and sequenced from Methanobacterium thermoformicicum Z-245. fdh transcription was shown to be initiated 21 bp upstream from fdhC, although most fdh transcripts terminated or were processed between fdhC and fdhA. The resulting fdhC, fdhAB, and fdhCAB transcripts were present at all growth stages in cells growing on formate but were barely detectable during early exponential growth on H2 plus CO2. The levels of the fdh transcripts did, however, increase dramatically in cells growing on H2 plus CO2, coincident with the decrease in the growth rate and the onset of constant methanogenesis that occurred when culture densities reached an optical density at 600 nm of approximately 0.5. The mth transcript that encodes the H2-dependent methenyl-H4 MPT reductase (MTH) and the frh and mvh transcripts that encode the coenzyme F420-reducing (FRH) and nonreducing (MVH) hydrogenases, respectively, were also present in cells growing on formate, consistent with the synthesis of three hydrogenases, MTH, FRH, and MVH, in the absence of exogenously supplied H2. Reducing the H2 supply to M. thermoformicicum cells growing on H2 plus CO2 reduced the growth rate and CH4 production but increased frh and fdh transcription and also increased transcription of the mtd, mer, and mcr genes that encode enzymes that catalyze steps 4, 5, and 7, respectively, in the pathway of CO2 reduction to CH4. Reducing the H2 supply to a level insufficient for growth resulted in the disappearance of all methane gene transcripts except the mcr transcript, which increased. Regions flanking the fdhCAB operon in M. thermoformicicum Z-245 were used as probes to clone the homologous region from the Methanobacterium thermoautotrophicum ΔH genome. Sequencing revealed the presence of very similar genes except that the genome of M. thermoautotrophicum, a methanogen incapable of growth on formate, lacked the fdhCAB operon.

Koonin EV, Mushegian AR. Complete genome sequences of cellular life forms: glimpses of theoretical evolutionary genomics. Curr Opin Genet Dev 1996;6:757-62. [PMID: 8994848 DOI: 10.1016/s0959-437x(96)80032-3] [Citation(s) in RCA: 49] [Impact Index Per Article: 1.8] [Indexed: 02/03/2023]

Brown JR. Preparing for the flood: evolutionary biology in the age of genomics. Trends Ecol Evol 1996;11:510-3. [DOI: 10.1016/s0169-5347(96)20082-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]

Alvarez CE, Robison K, Gilbert W. Novel Gq alpha isoform is a candidate transducer of rhodopsin signaling in a Drosophila testes-autonomous pacemaker. Proc Natl Acad Sci U S A 1996;93:12278-82. [PMID: 8901571 PMCID: PMC37981 DOI: 10.1073/pnas.93.22.12278] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 02/02/2023] [Open Access]

Berben G. Nitrobacter winogradskyi cytochrome c oxidase genes are organized in a repeated gene cluster. Antonie Van Leeuwenhoek 1996;69:305-15. [PMID: 8836428 DOI: 10.1007/bf00399619] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 02/02/2023]

Jovanovic G, Weiner L, Model P. Identification, nucleotide sequence, and characterization of PspF, the transcriptional activator of the Escherichia coli stress-induced psp operon. J Bacteriol 1996;178:1936-45. [PMID: 8606168 PMCID: PMC177889 DOI: 10.1128/jb.178.7.1936-1945.1996] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.0] [Indexed: 01/31/2023] [Open Access]

Koonin EV, Mushegian AR, Rudd KE. Sequencing and analysis of bacterial genomes. Curr Biol 1996;6:404-16. [PMID: 8723345 DOI: 10.1016/s0960-9822(02)00508-0] [Citation(s) in RCA: 82] [Impact Index Per Article: 2.9] [Indexed: 02/01/2023]

Tatusov RL, Mushegian AR, Bork P, Brown NP, Hayes WS, Borodovsky M, Rudd KE, Koonin EV. Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli. Curr Biol 1996;6:279-91. [PMID: 8805245 DOI: 10.1016/s0960-9822(02)00478-5] [Citation(s) in RCA: 207] [Impact Index Per Article: 7.4] [Indexed: 02/02/2023]

Abstract

BACKGROUND

The 1.83 megabase (Mb) sequence of the Haemophilus influenzae chromosome, the first completed genome sequence of a cellular life form, has been recently reported. Approximately 75% of the 4.7 Mb genome sequence of Escherichia coli is also available. The lifestyles of the two bacteria are very different: H. influenzae is an obligate parasite that lives in human upper respiratory mucosa and can be cultivated only on rich media, whereas E. coli is a saprophyte that can grow on minimal media. A detailed comparison of the protein products encoded by these two genomes is expected to provide valuable insights into bacterial cell physiology and genome evolution.

RESULTS

We describe the results of computer analysis of the amino-acid sequences of 1703 putative proteins encoded by the complete genome of H. influenzae. We detected sequence similarity to proteins in current databases for 92% of the H. influenzae protein sequences, and at least a general functional prediction was possible for 83%. A comparison of the H. influenzae protein sequences with those of 3010 proteins encoded by the sequenced 75% of the E. coli genome revealed 1128 pairs of apparent orthologs, with an average of 59% identity. In contrast to the high similarity between orthologs, the genome organization and the functional repertoire of genes in the two bacteria were remarkably different. The smaller genome size of H. influenzae is explained, to a large extent, by a reduction in the number of paralogous genes. There was no long-range colinearity between the E. coli and H. influenzae gene orders, but over 70% of the orthologous genes were found in short conserved strings, only about half of which were operons in E. coli. Superposition of the H. influenzae enzyme repertoire upon the known E. coli metabolic pathways allowed us to reconstruct similar and alternative pathways in H. influenzae and provides an explanation for the known nutritional requirements.

CONCLUSIONS

By comparing proteins encoded by the two bacterial genomes, we have shown that extensive gene shuffling and variation in the extent of gene paralogy are major trends in bacterial evolution; this comparison has also allowed us to deduce crucial aspects of the largely uncharacterized metabolism of H. influenzae.

Robison K, Gilbert W, Church GM. More Haemophilus and Mycoplasma Genes. Science 1996. [DOI: 10.1126/science.271.5253.1302-b] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/02/2022]

Koonin EV, Tatusov RL, Rudd KE. Protein sequence comparison at genome scale. Methods Enzymol 1996;266:295-322. [PMID: 8743691 DOI: 10.1016/s0076-6879(96)66020-0] [Citation(s) in RCA: 52] [Impact Index Per Article: 1.9] [Indexed: 02/01/2023]

Liao X, Charlebois I, Ouellet C, Morency MJ, Dewar K, Lightfoot J, Foster J, Siehnel R, Schweizer H, Lam JS, Hancock REW, Levesque RC. Physical mapping of 32 genetic markers on the Pseudomonas aeruginosa PAO1 chromosome. Microbiology (Reading) 1996;142(Pt 1):79-86. [PMID: 8581173 DOI: 10.1099/13500872-142-1-79] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.6] [Indexed: 01/31/2023]

Affiliation(s)

Xiaowen Liao: Department of Microbiology and Immunology, University of British Columbia, 300-6174 University Boulevard, Vancouver BC, Canada V6T 1Z3
Isabelle Charlebois: Microbiologie Moléculaire et Génie des Protéines, Département de Microbiologie, Faculté de Médecine, Pavillon Charles-Eugène-Marchand, Université Laval, Ste-Foy, Québec, Canada G1K 7P4
Catherine Ouellet: Microbiologie Moléculaire et Génie des Protéines, Département de Microbiologie, Faculté de Médecine, Pavillon Charles-Eugène-Marchand, Université Laval, Ste-Foy, Québec, Canada G1K 7P4
Marie-Josée Morency: Microbiologie Moléculaire et Génie des Protéines, Département de Microbiologie, Faculté de Médecine, Pavillon Charles-Eugène-Marchand, Université Laval, Ste-Foy, Québec, Canada G1K 7P4
Ken Dewar: Microbiologie Moléculaire et Génie des Protéines, Département de Microbiologie, Faculté de Médecine, Pavillon Charles-Eugène-Marchand, Université Laval, Ste-Foy, Québec, Canada G1K 7P4
Jeff Lightfoot: Microbiologie Moléculaire et Génie des Protéines, Département de Microbiologie, Faculté de Médecine, Pavillon Charles-Eugène-Marchand, Université Laval, Ste-Foy, Québec, Canada G1K 7P4
Jennifer Foster: Department of Microbiology, University of Guelph, Guelph, Ontario, Canada N1G 2W1
Richard Siehnel: Department of Microbiology and Immunology, University of British Columbia, 300-6174 University Boulevard, Vancouver BC, Canada V6T 1Z3
Herbert Schweizer: Department of Medical Microbiology and Infectious Diseases, University of Calgary, Calgary, Alberta, Canada T2N 4N1
Joseph S Lam: Department of Microbiology, University of Guelph, Guelph, Ontario, Canada N1G 2W1
Robert E W Hancock: Department of Microbiology and Immunology, University of British Columbia, 300-6174 University Boulevard, Vancouver BC, Canada V6T 1Z3
Roger C Levesque: Microbiologie Moléculaire et Génie des Protéines, Département de Microbiologie, Faculté de Médecine, Pavillon Charles-Eugène-Marchand, Université Laval, Ste-Foy, Québec, Canada G1K 7P4

Koonin EV, Tatusov RL, Rudd KE. Sequence similarity analysis of Escherichia coli proteins: functional and evolutionary implications. Proc Natl Acad Sci U S A 1995;92:11921-5. [PMID: 8524875 PMCID: PMC40515 DOI: 10.1073/pnas.92.25.11921] [Citation(s) in RCA: 82] [Impact Index Per Article: 2.8] [Indexed: 01/31/2023] [Open Access]

Borodovsky M, McIninch JD, Koonin EV, Rudd KE, Médigue C, Danchin A. Detection of new genes in a bacterial genome using Markov models for three gene classes. Nucleic Acids Res 1995;23:3554-62. [PMID: 7567469 PMCID: PMC307237 DOI: 10.1093/nar/23.17.3554] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023] [Open Access]

Sanderson KE, Hessel A, Rudd KE. Genetic map of Salmonella typhimurium, edition VIII. Microbiol Rev 1995;59:241-303. [PMID: 7603411 PMCID: PMC239362 DOI: 10.1128/mr.59.2.241-303.1995] [Citation(s) in RCA: 79] [Impact Index Per Article: 2.7] [Indexed: 01/26/2023]

Darcy TJ, Sandman K, Reeve JN. Methanobacterium formicicum, a mesophilic methanogen, contains three HFo histones. J Bacteriol 1995;177:858-60. [PMID: 7836329 PMCID: PMC176673 DOI: 10.1128/jb.177.3.858-860.1995] [Citation(s) in RCA: 25] [Impact Index Per Article: 0.9] [Indexed: 01/27/2023] [Open Access]

Smith RF, King KY. Identification of a eukaryotic-like protein kinase gene in Archaebacteria. Protein Sci 1995;4:126-9. [PMID: 7773169 PMCID: PMC2142968 DOI: 10.1002/pro.5560040115] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]

Borodovsky M, Rudd KE, Koonin EV. Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. Nucleic Acids Res 1994;22:4756-67. [PMID: 7984428 PMCID: PMC308528 DOI: 10.1093/nar/22.22.4756] [Citation(s) in RCA: 80] [Impact Index Per Article: 2.7] [Indexed: 01/28/2023] [Open Access]

Abstract

The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
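
The percentages quoted in this abstract can be checked directly from the counts it reports. The following is a minimal sketch, not part of the cited work: the variable names and the `percent` helper are illustrative assumptions, and the method-specific counts at the end are derived here by subtraction rather than stated in the abstract.

```python
# Counts quoted in the abstract (Borodovsky, Rudd, Koonin 1994).
genemark_orfs = 354  # putative expressed ORFs predicted by GeneMark
blast_orfs = 208     # ORFs with significant BLASTX/TBLASTN similarity
both = 182           # ORFs supported by both GeneMark and BLAST
conserved = 73       # putative new genes in ancient conserved families


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Hypothetical labels; the three ratios are the ones the abstract reports.
shares = {
    "GeneMark hits also supported by BLAST": percent(both, genemark_orfs),
    "BLAST hits also supported by GeneMark": percent(both, blast_orfs),
    "GeneMark predictions in conserved families": percent(conserved, genemark_orfs),
}

# Derived by subtraction (an assumption of this sketch, not stated in the text):
genemark_only = genemark_orfs - both  # predicted by GeneMark, no BLAST support
blast_only = blast_orfs - both        # BLAST similarity, not predicted by GeneMark

for label, value in shares.items():
    print(f"{label}: {value}%")
print(f"GeneMark only: {genemark_only}, BLAST only: {blast_only}")
```

The three computed shares agree with the 51.4%, 87.5%, and 20.6% given in the abstract.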
