1
Assaf R, Xia F, Stevens R. Detecting operons in bacterial genomes via visual representation learning. Sci Rep 2021; 11:2124. [PMID: 33483546 PMCID: PMC7822928 DOI: 10.1038/s41598-021-81169-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/01/2020] [Accepted: 12/30/2020] [Indexed: 12/05/2022]
Abstract
Contiguous genes in prokaryotes are often arranged into operons. Detecting operons plays a critical role in inferring gene functionality and regulatory networks. Human experts annotate operons by visually inspecting gene neighborhoods across pileups of related genomes. These visual representations capture the inter-genic distance, strand direction, gene size, functional relatedness, and gene neighborhood conservation, which are the most prominent operon features mentioned in the literature. By studying these features, an expert can then decide whether a genomic region is part of an operon. We propose a deep learning based method named Operon Hunter that uses visual representations of genomic fragments to make operon predictions. Using transfer learning and data augmentation techniques facilitates leveraging the powerful neural networks trained on image datasets by re-training them on a more limited dataset of extensively validated operons. Our method outperforms the previously reported state-of-the-art tools, especially when it comes to predicting full operons and their boundaries accurately. Furthermore, our approach makes it possible to visually identify the features influencing the network’s decisions to be subsequently cross-checked by human experts.
Affiliation(s)
- Rida Assaf
- Department of Computer Science, University of Chicago, Chicago, 60637, USA.
- Fangfang Xia
- Computing Environment and Life Sciences Division, Argonne National Laboratory, Lemont, 60439, USA; Data Science and Learning Division, Argonne National Laboratory, Lemont, 60439, USA
- Rick Stevens
- The University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, 60637, USA; Computing Environment and Life Sciences Division, Argonne National Laboratory, Lemont, 60439, USA
2
Fernandez-Garcia L, Kim JS, Tomas M, Wood TK. Toxins of toxin/antitoxin systems are inactivated primarily through promoter mutations. J Appl Microbiol 2019; 127:1859-1868. [PMID: 31429177 DOI: 10.1111/jam.14414] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/19/2019] [Revised: 07/24/2019] [Accepted: 08/12/2019] [Indexed: 11/27/2022]
Abstract
AIMS Given the extreme toxicity of some of the toxins of toxin-antitoxin (TA) systems, we were curious how the cell silences toxins, if the antitoxin is inactivated or independent toxins are obtained via horizontal gene transfer. METHODS AND RESULTS Growth curves of Escherichia coli K12 BW25113 harbouring plasmid pCA24N to produce RalR, MqsR, GhoT or Hha toxins, showed toxin inactivation after 3 h. Sequencing plasmids from these cultures revealed toxin inactivation occurred primarily due to consistent deletions in the promoter. The lack of mutation in the structural genes was corroborated by a bioinformatics analysis of 1000 E. coli genomes which showed both conservation and little variability in the four toxin genes. For those strains that lacked a mutation in the plasmid, single nucleotide polymorphism analysis was performed to identify that chromosomal mutations iraM and mhpR inactivate the toxins GhoT and MqsR/GhoT respectively. CONCLUSION We find that the RalR (type I), MqsR (type II), GhoT (type V) and Hha (type VII) toxins are inactivated primarily by a mutation that inactivates the toxin promoter or via the chromosomal mutations iraM and mhpR. SIGNIFICANCE AND IMPACT OF THE STUDY This study demonstrates toxins of TA systems may be inactivated by mutations that primarily affect the toxin gene promoter instead of the toxin structural gene.
Affiliation(s)
- L Fernandez-Garcia
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, USA; Microbiology Department-Research Institute Biomedical A Coruña (INIBIC), Hospital A Coruña (CHUAC), University of A Coruña (UDC), A Coruña, Spain
- J-S Kim
- Infectious Disease Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), Yuseong-gu, Daejeon, South Korea
- M Tomas
- Infectious Disease Research Center, Korea Research Institute of Bioscience & Biotechnology (KRIBB), Yuseong-gu, Daejeon, South Korea
- T K Wood
- Department of Chemical Engineering, Pennsylvania State University, University Park, PA, USA; Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
3
Zaidi SSA, Zhang X. Computational operon prediction in whole-genomes and metagenomes. Brief Funct Genomics 2018; 16:181-193. [PMID: 27659221 DOI: 10.1093/bfgp/elw034] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Abstract
Microbial diversity in unique environmental settings enables abrupt responses catalysed by altering the gene regulation and formation of gene clusters called operons. Operons increase bacterial adaptability, which in turn increases their survival. This review article presents the emergence of computational operon prediction methods for whole microbial genomes and metagenomes, and discusses their strengths and limitations. Most of the whole-genome operon prediction methods struggle to generalize on unrelated genomes. The applicability of universal whole-genome operon prediction methods to metagenomic data is an interesting yet less investigated question. We have evaluated the potential of various operon prediction features for genomic and metagenomic data. Most operon prediction methods with high accuracy have been compiled into databases. Despite the high predictive performance, the data among many databases are not completely consistent for similar species. We performed a correlation analysis between the computationally predicted operon databases and experimentally validated data for Escherichia coli, Bacillus subtilis and Mycobacterium tuberculosis. Operon prediction for most of the less characterized microbes cannot be verified due to the absence of experimentally validated operons. The generation of validated information for other microbes would test the authenticity of operon databases for other less annotated microbes as well. Advances in sequencing technologies and development of better analysis methods will help researchers to overcome the technological hurdles (such as long sequencing reads and improved contig size) and further improve operon predictions and better utilize operonic information.
4
Chuang LY, Yang CH, Tsai JH, Yang CH. Operon prediction using chaos embedded particle swarm optimization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1299-1309. [PMID: 24384714 DOI: 10.1109/tcbb.2013.63] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/03/2023]
Abstract
Operons contain valuable information for drug design and determining protein functions. Genes within an operon are co-transcribed to a single-strand mRNA and must be coregulated. The identification of operons is, thus, critical for a detailed understanding of the gene regulations. However, currently used experimental methods for operon detection are generally difficult to implement and time consuming. In this paper, we propose a chaotic binary particle swarm optimization (CBPSO) to predict operons in bacterial genomes. The intergenic distance, participation in the same metabolic pathway and the cluster of orthologous groups (COG) properties of the Escherichia coli genome are used to design a fitness function. Furthermore, the Bacillus subtilis, Pseudomonas aeruginosa PA01, Staphylococcus aureus and Mycobacterium tuberculosis genomes are tested and evaluated for accuracy, sensitivity, and specificity. The computational results indicate that the proposed method works effectively in terms of enhancing the performance of the operon prediction. The proposed method also achieved a good balance between sensitivity and specificity when compared to methods from the literature.
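The three evaluation measures named in this abstract (accuracy, sensitivity and specificity) can be computed from confusion-matrix counts. The sketch below is only an illustration: the function name and the counts are hypothetical, and it assumes that an adjacent gene pair belonging to the same operon is the positive class.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # Fraction of all gene pairs classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of true operon pairs that were recovered.
        "sensitivity": tp / (tp + fn),
        # Fraction of non-operon pairs that were correctly rejected.
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(metrics)
```

Reporting sensitivity and specificity next to accuracy matters when operon and non-operon pairs are unevenly represented, because accuracy alone can hide a bias toward the larger class.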
Affiliation(s)
- Cheng-Huei Yang
- National Kaohsiung Institute of Marine Technology, Kaohsiung
- Jui-Hung Tsai
- National Kaohsiung University of Applied Sciences, Kaohsiung
- Cheng-Hong Yang
- National Kaohsiung University of Applied Sciences, Kaohsiung
5
Bloodworth RAM, Gislason AS, Cardona ST. Burkholderia cenocepacia conditional growth mutant library created by random promoter replacement of essential genes. Microbiologyopen 2013; 2:243-58. [PMID: 23389959 PMCID: PMC3633349 DOI: 10.1002/mbo3.71] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 10/24/2012] [Revised: 12/24/2012] [Accepted: 01/08/2013] [Indexed: 01/05/2023]
Abstract
Construction of conditional knockouts with inducible promoters allows the identification of essential genes and the creation of conditional growth (CG) mutants that are then available as genetic tools for further studies. We used large-scale transposon delivery of the rhamnose-inducible promoter, PrhaB, followed by robotic screening of rhamnose-dependent growth to construct a genomic library of 106 Burkholderia cenocepacia CG mutants. Transposon insertions were found where PrhaB was in the same orientation as widely conserved, well-characterized essential genes as well as genes with no previous records of essentiality in other microorganisms. Using previously reported global gene-expression analyses, we demonstrate that PrhaB can achieve the wide dynamic range of expression levels required for essential genes when the promoter is delivered randomly and mutants with rhamnose-dependent growth are selected. We also show specific detection of the target of an antibiotic, novobiocin, by enhanced sensitivity of the corresponding CG mutant (PrhaB controlling gyrB expression) within the library. Modulation of gene expression to achieve 30-60% of wild-type growth created conditions for specific hypersensitivity, demonstrating the value of the CG mutant library for chemogenomic experiments. In summary, CG mutants can be obtained on a large scale by random delivery of a tightly regulated inducible promoter into the bacterial chromosome followed by a simple screening for the CG phenotype, without previous information on gene essentiality.
Affiliation(s)
- Ruhi A M Bloodworth
- Department of Microbiology, University of Manitoba, Winnipeg, Manitoba, Canada
6
A global analysis of adaptive evolution of operons in cyanobacteria. Antonie van Leeuwenhoek 2012; 103:331-46. [PMID: 22987250 DOI: 10.1007/s10482-012-9813-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 03/31/2012] [Accepted: 09/06/2012] [Indexed: 01/04/2023]
Abstract
Operons are an important feature of prokaryotic genomes. Evolution of operons is hypothesized to be adaptive and has contributed significantly towards coordinated optimization of functions. Two conflicting theories, based on (i) in situ formation to achieve co-regulation and (ii) horizontal gene transfer of functionally linked gene clusters, are generally considered to explain why and how operons have evolved. Furthermore, effects of operon evolution on genomic traits such as intergenic spacing, operon size and co-regulation are relatively less explored. Based on the conservation level in a set of diverse prokaryotes, we categorize the operonic gene pair associations and in turn the operons as ancient and recently formed. This allowed us to perform a detailed analysis of operonic structure in cyanobacteria, a morphologically and physiologically diverse group of photoautotrophs. Clustering based on operon conservation showed significant similarity with the 16S rRNA-based phylogeny, which groups the cyanobacterial strains into three clades. Clade C, dominated by strains that are believed to have undergone genome reduction, shows a larger fraction of operonic genes that are tightly packed in larger sized operons. Ancient operons are in general larger, more tightly packed, better optimized for co-regulation and part of key cellular processes. A sub-clade within Clade B, which includes Synechocystis sp. PCC 6803, shows a reverse trend in intergenic spacing. Our results suggest that while in situ formation and vertical descent may be a dominant mechanism of operon evolution in cyanobacteria, optimization of intergenic spacing and co-regulation are part of an ongoing process in the life-cycle of operons.
7
Chuang LY, Chang HW, Tsai JH, Yang CH. Features for computational operon prediction in prokaryotes. Brief Funct Genomics 2012; 11:291-9. [PMID: 22753776 DOI: 10.1093/bfgp/els024] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Accurate prediction of operons can improve the functional annotation and application of genes within operons in prokaryotes. Here, we review several features: (i) intergenic distance, (ii) metabolic pathways, (iii) homologous genes, (iv) promoters and terminators, (v) gene order conservation, (vi) microarray, (vii) clusters of orthologous groups, (viii) gene length ratio, (ix) phylogenetic profiles, (x) operon length/size and (xi) STRING database scores, as well as some other features, which have been applied in recent operon prediction methods in prokaryotes in the literature. Based on a comparison of the prediction performances of these features, we conclude that other, as yet undiscovered features, or feature selection with a receiver operating characteristic analysis before algorithm processing can improve operon prediction in prokaryotes.
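The receiver operating characteristic analysis mentioned in this abstract can be illustrated by scoring a single candidate feature with its area under the ROC curve (AUC). The sketch below is a minimal illustration, not the review's procedure: the function name and the intergenic distances are hypothetical, and the AUC is computed directly as the probability that a randomly chosen operon pair outscores a randomly chosen non-operon pair.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as P(positive score > negative score); ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical intergenic distances (bp) for operon and non-operon gene pairs.
operon_pair_distances = [5, 10, 40]
non_operon_pair_distances = [30, 150, 300]

# Shorter distances suggest an operon, so the negated distance is the score.
auc = roc_auc(
    [-d for d in operon_pair_distances],
    [-d for d in non_operon_pair_distances],
)
print(round(auc, 3))
```

Computing one AUC per candidate feature gives a simple ranking that could be used to decide which features to pass on to a prediction algorithm.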
Affiliation(s)
- Li-Yeh Chuang
- Department of Chemical Engineering & Institute of Biotechnology and Chemical Engineering, I-Shou University, Taiwan
8
Identification of TATA and TATA-less promoters in plant genomes by integrating diversity measure, GC-Skew and DNA geometric flexibility. Genomics 2010; 97:112-20. [PMID: 21112384 DOI: 10.1016/j.ygeno.2010.11.002] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Received: 03/09/2010] [Revised: 11/05/2010] [Accepted: 11/12/2010] [Indexed: 11/20/2022]
Abstract
Accurate identification of core promoters is important for understanding eukaryotic transcription regulation. In this study, the authors focused on biologically realistic promoter prediction in plant genomes. By analyzing the correlative conservation, GC-compositional bias and specific structural patterns of TATA and TATA-less promoters in PlantPromDB, a hybrid multi-feature approach based on a support vector machine (SVM) for predicting the two types of promoters was developed by integrating local word content, GC-Skew and DNA geometric flexibility. Compared with the TSSP-TCM program on the same test dataset, better prediction results were obtained. In particular, for TATA-less promoters, the accuracy is 10% higher than that of the TSSP-TCM program. The good performance of the hybrid promoters and the experimental data also indicate that our method has the ability to locate the promoter region of the plant genome.
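As an illustration of the GC-Skew feature named in this abstract, the sketch below uses the commonly used definition (G - C) / (G + C), evaluated over the whole sequence or in sliding windows. This is an assumption: the abstract does not state the exact formula or window settings the authors used, and the function names are hypothetical.

```python
def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); returns 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int, step: int) -> list:
    """GC skew for each full-length window along the sequence."""
    return [
        gc_skew(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: the first window is G-rich, the second is C-rich.
profile = windowed_gc_skew("GGGCCCCG", window=4, step=4)
print(profile)
```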
9
Abstract
An operon is a fundamental unit of transcription and contains specific functional genes for the construction and regulation of networks at the entire genome level. The correct prediction of operons is vital for understanding gene regulations and functions in newly sequenced genomes. As experimental methods for operon detection tend to be nontrivial and time consuming, various methods for operon prediction have been proposed in the literature. In this study, a binary particle swarm optimization is used for operon prediction in bacterial genomes. The intergenic distance, participation in the same metabolic pathway, the cluster of orthologous groups, the gene length ratio and the operon length are used to design a fitness function. We trained the proper values on the Escherichia coli genome, and used the above five properties to implement feature selection. Finally, our study used the intergenic distance, metabolic pathway and the gene length ratio property to predict operons. Experimental results show that the prediction accuracy of this method reached 92.1%, 93.3% and 95.9% on the Bacillus subtilis genome, the Pseudomonas aeruginosa PA01 genome and the Staphylococcus aureus genome, respectively. This method has enabled us to predict operons with high accuracy for these three genomes, for which only limited data on the properties of the operon structure exists.
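A binary particle swarm optimization of the kind named in this abstract can be sketched as follows. This is a generic textbook-style binary PSO with a sigmoid transfer function, not the authors' implementation: the parameter values, the `toy_fitness` function and the distance threshold of 50 bp are all hypothetical, and the real fitness function combines intergenic distance, metabolic pathway and gene length ratio. Each bit stands for one adjacent gene pair (1 means "same operon").

```python
import math
import random


def binary_pso(fitness, n_bits, n_particles=10, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over bit vectors with a basic binary PSO."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(n_bits)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [fitness(p) for p in positions]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (pbest[i][d] - positions[i][d])
                     + c2 * r2 * (gbest[d] - positions[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                velocities[i][d] = v
                # Sigmoid of the velocity is the probability of setting the bit.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(positions[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit
    return gbest, gbest_fit


# Hypothetical intergenic distances (bp) between adjacent gene pairs.
distances = [12, 250, 8, 30, 400, 5]


def toy_fitness(bits):
    """Toy score: reward calling a pair operonic only when its distance is short."""
    return sum(1 for b, d in zip(bits, distances) if b == int(d < 50))


best_bits, best_score = binary_pso(toy_fitness, n_bits=len(distances))
print(best_bits, best_score)
```

The chaotic variant described in the related CBPSO paper replaces parts of this update with a chaotic map; that step is not shown here.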
Affiliation(s)
- Li-Yeh Chuang
- Department of Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
10
Taboada B, Verde C, Merino E. High accuracy operon prediction method based on STRING database scores. Nucleic Acids Res 2010; 38:e130. [PMID: 20385580 PMCID: PMC2896540 DOI: 10.1093/nar/gkq254] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 11/18/2022]
Abstract
We present a simple and highly accurate computational method for operon prediction, based on intergenic distances and functional relationships between the protein products of contiguous genes, as defined by STRING database (Jensen,L.J., Kuhn,M., Stark,M., Chaffron,S., Creevey,C., Muller,J., Doerks,T., Julien,P., Roth,A., Simonovic,M. et al. (2009) STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res., 37, D412–D416). These two parameters were used to train a neural network on a subset of experimentally characterized Escherichia coli and Bacillus subtilis operons. Our predictive model was successfully tested on the set of experimentally defined operons in E. coli and B. subtilis, with accuracies of 94.6 and 93.3%, respectively. As far as we know, these are the highest accuracies ever obtained for predicting bacterial operons. Furthermore, in order to evaluate the predictable accuracy of our model when using an organism's data set for the training procedure, and a different organism's data set for testing, we repeated the E. coli operon prediction analysis using a neural network trained with B. subtilis data, and a B. subtilis analysis using a neural network trained with E. coli data. Even for these cases, the accuracies reached with our method were outstandingly high, 91.5 and 93%, respectively. These results show the potential use of our method for accurately predicting the operons of any other organism. Our operon predictions for fully-sequenced genomes are available at http://operons.ibt.unam.mx/OperonPredictor/.
Affiliation(s)
- Blanca Taboada
- Centro de Ciencias Aplicadas y Desarrollo Tecnológico, Universidad Nacional Autónoma de México, México, D.F., México
11
Brouwer RWW, Kuipers OP, van Hijum SAFT. The relative value of operon predictions. Brief Bioinform 2008; 9:367-75. [PMID: 18420711 DOI: 10.1093/bib/bbn019] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 11/14/2022]
Abstract
For most organisms, computational operon predictions are the only source of genome-wide operon information. Operon prediction methods described in literature are based on (a combination of) the following five criteria: (i) intergenic distance, (ii) conserved gene clusters, (iii) functional relation, (iv) sequence elements and (v) experimental evidence. The performance estimates of operon predictions reported in literature cannot directly be compared due to differences in methods and data used in these studies. Here, we survey the current status of operon prediction methods. Based on a comparison of the performance of operon predictions on Escherichia coli and Bacillus subtilis we conclude that there is still room for improvement. We expect that existing and newly generated genomics and transcriptomics data will further improve accuracy of operon prediction methods.
12
Co-regulation of metabolic genes is better explained by flux coupling than by network distance. PLoS Comput Biol 2008; 4:e26. [PMID: 18225949 PMCID: PMC2211535 DOI: 10.1371/journal.pcbi.0040026] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.5] [Received: 08/20/2007] [Accepted: 12/10/2007] [Indexed: 12/31/2022]
Abstract
To what extent can modes of gene regulation be explained by systems-level properties of metabolic networks? Prior studies on co-regulation of metabolic genes have mainly focused on graph-theoretical features of metabolic networks and demonstrated a decreasing level of co-expression with increasing network distance, a naïve, but widely used, topological index. Others have suggested that static graph representations can poorly capture dynamic functional associations, e.g., in the form of dependence of metabolic fluxes across genes in the network. Here, we systematically tested the relative importance of metabolic flux coupling and network position on gene co-regulation, using a genome-scale metabolic model of Escherichia coli. After validating the computational method with empirical data on flux correlations, we confirm that genes coupled by their enzymatic fluxes not only show similar expression patterns, but also share transcriptional regulators and frequently reside in the same operon. In contrast, we demonstrate that network distance per se has relatively minor influence on gene co-regulation. Moreover, the type of flux coupling can explain refined properties of the regulatory network that are ignored by simple graph-theoretical indices. Our results underline the importance of studying functional states of cellular networks to define physiologically relevant associations between genes and should stimulate future developments of novel functional genomic tools.
Why do certain genes in a biological network show tight transcriptional co-regulation while others are more or less independently regulated? Prior studies showed that the degree of co-regulation between enzymatic genes decreases with their distance in the metabolic network. However, there are fundamental reasons to suspect that network distance is an incomplete descriptor of functional coherence (hence gene co-regulation), and other, biochemically more relevant measures, have been proposed to capture the functional dependencies between enzymes. We systematically examine whether flux coupling, a biochemically sound and computationally tractable measure of functional interaction between reactions, can better explain gene co-regulation than network distance in the metabolisms of Escherichia coli and Saccharomyces cerevisiae. After validating the flux coupling method using published experimental data on in vivo flux correlations (i.e., coherence of reaction usage), we demonstrate that it not only outperforms metabolic network distance in relation to in vivo flux correlations, but also in explaining transcriptional co-regulation and operonic organization. Future functional genomics studies could benefit from the concept of flux coupling by using it as a basis to test the reliability of computationally predicted functional associations.
13
Charaniya S, Mehra S, Lian W, Jayapal KP, Karypis G, Hu WS. Transcriptome dynamics-based operon prediction and verification in Streptomyces coelicolor. Nucleic Acids Res 2007; 35:7222-36. [PMID: 17959654 PMCID: PMC2175336 DOI: 10.1093/nar/gkm501] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 12/03/2022]
Abstract
Streptomyces spp. produce a variety of valuable secondary metabolites, which are regulated in a spatio-temporal manner by a complex network of inter-connected gene products. Using a compilation of genome-scale temporal transcriptome data for the model organism, Streptomyces coelicolor, under different environmental and genetic perturbations, we have developed a supervised machine-learning method for operon prediction in this microorganism. We demonstrate that, using features dependent on transcriptome dynamics and genome sequence, a support vector machines (SVM)-based classification algorithm can accurately classify >90% of gene pairs in a set of known operons. Based on model predictions for the entire genome, we verified the co-transcription of more than 250 gene pairs by RT-PCR. These results vastly increase the database of known operons in S. coelicolor and provide valuable information for exploring gene function and regulation to harness the potential of this differentiating microorganism for synthesis of natural products.
Affiliation(s)
- Salim Charaniya
- Department of Chemical Engineering and Materials Science, University of Minnesota, 421 Washington Avenue SE, Minneapolis, MN 55455-0132, USA
14
Liu B, Li S, Wang Y, Lu L, Li Y, Cai Y. Predicting the protein SUMO modification sites based on Properties Sequential Forward Selection (PSFS). Biochem Biophys Res Commun 2007; 358:136-9. [PMID: 17470363 DOI: 10.1016/j.bbrc.2007.04.097] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 04/05/2007] [Accepted: 04/12/2007] [Indexed: 11/24/2022]
Abstract
Protein SUMO modification is an important post-translational modification, and the optimization of prediction methods remains a challenge. Here, using the Support Vector Machine (SVM) algorithm, a novel computational method was developed for SUMO modification site prediction based on Sequential Forward Selection (SFS) of hundreds of amino acid properties, which are collected in the Amino Acid Index database (http://www.genome.jp/aaindex). Our method is also compared with the 0/1 system, in which the 20 amino acids are represented by 20-dimensional vectors (A = 00000000000000000001, C = 00000000000000000010 and so on). The overall accuracy of leave-one-out cross-validation for our method reaches 89.18%, which is higher than that of the 0/1 system. This indicates that the SUMO modification prediction process is highly related to amino acid properties, and this approach provides a helpful tool for further investigation of SUMO modification and identification of sumoylation sites in proteins. The software is available at http://www.biosino.org/sumo.
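The 0/1 system described in this abstract is a one-hot encoding of the 20 amino acids. The sketch below reproduces the two codes given in the abstract (A and C); the ordering of the remaining 18 residues is an assumption (alphabetical by one-letter code), and the function names are hypothetical.

```python
# Assumed ordering: alphabetical one-letter codes. Only A and C are confirmed
# by the two examples quoted in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_residue(aa: str) -> str:
    """Return the 20-character 0/1 code for a single amino acid."""
    index = AMINO_ACIDS.index(aa)
    bits = ["0"] * len(AMINO_ACIDS)
    # The abstract writes A with the 1 in the last position, so the bit is
    # counted from the right-hand end.
    bits[len(AMINO_ACIDS) - 1 - index] = "1"
    return "".join(bits)


def encode_peptide(peptide: str) -> list:
    """Encode each residue of a peptide window as its 0/1 code."""
    return [encode_residue(aa) for aa in peptide]


print(encode_residue("A"))
print(encode_residue("C"))
```

Under this scheme a window of n residues becomes a 20n-dimensional binary vector that carries no physicochemical information, which is the baseline the property-based features are compared against.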
Affiliation(s)
- Boshu Liu
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, China
15
Dam P, Olman V, Harris K, Su Z, Xu Y. Operon prediction using both genome-specific and general genomic information. Nucleic Acids Res 2006; 35:288-98. [PMID: 17170009 PMCID: PMC1802555 DOI: 10.1093/nar/gkl1018] [Citation(s) in RCA: 141] [Impact Index Per Article: 7.4] [Indexed: 11/20/2022]
Abstract
We have carried out a systematic analysis of the contribution of a set of selected features that include three new features to the accuracy of operon prediction. Our analyses have led to a number of new insights about operon prediction, including that (i) different features have different levels of discerning power when used on adjacent gene pairs with different ranges of intergenic distance, (ii) certain features are universally useful for operon prediction while others are more genome-specific and (iii) the prediction reliability of operons is dependent on intergenic distances. Based on these new insights, our newly developed operon-prediction program achieves more accurate operon prediction than the previous ones, and it uses features that are most readily available from genomic sequences. Our prediction results indicate that our (non-linear) decision tree-based classifier can predict operons in a prokaryotic genome very accurately when a substantial number of operons in the genome are already known. For example, the prediction accuracy of our program can reach 90.2 and 93.7% on Bacillus subtilis and Escherichia coli genomes, respectively. When no such information is available, our (linear) logistic function-based classifier can reach a prediction accuracy of 84.6 and 83.3% for E. coli and B. subtilis, respectively.
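To make the role of intergenic distance in a logistic function-based classifier concrete, the sketch below scores one adjacent gene pair from a single feature. It is a simplified illustration under stated assumptions, not the authors' classifier: their model combines several features, whereas here the `Gene` type, the 1-based inclusive coordinates, and the `bias` and `weight` coefficients are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive (assumed convention)
    strand: str  # "+" or "-"


def intergenic_distance(upstream: Gene, downstream: Gene) -> int:
    """Bases between two adjacent genes; negative when the genes overlap."""
    return downstream.start - upstream.end - 1


def operon_pair_probability(upstream: Gene, downstream: Gene,
                            bias: float = 2.0, weight: float = -0.02) -> float:
    """Logistic score for 'same operon' from intergenic distance alone.

    The coefficients are illustrative placeholders, not fitted values.
    """
    if upstream.strand != downstream.strand:
        return 0.0  # genes on opposite strands cannot be co-transcribed
    z = bias + weight * intergenic_distance(upstream, downstream)
    return 1.0 / (1.0 + math.exp(-z))


gene_a = Gene("geneA", 100, 1000, "+")
gene_b = Gene("geneB", 1011, 2000, "+")
gene_c = Gene("geneC", 2400, 3000, "+")

p_ab = operon_pair_probability(gene_a, gene_b)  # 10 bp apart
p_bc = operon_pair_probability(gene_b, gene_c)  # 399 bp apart
print(round(p_ab, 3), round(p_bc, 3))
```

With a negative weight, closely spaced same-strand pairs receive a higher score than widely spaced ones, which matches the qualitative role of intergenic distance described in the abstract.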
Affiliation(s)
- Phuongan Dam
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Victor Olman
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Kyle Harris
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Zhengchang Su
- Center for Bioinformatics Research, Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, USA
- Ying Xu
- Computational Systems Biology Laboratory, Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- To whom correspondence should be addressed. Tel: +1 706 542 9779; Fax: +1 706 542 9751;