Reference Citation Analysis

For: Müller A, MacCallum RM, Sternberg MJ. Benchmarking PSI-BLAST in genome annotation. J Mol Biol 1999;293:1257-71. [PMID: 10547299 DOI: 10.1006/jmbi.1999.3233] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.6] [Indexed: 11/22/2022]

Cited by Other Article(s)

Proj M, De Jonghe S, Van Loy T, Jukič M, Meden A, Ciber L, Podlipnik Č, Grošelj U, Konc J, Schols D, Gobec S. A Set of Experimentally Validated Decoys for the Human CC Chemokine Receptor 7 (CCR7) Obtained by Virtual Screening. Front Pharmacol 2022;13:855653. [PMID: 35370691 PMCID: PMC8972196 DOI: 10.3389/fphar.2022.855653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 02/28/2022] [Indexed: 11/21/2022]

Gao M, Lund-Andersen P, Morehead A, Mahmud S, Chen C, Chen X, Giri N, Roy RS, Quadir F, Effler TC, Prout R, Abraham S, Elwasif W, Haas NQ, Skolnick J, Cheng J, Sedova A. High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function. WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS 2021;2021:46-57. [PMID: 35112110 PMCID: PMC8802329 DOI: 10.1109/mlhpc54614.2021.00010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]

Gao M, Skolnick J. A novel sequence alignment algorithm based on deep learning of the protein folding code. Bioinformatics 2021;37:490-496. [PMID: 32960943 PMCID: PMC8599902 DOI: 10.1093/bioinformatics/btaa810] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/24/2020] [Revised: 08/11/2020] [Accepted: 09/08/2020] [Indexed: 11/12/2022]

Kargar F, Savardashtaki A, Mortazavi M, Mahani MT, Amani AM, Ghasemi Y, Nezafat N. In Silico Study of 1, 4 Alpha Glucan Branching Enzyme and Substrate Docking Studies. CURR PROTEOMICS 2020. [DOI: 10.2174/1570164616666190401204009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]

Hydrogen-Cycling during Solventogenesis in Clostridium acetobutylicum American Type Culture Collection (ATCC) 824 Requires the [NiFe]-Hydrogenase for Energy Conservation. FERMENTATION 2018. [DOI: 10.3390/fermentation4030055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022]

Abstract

Clostridium acetobutylicum has traditionally been used for production of acetone, butanol, and ethanol (ABE). Butanol is a commodity chemical due in part to its suitability as a biofuel; however, the current yield of this product from biological systems is not economically feasible as an alternative fuel source. Understanding solvent phase physiology, solvent tolerance, and their genetic underpinning is key for future strain optimization of the bacterium. This study shows the importance of a [NiFe]-hydrogenase in solvent phase physiology. C. acetobutylicum genes ca_c0810 and ca_c0811, annotated as HypF and HypD maturation factors, were found to be required for [NiFe]-hydrogenase activity. They were shown to be part of a polycistronic operon with other hyp genes. Hydrogenase activity assays of the ΔhypF/hypD mutant showed an almost complete inactivation of the [NiFe]-hydrogenase. Metabolic studies comparing ΔhypF/hypD and wild type (WT) strains in planktonic and sessile conditions indicated the hydrogenase was important for solvent phase metabolism. For the mutant, reabsorption of acetate and butyrate was inhibited during solventogenesis in planktonic cultures, and less ABE was produced. During sessile growth, the ΔhypF/hypD mutant had higher initial acetone:butanol ratios, which is consistent with the inability to obtain reduced cofactors via H2 uptake. In sessile conditions, the ΔhypF/hypD mutant was inhibited in early solventogenesis, but it appeared to remodel its metabolism and produced mainly butanol in late solventogenesis without the uptake of acids. Energy filtered transmission electron microscopy (EFTEM) mapped Pd(II) reduction via [NiFe]-hydrogenase induced H2 oxidation at the extracellular side of the membrane on WT cells. A decrease of Pd(0) deposits on ΔhypF/hypD compared to WT indicates that the [NiFe]-hydrogenase contributed to the Pd(II) reduction. Calculations of reaction potentials during acidogenesis and solventogenesis predict the [NiFe]-hydrogenase can couple NAD+ reduction with membrane transport of electrons. Extracellular oxidation of H2 combined with the potential for electron transport across the membrane indicates that the [NiFe]-hydrogenase contributes to proton motive force maintenance via hydrogen cycling.

Skariyachan S. Exploring the Potential of Herbal Ligands Toward Multidrug-Resistant Bacterial Pathogens by Computational Drug Discovery. TRANSLATIONAL BIOINFORMATICS AND ITS APPLICATION 2017. [DOI: 10.1007/978-94-024-1045-7_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]

Saripella GV, Sonnhammer ELL, Forslund K. Benchmarking the next generation of homology inference tools. Bioinformatics 2016;32:2636-41. [PMID: 27256311 PMCID: PMC5013910 DOI: 10.1093/bioinformatics/btw305] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/02/2015] [Accepted: 05/05/2016] [Indexed: 12/21/2022]

Abstract

Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the ‘next generation’ of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA.

Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM + Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics calculated. We further measured congruence of domain architecture assignments in the three domain databases.

Results: CS-BLAST and PHMMER had the highest overall accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization.

Conclusion: Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity.

Availability and Implementation: Benchmark datasets and all scripts are available at http://sonnhammer.org/download/Homology_benchmark.

Contact: forslund@embl.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Žváček C, Friedrichs G, Heizinger L, Merkl R. An assessment of catalytic residue 3D ensembles for the prediction of enzyme function. BMC Bioinformatics 2015;16:359. [PMID: 26538500 PMCID: PMC4634577 DOI: 10.1186/s12859-015-0807-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/17/2015] [Accepted: 10/29/2015] [Indexed: 12/03/2022]

Abstract

Background

The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i.e., non-homologous, protein folds and how similar their 3D poses are. Both similarities are key criteria for assessing the usability of pose comparison for function prediction.

Results

We have analyzed the SCOP database on the superfamily level in order to estimate the number of non-homologous enzymes possessing the same function according to their EC number. 89 % of the 873 substrate-specific functions (four digit EC number) assigned to mono-functional, single-domain enzymes were only found in one superfamily. For a reaction-specific grouping (three digit EC number), this value dropped to 35 %, indicating that in approximately 65 % of all enzymes the same function evolved in two or more non-homologous proteins.

For these isofunctional enzymes, structural similarity of the catalytic sites may help to predict function, because neither high sequence similarity nor identical folds are required for a comparison. To assess the specificity of catalytic 3D poses, we compiled the redundancy-free set ENZ_SITES, which comprises 695 sites, whose composition and function are well-defined. We compared their poses with the help of the program Superpose3D and determined classification performance. If the sites were from different superfamilies, the number of true and false positive predictions was similarly high, both for a coarse and a detailed grouping of enzyme function. Moreover, classification performance did not improve drastically, if we additionally used homologous sites to predict function.

Conclusions

For a large number of enzymatic functions, dissimilar sites evolved that catalyze the same reaction and it is the individual substrate that determines the arrangement of the catalytic site and its local environment. These substrate-specific requirements turn the comparison of catalytic residues into a weak classifier for the prediction of enzyme function.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0807-6) contains supplementary material, which is available to authorized users.

Ghouzam Y, Postic G, de Brevern AG, Gelly JC. Improving protein fold recognition with hybrid profiles combining sequence and structure evolution. Bioinformatics 2015;31:3782-9. [PMID: 26254434 DOI: 10.1093/bioinformatics/btv462] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/04/2015] [Accepted: 08/02/2015] [Indexed: 11/13/2022]

Wagner I, Volkmer M, Sharan M, Villaveces JM, Oswald F, Surendranath V, Habermann BH. morFeus: a web-based program to detect remotely conserved orthologs using symmetrical best hits and orthology network scoring. BMC Bioinformatics 2014;15:263. [PMID: 25096057 PMCID: PMC4137093 DOI: 10.1186/1471-2105-15-263] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/19/2014] [Accepted: 07/21/2014] [Indexed: 02/04/2023]

Jagilinki BP, Gadewal N, Mehta H, Mahadik H, Pandey V, Sawant U, A Wadegaonkar P, Goyal P, Kumar S, K Varma A. Conserved residues at the MAPKs binding interfaces that regulate transcriptional machinery. J Biomol Struct Dyn 2014;33:852-60. [PMID: 24739067 DOI: 10.1080/07391102.2014.915764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]

Mishra S, Saxena A, Sangwan RS. Fundamentals of Homology Modeling Steps and Comparison among Important Bioinformatics Tools: An Overview. Science International 2013. [DOI: 10.17311/sciintl.2013.237.252] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/03/2022]

Homology modeling and analysis of structure predictions of the bovine rhinitis B virus RNA dependent RNA polymerase (RdRp). Int J Mol Sci 2012;13:8998-9013. [PMID: 22942748 PMCID: PMC3430279 DOI: 10.3390/ijms13078998] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/03/2012] [Revised: 07/03/2012] [Accepted: 07/11/2012] [Indexed: 11/16/2022]

Abstract

Bovine Rhinitis B Virus (BRBV) is a picornavirus responsible for mild respiratory infection of cattle. It is probably the least characterized among the aphthoviruses. BRBV is the closest relative known to Foot and Mouth Disease virus (FMDV) with a ~43% identical polyprotein sequence and as much as 67% identical sequence for the RNA dependent RNA polymerase (RdRp), which is also known as 3D polymerase (3D^pol). In the present study we carried out phylogenetic analysis, structure based sequence alignment and prediction of three-dimensional structure of BRBV 3D^pol using a combination of different computational tools. Model structures of BRBV 3D^pol were verified for their stereochemical quality and accuracy. The BRBV 3D^pol structure predicted by SWISS-MODEL exhibited highest scores in terms of stereochemical quality and accuracy, which were in the range of 2Å resolution crystal structures. The active site, nucleic acid binding site and overall structure were observed to be in agreement with the crystal structure of unliganded as well as template/primer (T/P), nucleotide tri-phosphate (NTP) and pyrophosphate (PPi) bound FMDV 3D^pol (PDB, 1U09 and 2E9Z). The closest proximity of BRBV and FMDV 3D^pol as compared to human rhinovirus type 16 (HRV-16) and rabbit hemorrhagic disease virus (RHDV) 3D^pols is also substantiated by phylogeny analysis and root-mean square deviation (RMSD) between C-α traces of the polymerase structures. The absence of positively charged α-helix at C terminal, significant differences in non-covalent interactions especially salt bridges and CH-pi interactions around T/P channel of BRBV 3D^pol compared to FMDV 3D^pol, indicate that despite a very high homology to FMDV 3D^pol, BRBV 3D^pol may adopt a different mechanism for handling its substrates and adapting to physiological requirements. 
Our findings will be valuable in the design of structure-function interventions and identification of molecular targets for drug design applicable to Aphthovirus RdRps.

Repertoire of Protein Kinases Encoded in the Genome of Takifugu rubripes. Comp Funct Genomics 2012;2012:258284. [PMID: 22666085 PMCID: PMC3359783 DOI: 10.1155/2012/258284] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/27/2011] [Revised: 02/14/2012] [Accepted: 02/28/2012] [Indexed: 12/02/2022]

Kumar M, Ahmad S, Ahmad E, Saifi MA, Khan RH. In silico prediction and analysis of Caenorhabditis EF-hand containing proteins. PLoS One 2012;7:e36770. [PMID: 22701514 PMCID: PMC3360750 DOI: 10.1371/journal.pone.0036770] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/18/2012] [Accepted: 04/12/2012] [Indexed: 01/12/2023]

Abstract

Calcium (Ca²⁺) is a ubiquitous messenger in eukaryotes including Caenorhabditis. Ca²⁺-mediated signalling processes are usually carried out through well characterized proteins like calmodulin (CaM) and other Ca²⁺ binding proteins (CaBP). These proteins interact with different targets and activate them by inducing conformational changes. The majority of the EF-hand proteins in Caenorhabditis contain Ca²⁺ binding motifs. Here, we have performed homology modelling of CaM-like proteins using the crystal structure of Drosophila melanogaster CaM as a template. Molecular docking was applied to explore the binding mechanism of CaM-like proteins and the IQ1 motif, which is ∼25 residues long and conforms to the consensus sequence (I, L, V)QXXXRXXXX(R,K) to serve as a binding site for different EF hand proteins. We made an attempt to identify all the EF-hand (a helix-loop-helix structure characterized by a 12-residue loop sequence involved in metal coordination) containing proteins and their Ca²⁺ binding affinity in Caenorhabditis by analysing the complete genome sequence. Docking studies revealed that F165, F169, L29, E33, F44, L57, M61, M96, M97, M108, G65, V115, F93, N104, E144 of the CaM-like protein are involved in the interaction with the IQ1 motif. A maximum of 170 EF-hand proteins and 39 non-EF-hand proteins with Ca²⁺/metal binding motif were identified. Diverse proteins including enzyme, transcription, translation and a large number of unknown proteins have one or more putative EF-hands. Phylogenetic analysis revealed seven major classes/groups that contain some families of proteins. Various domains that we identified in the EF-hand proteins (uncharacterized) would help in elucidating their functions. It is the first report of its kind where calcium binding loop sequences of EF-hand proteins were analyzed to decipher their calcium affinities. Variation in Ca²⁺-binding affinity of EF-hand CaBP could be further used to study the behaviour of these proteins. Our analyses postulated that Ca²⁺ is likely to be a key player in Caenorhabditis cell signalling.
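
The consensus quoted in this abstract, (I, L, V)QXXXRXXXX(R,K), can be read as a simple sequence pattern. The sketch below is an illustration only and is not the authors' method: it assumes X stands for any one-letter amino acid code, and the function name `find_iq_motifs` and the example sequence are hypothetical.

```python
import re

# Consensus from the abstract: (I, L, V)QXXXRXXXX(R,K).
# Assumption: X means any single uppercase one-letter residue code.
# A lookahead is used so that overlapping candidate sites are all reported.
IQ_CONSENSUS = re.compile(r"(?=([ILV]Q[A-Z]{3}R[A-Z]{4}[RK]))")


def find_iq_motifs(sequence):
    """Return (0-based start, matched 11-residue core) for each consensus hit."""
    return [(m.start(), m.group(1)) for m in IQ_CONSENSUS.finditer(sequence.upper())]


# Hypothetical sequence, made up for illustration; not taken from the paper.
hits = find_iq_motifs("AAIQASFRGHMARKAA")
print(hits)
```

A pattern match of this kind only flags candidate sites; it says nothing about binding affinity, which the abstract addresses through docking.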

González-Díaz H, Muíño L, Anadón AM, Romaris F, Prado-Prado FJ, Munteanu CR, Dorado J, Sierra AP, Mezo M, González-Warleta M, Gárate T, Ubeira FM. MISS-Prot: web server for self/non-self discrimination of protein residue networks in parasites; theory and experiments in Fasciola peptides and Anisakis allergens. MOLECULAR BIOSYSTEMS 2011;7:1938-55. [PMID: 21468430 DOI: 10.1039/c1mb05069a] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]

Abstract

Infections caused by human parasites (HPs) affect the poorest 500 million people worldwide but chemotherapy has become expensive, toxic, and/or less effective due to drug resistance. On the other hand, many 3D structures in Protein Data Bank (PDB) remain without function annotation. We need theoretical models to quickly predict biologically relevant Parasite Self Proteins (PSP), which are expressed differentially in a given parasite and are dissimilar to proteins expressed in other parasites and have a high probability to become new vaccines (unique sequence) or drug targets (unique 3D structure). We present herein a model for PSPs in eight different HPs (Ascaris, Entamoeba, Fasciola, Giardia, Leishmania, Plasmodium, Trypanosoma, and Toxoplasma) with 90% accuracy for 15 341 training and validation cases. The model combines protein residue networks, Markov Chain Models (MCM) and Artificial Neural Networks (ANN). The input parameters are the spectral moments of the Markov transition matrix for electrostatic interactions associated with the protein residue complex network calculated with the MARCH-INSIDE software. We implemented this model in a new web-server called MISS-Prot (MARCH-INSIDE Scores for Self-Proteins). MISS-Prot was programmed using PHP/HTML/Python and MARCH-INSIDE routines and is freely available at: . This server is easy to use by non-experts in Bioinformatics who can carry out automatic online upload and prediction with 3D structures deposited at PDB (mode 1). We can also study outcomes of Peptide Mass Fingerprinting (PMFs) and MS/MS for query proteins with unknown 3D structures (mode 2). We illustrated the use of MISS-Prot in experimental and/or theoretical studies of peptides from Fasciola hepatica cathepsin proteases or present on 10 Anisakis simplex allergens (Ani s 1 to Ani s 10). 
In doing so, we combined electrophoresis (1DE), MALDI-TOF Mass Spectroscopy, and MASCOT to seek sequences, Molecular Mechanics + Molecular Dynamics (MM/MD) to generate 3D structures and MISS-Prot to predict PSP scores. MISS-Prot also allows the prediction of PSP proteins in 16 additional species including parasite hosts, fungi pathogens, disease transmission vectors, and biotechnologically relevant organisms.

Monson R, Foulds I, Foweraker J, Welch M, Salmond GPC. The Pseudomonas aeruginosa generalized transducing phage phiPA3 is a new member of the phiKZ-like group of 'jumbo' phages, and infects model laboratory strains and clinical isolates from cystic fibrosis patients. MICROBIOLOGY-SGM 2010;157:859-867. [PMID: 21163841 DOI: 10.1099/mic.0.044701-0] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 01/21/2023]

Pandit SB, Brylinski M, Zhou H, Gao M, Arakaki AK, Skolnick J. PSiFR: an integrated resource for prediction of protein structure and function. Bioinformatics 2010;26:687-8. [PMID: 20080513 DOI: 10.1093/bioinformatics/btq006] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/26/2022]

Classification of nonenzymatic homologues of protein kinases. Comp Funct Genomics 2009:365637. [PMID: 19809514 PMCID: PMC2754085 DOI: 10.1155/2009/365637] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/13/2009] [Accepted: 07/01/2009] [Indexed: 11/17/2022]

Iwaniak A, Dziuba J. Analysis of Domains in Selected Plant and Animal Food Proteins - Precursors of Biologically Active Peptides - In Silico Approach. FOOD SCI TECHNOL INT 2009. [DOI: 10.1177/1082013208106320] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]

Lobley A, Sadowski MI, Jones DT. pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination. Bioinformatics 2009;25:1761-7. [PMID: 19429599 DOI: 10.1093/bioinformatics/btp302] [Citation(s) in RCA: 218] [Impact Index Per Article: 14.5] [Indexed: 11/14/2022]

Hawkins T, Chitale M, Luban S, Kihara D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins 2009;74:566-82. [PMID: 18655063 DOI: 10.1002/prot.22172] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022]

Abstract

Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments. 
PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/.

Anamika K, Bhattacharya A, Srinivasan N. Analysis of the protein kinome of Entamoeba histolytica. Proteins 2008;71:995-1006. [PMID: 18004777 DOI: 10.1002/prot.21790] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 11/11/2022]

Abstract

Protein kinases play important roles in almost all major signaling and regulatory pathways of eukaryotic organisms. Members in the family of protein kinases make up a substantial fraction of the eukaryotic proteome. Analysis of the protein kinase repertoire (kinome) would help in the better understanding of the regulatory processes. In this article, we report the identification and analysis of the repertoire of protein kinases in the intracellular parasite Entamoeba histolytica. Using a combination of various sensitive sequence search methods and manual analysis, we have identified a set of 307 protein kinases in the E. histolytica genome. We have classified these protein kinases into different subfamilies originally defined by Hanks and Hunter and studied these kinases further in the context of noncatalytic domains that are tethered to the catalytic kinase domain. Compared to other eukaryotic organisms, protein kinases from E. histolytica vary in terms of their domain organization and display features that may have a bearing on the unusual biology of this organism. Some of the parasitic kinases show high sequence similarity in the catalytic domain region with calmodulin/calcium dependent protein kinase subfamily. However, they are unlikely to act like typical calcium/calmodulin dependent kinases as they lack noncatalytic domains characteristic of such kinases in other organisms. Such kinases form the largest subfamily of kinases in E. histolytica. Interestingly, a PKA/PKG-like subfamily member is tethered to a pleckstrin homology domain. Although potential cyclins and cyclin-dependent kinases could be identified in the genome, the likely absence of other cell cycle proteins suggests an unusual nature of the cell cycle in E. histolytica. Some of the unusual features recognized in our analysis include the absence of MEK as a part of the Mitogen Activated Kinase signaling pathway and identification of transmembrane region containing Src kinase-like kinases. 
Sequences which could not be classified into known subfamilies of protein kinases have unusual domain architectures. Many such unclassified protein kinases are tethered to domains which are Cysteine-rich and to domains known to be involved in protein-protein interactions. Our kinome analysis of E. histolytica suggests that the organism possesses a complex protein phosphorylation network that involves many unusual kinases.

McGuffin LJ. Aligning sequences to structures. Methods Mol Biol 2008;413:61-90. [PMID: 18075162 DOI: 10.1007/978-1-59745-574-9_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 03/24/2023]

Reid AJ, Yeats C, Orengo CA. Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone. Bioinformatics 2007;23:2353-60. [PMID: 17709341 DOI: 10.1093/bioinformatics/btm355] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]

Scheeff ED, Bourne PE. Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction. BMC Bioinformatics 2006;7:410. [PMID: 16970830 PMCID: PMC1622756 DOI: 10.1186/1471-2105-7-410] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 04/18/2006] [Accepted: 09/14/2006] [Indexed: 11/30/2022]

Abstract

Background

One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.

Results

We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models. Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models.

Conclusion

When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.


Li J, Wang W. Detailed assessment of homology detection using different substitution matrices. CHINESE SCIENCE BULLETIN-CHINESE 2006. [DOI: 10.1007/s11434-006-1538-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Fleming K, Kelley LA, Islam SA, MacCallum RM, Muller A, Pazos F, Sternberg MJ. The proteome: structure, function and evolution. Philos Trans R Soc Lond B Biol Sci 2006;361:441-51. [PMID: 16524832 PMCID: PMC1609342 DOI: 10.1098/rstb.2005.1802] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Jiménez JL. Does structural and chemical divergence play a role in precluding undesirable protein interactions? Proteins 2006;59:757-64. [PMID: 15822102 DOI: 10.1002/prot.20448] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]

Abeln S, Deane CM. Fold usage on genomes and protein fold evolution. Proteins 2006;60:690-700. [PMID: 16001400 DOI: 10.1002/prot.20506] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Gowri VS, Krishnadev O, Swamy CS, Srinivasan N. MulPSSM: a database of multiple position-specific scoring matrices of protein domain families. Nucleic Acids Res 2006;34:D243-6. [PMID: 16381855 PMCID: PMC1347406 DOI: 10.1093/nar/gkj043] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Srinivasan N, Krupa A. A genomic perspective of protein kinases in Plasmodium falciparum. Proteins 2006;58:180-9. [PMID: 15515182 DOI: 10.1002/prot.20278] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Sandhya S, Chakrabarti S, Abhinandan KR, Sowdhamini R, Srinivasan N. Assessment of a rigorous transitive profile based search method to detect remotely similar proteins. J Biomol Struct Dyn 2005;23:283-98. [PMID: 16218755 DOI: 10.1080/07391102.2005.10507066] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Chandonia JM, Kim SH, Brenner SE. Target selection and deselection at the Berkeley Structural Genomics Center. Proteins 2005;62:356-70. [PMID: 16276528 DOI: 10.1002/prot.20674] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]

Abstract

At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), and the percentage of accurately modeled proteins would rise from 50% (243 of 486) to 58% (283 of 486). Sequences and data on experimental progress on our targets are available in the public databases TargetDB and PEPCdb.


Shachar O, Linial M. A robust method to detect structural and functional remote homologues. Proteins 2005;57:531-8. [PMID: 15382232 DOI: 10.1002/prot.20235] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Scheeff ED, Bourne PE. Structural evolution of the protein kinase-like superfamily. PLoS Comput Biol 2005;1:e49. [PMID: 16244704 PMCID: PMC1261164 DOI: 10.1371/journal.pcbi.0010049] [Citation(s) in RCA: 189] [Impact Index Per Article: 9.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2005] [Accepted: 09/08/2005] [Indexed: 11/19/2022] Open

Abstract

The protein kinase family is large and important, but it is only one family in a larger superfamily of homologous kinases that phosphorylate a variety of substrates and play important roles in all three superkingdoms of life. We used a carefully constructed structural alignment of selected kinases as the basis for a study of the structural evolution of the protein kinase-like superfamily. The comparison of structures revealed a "universal core" domain consisting only of regions required for ATP binding and the phosphotransfer reaction. Remarkably, even within the universal core some kinase structures display notable changes, while still retaining essential activity. Hence, the protein kinase-like superfamily has undergone substantial structural and sequence revision over long evolutionary timescales. We constructed a phylogenetic tree for the superfamily using a novel approach that allowed for the combination of sequence and structure information into a unified quantitative analysis. When considered against the backdrop of species distribution and other metrics, our tree provides a compelling scenario for the development of the various kinase families from a shared common ancestor. We propose that most of the so-called "atypical kinases" are not intermittently derived from protein kinases, but rather diverged early in evolution to form a distinct phyletic group. Within the atypical kinases, the aminoglycoside and choline kinase families appear to share the closest relationship. These two families in turn appear to be the most closely related to the protein kinase family. In addition, our analysis suggests that the actin-fragmin kinase, an atypical protein kinase, is more closely related to the phosphoinositide-3 kinase family than to the protein kinase family. The two most divergent families, alpha-kinases and phosphatidylinositol phosphate kinases (PIPKs), appear to have distinct evolutionary histories. While the PIPKs probably have an evolutionary relationship with the rest of the kinase superfamily, the relationship appears to be very distant (and perhaps indirect). Conversely, the alpha-kinases appear to be an exception to the scenario of early divergence for the atypical kinases: they apparently arose relatively recently in eukaryotes. We present possible scenarios for the derivation of the alpha-kinases from an extant kinase fold.


Krupa A, Srinivasan N. Diversity in domain architectures of Ser/Thr kinases and their homologues in prokaryotes. BMC Genomics 2005;6:129. [PMID: 16171520 PMCID: PMC1262709 DOI: 10.1186/1471-2164-6-129] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2004] [Accepted: 09/19/2005] [Indexed: 11/17/2022] Open

Abstract

Background

Ser/Thr/Tyr kinases (STYKs) commonly found in eukaryotes have recently been reported in many bacterial species. Recent studies elucidating their cellular functions have established their roles in bacterial growth and development. However, the functions of a large number of bacterial STYKs remain elusive. The organisation of domains in a large dataset of bacterial STYKs has been investigated here in order to recognise the variety of domain combinations that determine the functions of bacterial STYKs.

Results

Using sensitive sequence and profile search methods, the domain organisation of over 600 STYKs from 125 prokaryotic genomes has been examined. Kinase catalytic domains of STYKs tethered to a wide range of enzymatic domains such as phosphatases, HSP70, peptidyl prolyl isomerases, pectin esterases and glycoproteases have been identified. Such distinct preferences for domain combinations are not known to be present in either the Histidine kinase or the eukaryotic STYK families. Domain organisation of STYKs specific to certain groups of bacteria has also been noted in the current analysis: for example, hydrophobin-like domains in Mycobacterial STYK, penicillin-binding domains in a few STYKs of Gram-positive organisms, and FHA domains in cyanobacterial STYKs. Homologues of characterised substrates of prokaryotic STYKs have also been identified.

Conclusion

The domains and domain architectures of most of the bacterial STYKs identified are very different from the known domain organisation in STYKs of eukaryotes. This observation highlights distinct biological roles of bacterial STYKs compared to eukaryotic STYKs. Bacterial STYKs reveal high diversity in domain organisation. Some of the modular organisations conserved across diverse bacterial species suggest their central role in bacterial physiology. Unique domain architectures of a few other groups of STYKs reveal recruitment of functions specific to the species.


Johnston CR, Shields DC. A sequence sub-sampling algorithm increases the power to detect distant homologues. Nucleic Acids Res 2005;33:3772-8. [PMID: 16006623 PMCID: PMC1174907 DOI: 10.1093/nar/gki687] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Sillitoe I, Dibley M, Bray J, Addou S, Orengo C. Assessing strategies for improved superfamily recognition. Protein Sci 2005;14:1800-10. [PMID: 15937274 PMCID: PMC2253352 DOI: 10.1110/ps.041056105] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]

Anand B, Gowri VS, Srinivasan N. Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues. Bioinformatics 2005;21:2821-6. [PMID: 15817691 DOI: 10.1093/bioinformatics/bti432] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Pallen MJ, Beatson SA, Bailey CM. Bioinformatics analysis of the locus for enterocyte effacement provides novel insights into type-III secretion. BMC Microbiol 2005;5:9. [PMID: 15757514 PMCID: PMC1084347 DOI: 10.1186/1471-2180-5-9] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2004] [Accepted: 03/09/2005] [Indexed: 12/17/2022] Open

Kaplan N, Linial M. Automatic detection of false annotations via binary property clustering. BMC Bioinformatics 2005;6:46. [PMID: 15755318 PMCID: PMC555558 DOI: 10.1186/1471-2105-6-46] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2004] [Accepted: 03/08/2005] [Indexed: 11/10/2022] Open

Namboori S, Mhatre N, Sujatha S, Srinivasan N, Pandit SB. Enhanced functional and structural domain assignments using remote similarity detection procedures for proteins encoded in the genome of Mycobacterium tuberculosis H37Rv. J Biosci 2005;29:245-59. [PMID: 15381846 DOI: 10.1007/bf02702607] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

Zhang Z, Kochhar S, Grigorov MG. Descriptor-based protein remote homology identification. Protein Sci 2005;14:431-44. [PMID: 15632283 PMCID: PMC2253398 DOI: 10.1110/ps.041035505] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]

Stevens FJ. Efficient recognition of protein fold at low sequence identity by conservative application of Psi-BLAST: validation. J Mol Recognit 2005;18:139-49. [PMID: 15558595 DOI: 10.1002/jmr.721] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Marti-Renom MA, Madhusudhan MS, Sali A. Alignment of protein sequences by their profiles. Protein Sci 2004;13:1071-87. [PMID: 15044736 PMCID: PMC2280052 DOI: 10.1110/ps.03379804] [Citation(s) in RCA: 130] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]

Mayor LR, Fleming KP, Müller A, Balding DJ, Sternberg MJE. Clustering of protein domains in the human genome. J Mol Biol 2004;340:991-1004. [PMID: 15236962 DOI: 10.1016/j.jmb.2004.05.036] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2003] [Revised: 03/30/2004] [Accepted: 05/17/2004] [Indexed: 11/30/2022]

Kihara D, Skolnick J. Microbial genomes have over 72% structure assignment by the threading algorithm PROSPECTOR_Q. Proteins 2004;55:464-73. [PMID: 15048836 DOI: 10.1002/prot.20044] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]

Chakhaiyar P, Hasnain SE. Defining the mandate of tuberculosis research in a postgenomic era. Med Princ Pract 2004;13:177-84. [PMID: 15181320 DOI: 10.1159/000078312] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/15/2003] [Accepted: 02/07/2004] [Indexed: 11/19/2022] Open

Ranea JAG, Buchan DWA, Thornton JM, Orengo CA. Evolution of protein superfamilies and bacterial genome size. J Mol Biol 2004;336:871-87. [PMID: 15095866 DOI: 10.1016/j.jmb.2003.12.044] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2003] [Revised: 12/11/2003] [Accepted: 12/12/2003] [Indexed: 10/26/2022]

Abstract

We present the structural annotation of 56 different bacterial species based on the assignment of genes to 816 evolutionary superfamilies in the CATH domain structure database. These assignments have enabled us to analyse the recurrence of specific superfamilies within and across the genomes. We have selected the superfamilies that have a very broad representation and therefore appear to be universally distributed in a significant number of bacterial lineages. Occurrence profiles of these universally distributed superfamilies are compared with genome size in order to estimate the correlation between superfamily duplication and the increase in proteome size. This distinguishes between those size-dependent superfamilies where frequency of occurrence is highly correlated with increase in genome size, and size-independent superfamilies where no correlation is observed. Consideration of the size correlation and the ratio between the mean and the standard deviations for all the superfamily profiles allows more detailed subdivisions and classification of superfamilies. For example, within the size-independent superfamilies, we distinguished a group that are distributed evenly amongst all the genomes. Within the size-dependent superfamilies we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COG database was performed for all superfamilies in each of these groups, and this revealed significant differences amongst the three sets of superfamilies. Evenly distributed, size-independent domains are shown to be involved primarily in protein translation and biosynthesis. For the size-dependent superfamilies, linearly distributed superfamilies are involved mainly in metabolism, and non-linearly distributed superfamily domains are involved principally in gene regulation.
