1
Proj M, De Jonghe S, Van Loy T, Jukič M, Meden A, Ciber L, Podlipnik Č, Grošelj U, Konc J, Schols D, Gobec S. A Set of Experimentally Validated Decoys for the Human CC Chemokine Receptor 7 (CCR7) Obtained by Virtual Screening. Front Pharmacol 2022; 13:855653. [PMID: 35370691 PMCID: PMC8972196 DOI: 10.3389/fphar.2022.855653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 02/28/2022] [Indexed: 11/21/2022] Open
Abstract
We present a state-of-the-art virtual screening workflow aiming at the identification of novel CC chemokine receptor 7 (CCR7) antagonists. Although CCR7 is associated with a variety of human diseases, such as immunological disorders, inflammatory diseases, and cancer, this target is underexplored in drug discovery and there are no potent and selective CCR7 small molecule antagonists available today. Therefore, computer-aided ligand-based, structure-based, and joint virtual screening campaigns were performed. Hits from these virtual screenings were tested in a CCL19-induced calcium signaling assay. After careful evaluation, none of the in silico hits were confirmed to have an antagonistic effect on CCR7. Hence, we report here a valuable set of 287 inactive compounds that can be used as experimentally validated decoys.
Affiliation(s)
- Matic Proj
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Ljubljana, Ljubljana, Slovenia
- Steven De Jonghe
  - Laboratory of Virology and Chemotherapy, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
- Tom Van Loy
  - Laboratory of Virology and Chemotherapy, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
- Marko Jukič
  - Faculty of Chemistry and Chemical Engineering, Laboratory of Physical Chemistry and Chemical Thermodynamics, University of Maribor, Maribor, Slovenia
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
- Anže Meden
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Ljubljana, Ljubljana, Slovenia
- Luka Ciber
  - Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia
- Črtomir Podlipnik
  - Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia
- Uroš Grošelj
  - Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia
- Janez Konc
  - National Institute of Chemistry, Ljubljana, Slovenia
- Dominique Schols
  - Laboratory of Virology and Chemotherapy, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
- Stanislav Gobec
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, University of Ljubljana, Ljubljana, Slovenia
2
Gao M, Lund-Andersen P, Morehead A, Mahmud S, Chen C, Chen X, Giri N, Roy RS, Quadir F, Effler TC, Prout R, Abraham S, Elwasif W, Haas NQ, Skolnick J, Cheng J, Sedova A. High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function. WORKSHOP ON MACHINE LEARNING IN HPC ENVIRONMENTS 2021; 2021:46-57. [PMID: 35112110 PMCID: PMC8802329 DOI: 10.1109/mlhpc54614.2021.00010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]
Abstract
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.
Affiliation(s)
- Mu Gao
  - Georgia Institute of Technology, Atlanta, GA
- Chen Chen
  - University of Missouri, Columbia, MO
- Xiao Chen
  - University of Missouri, Columbia, MO
- Ryan Prout
  - Oak Ridge National Laboratory, Oak Ridge, TN
- Ada Sedova
  - Oak Ridge National Laboratory, Oak Ridge, TN
3
Gao M, Skolnick J. A novel sequence alignment algorithm based on deep learning of the protein folding code. Bioinformatics 2021; 37:490-496. [PMID: 32960943 PMCID: PMC8599902 DOI: 10.1093/bioinformatics/btaa810] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/24/2020] [Revised: 08/11/2020] [Accepted: 09/08/2020] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION From evolutionary inference, function annotation to structural prediction, protein sequence comparison has provided crucial biological insights. While many sequence alignment algorithms have been developed, existing approaches often cannot detect hidden structural relationships in the 'twilight zone' of low sequence identity. To address this critical problem, we introduce a computational algorithm that performs protein Sequence Alignments from deep-Learning of Structural Alignments (SAdLSA, silent 'd'). The key idea is to implicitly learn the protein folding code from many thousands of structural alignments using experimentally determined protein structures. RESULTS To demonstrate that the folding code was learned, we first show that SAdLSA trained on pure α-helical proteins successfully recognizes pairs of structurally related pure β-sheet protein domains. Subsequent training and benchmarking on larger, highly challenging datasets show significant improvement over established approaches. For challenging cases, SAdLSA is ∼150% better than HHsearch for generating pairwise alignments and ∼50% better for identifying the proteins with the best alignments in a sequence library. The time complexity of SAdLSA is O(N) thanks to GPU acceleration. AVAILABILITY AND IMPLEMENTATION Datasets and source codes of SAdLSA are available free of charge for academic users at http://sites.gatech.edu/cssb/sadlsa/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mu Gao
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
4
Kargar F, Savardashtaki A, Mortazavi M, Mahani MT, Amani AM, Ghasemi Y, Nezafat N. In Silico Study of 1, 4 Alpha Glucan Branching Enzyme and Substrate Docking Studies. CURR PROTEOMICS 2020. [DOI: 10.2174/1570164616666190401204009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: The 1,4-alpha-glucan branching protein (GlgB) plays an important role in the glycogen biosynthesis and the deficiency in this enzyme has resulted in Glycogen storage disease and accumulation of an amylopectin-like polysaccharide. Consequently, this enzyme was considered a special topic in clinical and biotechnological research. One of the newly introduced GlgB belongs to the Neisseria sp. HMSC071A01 (Ref.Seq. WP_049335546). For in silico analysis, the 3D molecular modeling of this enzyme was conducted in the I-TASSER web server. Methods: For a better evaluation, the important characteristics of this enzyme such as functional properties, metabolic pathway and activity were investigated in the TargetP software. Additionally, the phylogenetic tree and secondary structure of this enzyme were studied by Mafft and Prabi software, respectively. Finally, the binding site properties (the maltoheptaose as substrate) were studied using the AutoDock Vina. Results: By drawing the phylogenetic tree, the closest species were the taxonomic group of Betaproteobacteria. The results showed that the structure of this enzyme had 34.45% of the alpha helix and 45.45% of the random coil. Our analysis predicted that this enzyme has a potential signal peptide in the protein sequence. Conclusion: By these analyses, a new understanding was developed related to the sequence and structure of this enzyme. Our findings can further be used in some fields of clinical and industrial biotechnology.
Affiliation(s)
- Farzane Kargar
  - Department of Biotechnology, Institute of Science and High Technology and Environmental Science, Graduate University of Advanced Technology, Kerman, Iran
- Amir Savardashtaki
  - Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Mojtaba Mortazavi
  - Department of Biotechnology, Institute of Science and High Technology and Environmental Science, Graduate University of Advanced Technology, Kerman, Iran
- Masoud Torkzadeh Mahani
  - Department of Biotechnology, Institute of Science and High Technology and Environmental Science, Graduate University of Advanced Technology, Kerman, Iran
- Ali Mohammad Amani
  - Department of Medical Nanotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, 71348-14336, Iran
- Younes Ghasemi
  - Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Navid Nezafat
  - Pharmaceutical Sciences Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
5
Hydrogen-Cycling during Solventogenesis in Clostridium acetobutylicum American Type Culture Collection (ATCC) 824 Requires the [NiFe]-Hydrogenase for Energy Conservation. FERMENTATION 2018. [DOI: 10.3390/fermentation4030055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/17/2022] Open
Abstract
Clostridium acetobutylicum has traditionally been used for production of acetone, butanol, and ethanol (ABE). Butanol is a commodity chemical due in part to its suitability as a biofuel; however, the current yield of this product from biological systems is not economically feasible as an alternative fuel source. Understanding solvent phase physiology, solvent tolerance, and their genetic underpinning is key for future strain optimization of the bacterium. This study shows the importance of a [NiFe]-hydrogenase in solvent phase physiology. C. acetobutylicum genes ca_c0810 and ca_c0811, annotated as HypF and HypD maturation factors, were found to be required for [NiFe]-hydrogenase activity. They were shown to be part of a polycistronic operon with other hyp genes. Hydrogenase activity assays of the ΔhypF/hypD mutant showed an almost complete inactivation of the [NiFe]-hydrogenase. Metabolic studies comparing ΔhypF/hypD and wild type (WT) strains in planktonic and sessile conditions indicated the hydrogenase was important for solvent phase metabolism. For the mutant, reabsorption of acetate and butyrate was inhibited during solventogenesis in planktonic cultures, and less ABE was produced. During sessile growth, the ΔhypF/hypD mutant had higher initial acetone:butanol ratios, which is consistent with the inability to obtain reduced cofactors via H2 uptake. In sessile conditions, the ΔhypF/hypD mutant was inhibited in early solventogenesis, but it appeared to remodel its metabolism and produced mainly butanol in late solventogenesis without the uptake of acids. Energy filtered transmission electron microscopy (EFTEM) mapped Pd(II) reduction via [NiFe]-hydrogenase induced H2 oxidation at the extracellular side of the membrane on WT cells. A decrease of Pd(0) deposits on ΔhypF/hypD compared with WT indicates that the [NiFe]-hydrogenase contributed to the Pd(II) reduction. Calculations of reaction potentials during acidogenesis and solventogenesis predict the [NiFe]-hydrogenase can couple NAD+ reduction with membrane transport of electrons. Extracellular oxidation of H2 combined with the potential for electron transport across the membrane indicates that the [NiFe]-hydrogenase contributes to proton motive force maintenance via hydrogen cycling.
6
Skariyachan S. Exploring the Potential of Herbal Ligands Toward Multidrug-Resistant Bacterial Pathogens by Computational Drug Discovery. TRANSLATIONAL BIOINFORMATICS AND ITS APPLICATION 2017. [DOI: 10.1007/978-94-024-1045-7_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]
7
Saripella GV, Sonnhammer ELL, Forslund K. Benchmarking the next generation of homology inference tools. Bioinformatics 2016; 32:2636-41. [PMID: 27256311 PMCID: PMC5013910 DOI: 10.1093/bioinformatics/btw305] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 07/02/2015] [Accepted: 05/05/2016] [Indexed: 12/21/2022] Open
Abstract
Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the ‘next generation’ of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA. Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM + Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics calculated. We further measured congruence of domain architecture assignments in the three domain databases. Results: CSBLAST and PHMMER had overall highest accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization. Conclusion: Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CSBLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity. Availability and Implementation: Benchmark datasets and all scripts are placed at (http://sonnhammer.org/download/Homology_benchmark). 
Contact: forslund@embl.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ganapathi Varma Saripella
  - Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden
- Erik L L Sonnhammer
  - Science for Life Laboratory, Stockholm Bioinformatics Center, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE-10691, Sweden
- Kristoffer Forslund
  - European Molecular Biology Laboratory, Structural and Computational Biology Unit, Heidelberg 69117, Germany
8
Žváček C, Friedrichs G, Heizinger L, Merkl R. An assessment of catalytic residue 3D ensembles for the prediction of enzyme function. BMC Bioinformatics 2015; 16:359. [PMID: 26538500 PMCID: PMC4634577 DOI: 10.1186/s12859-015-0807-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/17/2015] [Accepted: 10/29/2015] [Indexed: 12/03/2022] Open
Abstract
Background The central element of each enzyme is the catalytic site, which commonly catalyzes a single biochemical reaction with high specificity. It was unclear to us how often sites that catalyze the same or highly similar reactions evolved on different, i. e. non-homologous protein folds and how similar their 3D poses are. Both similarities are key criteria for assessing the usability of pose comparison for function prediction. Results We have analyzed the SCOP database on the superfamily level in order to estimate the number of non-homologous enzymes possessing the same function according to their EC number. 89 % of the 873 substrate-specific functions (four digit EC number) assigned to mono-functional, single-domain enzymes were only found in one superfamily. For a reaction-specific grouping (three digit EC number), this value dropped to 35 %, indicating that in approximately 65 % of all enzymes the same function evolved in two or more non-homologous proteins. For these isofunctional enzymes, structural similarity of the catalytic sites may help to predict function, because neither high sequence similarity nor identical folds are required for a comparison. To assess the specificity of catalytic 3D poses, we compiled the redundancy-free set ENZ_SITES, which comprises 695 sites, whose composition and function are well-defined. We compared their poses with the help of the program Superpose3D and determined classification performance. If the sites were from different superfamilies, the number of true and false positive predictions was similarly high, both for a coarse and a detailed grouping of enzyme function. Moreover, classification performance did not improve drastically, if we additionally used homologous sites to predict function. Conclusions For a large number of enzymatic functions, dissimilar sites evolved that catalyze the same reaction and it is the individual substrate that determines the arrangement of the catalytic site and its local environment. 
These substrate-specific requirements turn the comparison of catalytic residues into a weak classifier for the prediction of enzyme function. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0807-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Clemens Žváček
  - Faculty of Mathematics and Computer Science, University of Hagen, D-58084, Hagen, Germany
- Gerald Friedrichs
  - Faculty of Mathematics and Computer Science, University of Hagen, D-58084, Hagen, Germany
- Leonhard Heizinger
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany
- Rainer Merkl
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040, Regensburg, Germany
9
Ghouzam Y, Postic G, de Brevern AG, Gelly JC. Improving protein fold recognition with hybrid profiles combining sequence and structure evolution. Bioinformatics 2015; 31:3782-9. [PMID: 26254434 DOI: 10.1093/bioinformatics/btv462] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/04/2015] [Accepted: 08/02/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since protein structure is more conserved than sequence, the inclusion of structural information can improve the detection of remote homology. RESULTS Here, we present ORION, a new fold recognition method based on the pairwise comparison of hybrid profiles that contain evolutionary information from both protein sequence and structure. Our method uses the 16-state structural alphabet Protein Blocks, which provides an accurate 1D description of protein structure local conformations. ORION systematically outperforms PSI-BLAST and HHsearch on several benchmarks, including target sequences from the modeling competitions CASP8, 9 and 10, and detects ∼10% more templates at fold and superfamily SCOP levels. AVAILABILITY Software freely available for download at http://www.dsimb.inserm.fr/orion/. CONTACT jean-christophe.gelly@univ-paris-diderot.fr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yassine Ghouzam
  - Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
- Guillaume Postic
  - Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
- Alexandre G de Brevern
  - Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
- Jean-Christophe Gelly
  - Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
10
Wagner I, Volkmer M, Sharan M, Villaveces JM, Oswald F, Surendranath V, Habermann BH. morFeus: a web-based program to detect remotely conserved orthologs using symmetrical best hits and orthology network scoring. BMC Bioinformatics 2014; 15:263. [PMID: 25096057 PMCID: PMC4137093 DOI: 10.1186/1471-2105-15-263] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/19/2014] [Accepted: 07/21/2014] [Indexed: 02/04/2023] Open
Abstract
Background Searching the orthologs of a given protein or DNA sequence is one of the most important and most commonly used Bioinformatics methods in Biology. Programs like BLAST or the orthology search engine Inparanoid can be used to find orthologs when the similarity between two sequences is sufficiently high. They however fail when the level of conservation is low. The detection of remotely conserved proteins oftentimes involves sophisticated manual intervention that is difficult to automate. Results Here, we introduce morFeus, a search program to find remotely conserved orthologs. Based on relaxed sequence similarity searches, morFeus selects sequences based on the similarity of their alignments to the query, tests for orthology by iterative reciprocal BLAST searches and calculates a network score for the resulting network of orthologs that is a measure of orthology independent of the E-value. Detecting remotely conserved orthologs of a protein using morFeus thus requires no manual intervention. We demonstrate the performance of morFeus by comparing it to state-of-the-art orthology resources and methods. We provide an example of remotely conserved orthologs, which were experimentally shown to be functionally equivalent in the respective organisms and therefore meet the criteria of the orthology-function conjecture. Conclusions Based on our results, we conclude that morFeus is a powerful and specific search method for detecting remotely conserved orthologs. morFeus is freely available at http://bio.biochem.mpg.de/morfeus/. Its source code is available from Sourceforge.net (https://sourceforge.net/p/morfeus/). Electronic supplementary material The online version of this article (doi:10.1186/1471-2105-15-263) contains supplementary material, which is available to authorized users.
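The reciprocal search step mentioned in this abstract can be illustrated with a plain reciprocal best hit check. The following is a simplified sketch and not morFeus itself: it leaves out the relaxed similarity search, the alignment-based selection, the iteration and the network score, and the function names and score tables are hypothetical.

```python
def best_hit(scores):
    """Return the subject with the highest score, or None if there are no hits."""
    if not scores:
        return None
    return max(scores, key=scores.get)


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pair sequences from genomes A and B that are each other's best hit.

    a_vs_b maps each A sequence to {B sequence: similarity score};
    b_vs_a is the same kind of table for the reverse search.
    """
    pairs = []
    for a_id, hits in a_vs_b.items():
        b_id = best_hit(hits)
        if b_id is None:
            continue
        # Keep the pair only if the reverse search points back to a_id.
        if best_hit(b_vs_a.get(b_id, {})) == a_id:
            pairs.append((a_id, b_id))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical bit scores; a real run would parse them from BLAST output.
    a_vs_b = {"a1": {"b1": 250.0, "b2": 80.0}, "a2": {"b1": 120.0}}
    b_vs_a = {"b1": {"a1": 245.0, "a2": 118.0}, "b2": {"a1": 75.0}}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input only a1 and b1 select each other, so a2 is not paired even though b1 is its best hit.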
Affiliation(s)
- Bianca H Habermann
  - Max Planck Institute of Biochemistry, Am Klopferspitz 18, Martinsried 82152, Germany
11
Jagilinki BP, Gadewal N, Mehta H, Mahadik H, Pandey V, Sawant U, A Wadegaonkar P, Goyal P, Kumar S, K Varma A. Conserved residues at the MAPKs binding interfaces that regulate transcriptional machinery. J Biomol Struct Dyn 2014; 33:852-60. [PMID: 24739067 DOI: 10.1080/07391102.2014.915764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
Signaling through c-Raf downstream pathways is the subject of extensive studies because overexpressed or mutated genes in this pathway lead to a variety of human cancers. On the basis of cellular localization, this pathway has been subdivided into two cascades: the first, RAF1-MEK1-ERK2, remains in the cytosol, whereas the second, MEK1-ERK2-RSKs, transduces into the nucleus and regulates the transactivation function. How a few amino acids critically regulate the transcriptional function remains unclear. In this paper, we have performed in silico studies to unravel how atomic complexities in the MEK1-ERK2-RSKs pathway mediate different functional responses. The secondary structures of ERK and the RSKs were modeled using Jpred3, PSI-PHRED, protein modeler, and Integrated sequence analyzer from Discovery Studio software. Peptides of the RSK isozymes (RSK1/2/3/4) were built and docked on the ERK2 structure using the ZDOCK module. The hydropathy index for the RSK molecules was determined using the KYTE-DOOLITTLE plot. The simulations of complex molecules were carried out using a CHARMM force field. The protein-protein interactions (PPIs) in different cascades of MAP kinase (MAPK) have been shown to be similar to those predicted in vivo. The PPIs indicate that the amino acids located at the conserved domains of MAPK pathways are responsible for transactivation functions.
Affiliation(s)
- Bhanu P Jagilinki
  - Tata Memorial Centre, Advanced Centre for Treatment, Research and Education in Cancer, Kharghar, Navi Mumbai 410 210, Maharashtra, India
12
Mishra S, Saxena A, Sangwan RS. Fundamentals of Homology Modeling Steps and Comparison among Important Bioinformatics Tools: An Overview. ACTA ACUST UNITED AC 2013. [DOI: 10.17311/sciintl.2013.237.252] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/03/2022]
13
Homology modeling and analysis of structure predictions of the bovine rhinitis B virus RNA dependent RNA polymerase (RdRp). Int J Mol Sci 2012; 13:8998-9013. [PMID: 22942748 PMCID: PMC3430279 DOI: 10.3390/ijms13078998] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/03/2012] [Revised: 07/03/2012] [Accepted: 07/11/2012] [Indexed: 11/16/2022] Open
Abstract
Bovine Rhinitis B Virus (BRBV) is a picornavirus responsible for mild respiratory infection of cattle. It is probably the least characterized among the aphthoviruses. BRBV is the closest relative known to Foot and Mouth Disease virus (FMDV) with a ~43% identical polyprotein sequence and as much as 67% identical sequence for the RNA dependent RNA polymerase (RdRp), which is also known as 3D polymerase (3Dpol). In the present study we carried out phylogenetic analysis, structure based sequence alignment and prediction of three-dimensional structure of BRBV 3Dpol using a combination of different computational tools. Model structures of BRBV 3Dpol were verified for their stereochemical quality and accuracy. The BRBV 3Dpol structure predicted by SWISS-MODEL exhibited highest scores in terms of stereochemical quality and accuracy, which were in the range of 2Å resolution crystal structures. The active site, nucleic acid binding site and overall structure were observed to be in agreement with the crystal structure of unliganded as well as template/primer (T/P), nucleotide tri-phosphate (NTP) and pyrophosphate (PPi) bound FMDV 3Dpol (PDB, 1U09 and 2E9Z). The closest proximity of BRBV and FMDV 3Dpol as compared to human rhinovirus type 16 (HRV-16) and rabbit hemorrhagic disease virus (RHDV) 3Dpols is also substantiated by phylogeny analysis and root-mean square deviation (RMSD) between C-α traces of the polymerase structures. The absence of positively charged α-helix at C terminal, significant differences in non-covalent interactions especially salt bridges and CH-pi interactions around T/P channel of BRBV 3Dpol compared to FMDV 3Dpol, indicate that despite a very high homology to FMDV 3Dpol, BRBV 3Dpol may adopt a different mechanism for handling its substrates and adapting to physiological requirements. Our findings will be valuable in the design of structure-function interventions and identification of molecular targets for drug design applicable to Aphthovirus RdRps.
14
Repertoire of Protein Kinases Encoded in the Genome of Takifugu rubripes. Comp Funct Genomics 2012; 2012:258284. [PMID: 22666085 PMCID: PMC3359783 DOI: 10.1155/2012/258284] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/27/2011] [Revised: 02/14/2012] [Accepted: 02/28/2012] [Indexed: 12/02/2022] Open
Abstract
Takifugu rubripes is a teleost fish widely used in comparative genomics to better understand the human system, owing to its similarities in both the number and the structure of genes. In this work we survey the fugu genome and, using sensitive computational approaches, identify the repertoire of putative protein kinases and classify them into groups and subfamilies. The fugu genome encodes 519 protein kinase-like sequences, a number closely comparable to that of human. However, in spite of its similarities to human kinases at the group level, there are differences at the subfamily level, as noted in the case of the KIS and DYRK subfamilies, which contribute to differences specific to the adaptation of the organism. Also, a unique domain combination of galectin domain and YkA domain suggests alternate mechanisms for immune response and binding to lipoproteins. Lastly, an overall similarity with the MAPK pathway of humans suggests its importance for understanding signaling mechanisms in humans. Overall, the fugu serves as a good model organism to understand the roles of human kinases as far as kinases such as LRRK and IRAK and their associated pathways are concerned.
Collapse
|
15
|
Kumar M, Ahmad S, Ahmad E, Saifi MA, Khan RH. In silico prediction and analysis of Caenorhabditis EF-hand containing proteins. PLoS One 2012; 7:e36770. [PMID: 22701514 PMCID: PMC3360750 DOI: 10.1371/journal.pone.0036770] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/18/2012] [Accepted: 04/12/2012] [Indexed: 01/12/2023] Open
Abstract
Calcium (Ca²⁺) is a ubiquitous messenger in eukaryotes including Caenorhabditis. Ca²⁺-mediated signalling processes are usually carried out through well characterized proteins like calmodulin (CaM) and other Ca²⁺ binding proteins (CaBP). These proteins interact with different targets and activate them by inducing conformational changes. The majority of the EF-hand proteins in Caenorhabditis contain Ca²⁺ binding motifs. Here, we have performed homology modelling of CaM-like proteins using the crystal structure of Drosophila melanogaster CaM as a template. Molecular docking was applied to explore the binding mechanism of CaM-like proteins and the IQ1 motif, which is ∼25 residues long and conforms to the consensus sequence (I, L, V)QXXXRXXXX(R,K) to serve as a binding site for different EF-hand proteins. We made an attempt to identify all the EF-hand (a helix-loop-helix structure characterized by a 12-residue loop sequence involved in metal coordination) containing proteins and their Ca²⁺ binding affinity in Caenorhabditis by analysing the complete genome sequence. Docking studies revealed that F165, F169, L29, E33, F44, L57, M61, M96, M97, M108, G65, V115, F93, N104 and E144 of the CaM-like protein are involved in the interaction with the IQ1 motif. A maximum of 170 EF-hand proteins and 39 non-EF-hand proteins with a Ca²⁺/metal binding motif were identified. Diverse proteins, including enzymes, transcription and translation proteins and a large number of unknown proteins, have one or more putative EF-hands. Phylogenetic analysis revealed seven major classes/groups that contain some families of proteins. Various domains that we identified in the (uncharacterized) EF-hand proteins would help in elucidating their functions. It is the first report of its kind in which calcium binding loop sequences of EF-hand proteins were analyzed to decipher their calcium affinities. Variation in the Ca²⁺-binding affinity of EF-hand CaBP could be further used to study the behaviour of these proteins.
Our analyses postulated that Ca²⁺ is likely to be a key player in Caenorhabditis cell signalling.
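The IQ motif consensus quoted in this abstract, (I, L, V)QXXXRXXXX(R,K), can be written as a regular expression. The sketch below is only an illustration, assuming X stands for any residue in one-letter code; the helper `find_iq_motifs` and the example sequence are hypothetical and do not come from the paper.

```python
import re

# Consensus (I,L,V)QXXXRXXXX(R,K); X is assumed to mean "any residue".
# The lookahead lets overlapping occurrences be reported as well.
IQ_MOTIF = re.compile(r"(?=([ILV]Q[A-Z]{3}R[A-Z]{4}[RK]))")


def find_iq_motifs(sequence):
    """Return (0-based start, matched segment) for each consensus match."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in IQ_MOTIF.finditer(seq)]


# Hypothetical sequence containing one motif instance starting at index 2.
example = "AAIQKKKRAAAAKGG"
print(find_iq_motifs(example))
```

The pattern covers only the 11-residue consensus core, not the full ∼25-residue motif region described in the abstract.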
Affiliation(s)
- Manish Kumar
- Advanced Instrumentation Research Facility, Jawaharlal Nehru University, New Delhi, India
| | - Shadab Ahmad
- Centre for Computational Biology and Bioinformatics, Jawaharlal Nehru University, New Delhi, India
| | - Ejaz Ahmad
- Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, India
| | - Muheet Alam Saifi
- Department of Zoology, College of Science, King Saud University, Riyadh, Kingdom of Saudi Arabia
| | - Rizwan Hasan Khan
- Interdisciplinary Biotechnology Unit, Aligarh Muslim University, Aligarh, India
|
16
|
González-Díaz H, Muíño L, Anadón AM, Romaris F, Prado-Prado FJ, Munteanu CR, Dorado J, Sierra AP, Mezo M, González-Warleta M, Gárate T, Ubeira FM. MISS-Prot: web server for self/non-self discrimination of protein residue networks in parasites; theory and experiments in Fasciola peptides and Anisakis allergens. MOLECULAR BIOSYSTEMS 2011; 7:1938-55. [PMID: 21468430 DOI: 10.1039/c1mb05069a] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
Infections caused by human parasites (HPs) affect the poorest 500 million people worldwide but chemotherapy has become expensive, toxic, and/or less effective due to drug resistance. On the other hand, many 3D structures in Protein Data Bank (PDB) remain without function annotation. We need theoretical models to quickly predict biologically relevant Parasite Self Proteins (PSP), which are expressed differentially in a given parasite and are dissimilar to proteins expressed in other parasites and have a high probability to become new vaccines (unique sequence) or drug targets (unique 3D structure). We present herein a model for PSPs in eight different HPs (Ascaris, Entamoeba, Fasciola, Giardia, Leishmania, Plasmodium, Trypanosoma, and Toxoplasma) with 90% accuracy for 15 341 training and validation cases. The model combines protein residue networks, Markov Chain Models (MCM) and Artificial Neural Networks (ANN). The input parameters are the spectral moments of the Markov transition matrix for electrostatic interactions associated with the protein residue complex network calculated with the MARCH-INSIDE software. We implemented this model in a new web-server called MISS-Prot (MARCH-INSIDE Scores for Self-Proteins). MISS-Prot was programmed using PHP/HTML/Python and MARCH-INSIDE routines and is freely available at: . This server is easy to use by non-experts in Bioinformatics who can carry out automatic online upload and prediction with 3D structures deposited at PDB (mode 1). We can also study outcomes of Peptide Mass Fingerprinting (PMFs) and MS/MS for query proteins with unknown 3D structures (mode 2). We illustrated the use of MISS-Prot in experimental and/or theoretical studies of peptides from Fasciola hepatica cathepsin proteases or present on 10 Anisakis simplex allergens (Ani s 1 to Ani s 10). 
In doing so, we combined electrophoresis (1DE), MALDI-TOF Mass Spectroscopy, and MASCOT to seek sequences, Molecular Mechanics + Molecular Dynamics (MM/MD) to generate 3D structures and MISS-Prot to predict PSP scores. MISS-Prot also allows the prediction of PSP proteins in 16 additional species including parasite hosts, fungi pathogens, disease transmission vectors, and biotechnologically relevant organisms.
Affiliation(s)
- Humberto González-Díaz
- Department of Microbiology & Parasitology, Faculty of Pharmacy, University of Santiago de Compostela, 15782, Santiago de Compostela, Spain.
|
17
|
Monson R, Foulds I, Foweraker J, Welch M, Salmond GPC. The Pseudomonas aeruginosa generalized transducing phage phiPA3 is a new member of the phiKZ-like group of 'jumbo' phages, and infects model laboratory strains and clinical isolates from cystic fibrosis patients. MICROBIOLOGY-SGM 2010; 157:859-867. [PMID: 21163841 DOI: 10.1099/mic.0.044701-0] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Indexed: 01/21/2023]
Abstract
Pseudomonas aeruginosa is an important pathogen in cystic fibrosis patients, and a model organism for the study of nosocomially acquired infections, biofilms and intrinsic multidrug resistance. In this study we characterize ϕPA3, a new generalized transducing bacteriophage for P. aeruginosa. ϕPA3 transduced chromosomal mutations between PAO1 strains, and infected multiple P. aeruginosa clinical isolates as well as the P. aeruginosa model laboratory strains PAK and PA14. Electron microscopy imaging was used to classify ϕPA3 in the order Caudovirales and the family Myoviridae. The genome of ϕPA3 was sequenced and found to contain 309,208 bp, the second-largest bacteriophage currently deposited in GenBank. The genome contains 378 ORFs and five tRNAs. Many ORF products in the ϕPA3 genome are similar to proteins encoded by P. aeruginosa phage ϕKZ and Pseudomonas chlororaphis phage 201ϕ2-1, and so ϕPA3 was classified genetically as a member of the ϕKZ-like group of phages. This is the first report of a member of this group of phages acting as a generalized transducer. Given its wide host range, high transduction efficiency and large genome size, the 'jumbo' phage ϕPA3 could be a powerful tool in functional genomic analysis of diverse P. aeruginosa strains of fundamental and clinical importance.
Affiliation(s)
- Rita Monson
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, UK
| | - Ian Foulds
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, UK
| | - Juliet Foweraker
- Papworth Hospital Foundation NHS Trust, Papworth Everard, Cambridge CB23 3RE, UK
| | - Martin Welch
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, UK
| | - George P C Salmond
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, UK
|
18
|
Pandit SB, Brylinski M, Zhou H, Gao M, Arakaki AK, Skolnick J. PSiFR: an integrated resource for prediction of protein structure and function. Bioinformatics 2010; 26:687-8. [PMID: 20080513 DOI: 10.1093/bioinformatics/btq006] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/26/2022] Open
Abstract
In the post-genomic era, the annotation of protein function facilitates the understanding of various biological processes. To extend the range of function annotation methods to the twilight zone of sequence identity, we have developed approaches that exploit both protein tertiary structure and/or protein sequence evolutionary relationships. To serve the scientific community, we have integrated the structure prediction tools, TASSER, TASSER-Lite and METATASSER, and the functional inference tools, FINDSITE, a structure-based algorithm for binding site prediction, Gene Ontology molecular function inference and ligand screening, EFICAz(2), a sequence-based approach to enzyme function inference and DBD-hunter, an algorithm for predicting DNA-binding proteins and associated DNA-binding residues, into a unified web resource, Protein Structure and Function prediction Resource (PSiFR). AVAILABILITY AND IMPLEMENTATION PSiFR is freely available for use on the web at http://psifr.cssb.biology.gatech.edu/
Affiliation(s)
- Shashi B Pandit
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA 30318, USA
|
19
|
Classification of nonenzymatic homologues of protein kinases. Comp Funct Genomics 2009:365637. [PMID: 19809514 PMCID: PMC2754085 DOI: 10.1155/2009/365637] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/13/2009] [Accepted: 07/01/2009] [Indexed: 11/17/2022] Open
Abstract
Protein Kinase-Like Non-kinases (PKLNKs), which are closely related to protein kinases but lack the crucial catalytic aspartate in the catalytic loop and hence cannot function as protein kinases, have been analysed. Using various sensitive sequence analysis methods, we have recognized 82 PKLNKs from four higher eukaryotic organisms, namely, Homo sapiens, Mus musculus, Rattus norvegicus, and Drosophila melanogaster. On the basis of their domain combination and function, PKLNKs have been classified mainly into four categories: (1) ligand binding PKLNKs, (2) PKLNKs with an extracellular protein-protein interaction domain, (3) PKLNKs involved in dimerization, and (4) PKLNKs with a cytoplasmic protein-protein interaction module. While members of the first two classes of PKLNKs have a transmembrane domain tethered to the PKLNK domain, members of the other two classes are cytoplasmic in nature. The current classification scheme is intended to provide a convenient framework for classifying the PKLNKs from other eukaryotes, which would be helpful in deciphering their roles in cellular processes.
|
20
|
Iwaniak A, Dziuba J. Analysis of Domains in Selected Plant and Animal Food Proteins - Precursors of Biologically Active Peptides - In Silico Approach. FOOD SCI TECHNOL INT 2009. [DOI: 10.1177/1082013208106320] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/17/2022]
Abstract
In silico methods are a useful tool in the analysis of protein structure-function relationships. The BIOPEP and InterPro databases were applied to analyze the presence of bioactive fragments in the domains occurring in sequences representing the major groups of proteins. Domains found in the proteins analyzed mostly had transporting (bovine β-lactoglobulin), immunoglobulin-like (chicken connectin), alpha-amylase inhibitor (α/β-wheat gliadin), or calcium-binding (chicken myosin) functions, or allowed the protein to be assigned directly to an appropriate superfamily (bovine casein). This confirmed the thesis that functional relations exist between the structure (sequence) and the domains with identified conformation. Among the domains present in the protein sequences, we found fragments with antihypertensive, opioid, dipeptidylpeptidase IV inhibitory, immunomodulating, and neuropeptide activities. In chicken connectin, we found immunomodulating fragments within the immunoglobulin-like domain. InterPro analysis did not reveal any domains in a soybean globulin, which can be explained by the lack of key structural information helpful in defining structure-function relationships. As the amount of information in the applied databases continues to increase, we can expect to find stronger relationships between the bioactivity of fragments encrypted in proteins and the functionality of domains. In the future, this might make it possible to find evolutionary similarity between food proteins of different origin that are sources of bioactive peptides.
Affiliation(s)
- A. Iwaniak
- Warmia and Mazury University in Olsztyn, Chair of Food Biochemistry ul. Pl. Cieszyński 1, 10-726 Olsztyn, Poland
| | - J. Dziuba
- Warmia and Mazury University in Olsztyn, Chair of Food Biochemistry ul. Pl. Cieszyński 1, 10-726 Olsztyn, Poland
|
21
|
Lobley A, Sadowski MI, Jones DT. pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination. Bioinformatics 2009; 25:1761-7. [PMID: 19429599 DOI: 10.1093/bioinformatics/btp302] [Citation(s) in RCA: 218] [Impact Index Per Article: 14.5] [Indexed: 11/14/2022]
Abstract
MOTIVATION Generation of structural models and recognition of homologous relationships for unannotated protein sequences are fundamental problems in bioinformatics. Improving the sensitivity and selectivity of methods designed for these two tasks therefore has downstream benefits for many other bioinformatics applications. RESULTS We describe the latest implementation of the GenTHREADER method for structure prediction on a genomic scale. The method combines profile-profile alignments with secondary-structure specific gap-penalties, classic pair- and solvation potentials using a linear combination optimized with a regression SVM model. We find this combination significantly improves both detection of useful templates and accuracy of sequence-structure alignments relative to other competitive approaches. We further present a second implementation of the protocol designed for the task of discriminating superfamilies from one another. This method, pDomTHREADER, is the first to incorporate both sequence and structural data directly in this task and improves sensitivity and selectivity over the standard version of pGenTHREADER and three other standard methods for remote homology detection.
Affiliation(s)
- Anna Lobley
- Department of Computer Science, University College London, UK
|
22
|
Hawkins T, Chitale M, Luban S, Kihara D. PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins 2009; 74:566-82. [PMID: 18655063 DOI: 10.1002/prot.22172] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Indexed: 11/12/2022]
Abstract
Protein function prediction is a central problem in bioinformatics, increasing in importance recently due to the rapid accumulation of biological data awaiting interpretation. Sequence data represents the bulk of this new stock and is the obvious target for consideration as input, as newly sequenced organisms often lack any other type of biological characterization. We have previously introduced PFP (Protein Function Prediction) as our sequence-based predictor of Gene Ontology (GO) functional terms. PFP interprets the results of a PSI-BLAST search by extracting and scoring individual functional attributes, searching a wide range of E-value sequence matches, and utilizing conventional data mining techniques to fill in missing information. We have shown it to be effective in predicting both specific and low-resolution functional attributes when sufficient data is unavailable. Here we describe (1) significant improvements to the PFP infrastructure, including the addition of prediction significance and confidence scores, (2) a thorough benchmark of performance and comparisons to other related prediction methods, and (3) applications of PFP predictions to genome-scale data. We applied PFP predictions to uncharacterized protein sequences from 15 organisms. Among these sequences, 60-90% could be annotated with a GO molecular function term at high confidence (≥80%). We also applied our predictions to the protein-protein interaction network of the Malaria plasmodium (Plasmodium falciparum). High confidence GO biological process predictions (≥90%) from PFP increased the number of fully enriched interactions in this dataset from 23% of interactions to 94%. Our benchmark comparison shows significant performance improvement of PFP relative to GOtcha, InterProScan, and PSI-BLAST predictions. This is consistent with the performance of PFP as the overall best predictor in both the AFP-SIG '05 and CASP7 function (FN) assessments.
PFP is available as a web service at http://dragon.bio.purdue.edu/pfp/.
Affiliation(s)
- Troy Hawkins
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA
|
23
|
Anamika K, Bhattacharya A, Srinivasan N. Analysis of the protein kinome of Entamoeba histolytica. Proteins 2008; 71:995-1006. [PMID: 18004777 DOI: 10.1002/prot.21790] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 11/11/2022]
Abstract
Protein kinases play important roles in almost all major signaling and regulatory pathways of eukaryotic organisms. Members of the protein kinase family make up a substantial fraction of the eukaryotic proteome. Analysis of the protein kinase repertoire (kinome) would help in better understanding the regulatory processes. In this article, we report the identification and analysis of the repertoire of protein kinases in the intracellular parasite Entamoeba histolytica. Using a combination of various sensitive sequence search methods and manual analysis, we have identified a set of 307 protein kinases in the E. histolytica genome. We have classified these protein kinases into different subfamilies originally defined by Hanks and Hunter and studied these kinases further in the context of noncatalytic domains that are tethered to the catalytic kinase domain. Compared to other eukaryotic organisms, protein kinases from E. histolytica vary in terms of their domain organization and display features that may have a bearing on the unusual biology of this organism. Some of the parasitic kinases show high sequence similarity in the catalytic domain region with the calmodulin/calcium dependent protein kinase subfamily. However, they are unlikely to act like typical calcium/calmodulin dependent kinases as they lack noncatalytic domains characteristic of such kinases in other organisms. Such kinases form the largest subfamily of kinases in E. histolytica. Interestingly, a PKA/PKG-like subfamily member is tethered to a pleckstrin homology domain. Although potential cyclins and cyclin-dependent kinases could be identified in the genome, the likely absence of other cell cycle proteins suggests an unusual nature of the cell cycle in E. histolytica. Some of the unusual features recognized in our analysis include the absence of MEK as a part of the Mitogen Activated Kinase signaling pathway and the identification of transmembrane region containing Src kinase-like kinases.
Sequences which could not be classified into known subfamilies of protein kinases have unusual domain architectures. Many such unclassified protein kinases are tethered to domains which are Cysteine-rich and to domains known to be involved in protein-protein interactions. Our kinome analysis of E. histolytica suggests that the organism possesses a complex protein phosphorylation network that involves many unusual kinases.
Affiliation(s)
- K Anamika
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
24
|
Abstract
Most newly sequenced proteins are likely to adopt a similar structure to one which has already been experimentally determined. For this reason, the most successful approaches to protein structure prediction have been template-based methods. Such prediction methods attempt to identify and model the folds of unknown structures by aligning the target sequences to a set of representative template structures within a fold library. In this chapter, I discuss the development of template-based approaches to fold prediction, from the traditional techniques to the recent state-of-the-art methods. I also discuss the recent development of structural annotation databases, which contain models built by aligning the sequences from entire proteomes against known structures. Finally, I run through a practical step-by-step guide for aligning target sequences to known structures and contemplate the future direction of template-based structure prediction.
|
25
|
Reid AJ, Yeats C, Orengo CA. Methods of remote homology detection can be combined to increase coverage by 10% in the midnight zone. Bioinformatics 2007; 23:2353-60. [PMID: 17709341 DOI: 10.1093/bioinformatics/btm355] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION A recent development in sequence-based remote homologue detection is the introduction of profile-profile comparison methods. These are more powerful than previous technologies and can detect potentially homologous relationships missed by structural classifications such as CATH and SCOP. As structural classifications traditionally act as the gold standard of homology this poses a challenge in benchmarking them. RESULTS We present a novel approach which allows an accurate benchmark of these methods against the CATH structural classification. We then apply this approach to assess the accuracy of a range of publicly available methods for remote homology detection including several profile-profile methods (COMPASS, HHSearch, PRC) from two perspectives. First, in distinguishing homologous domains from non-homologues and second, in annotating proteomes with structural domain families. PRC is shown to be the best method for distinguishing homologues. We show that SAM is the best practical method for annotating genomes, whilst using COMPASS for the most remote homologues would increase coverage. Finally, we introduce a simple approach to increase the sensitivity of remote homologue detection by up to 10%. This is achieved by combining multiple methods with a jury vote. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
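The idea of combining several detection methods with a jury vote can be sketched as follows. The abstract does not state the voting rule, so the simple threshold used here (accept an assignment once at least `min_votes` methods report it) is an assumption, and the method names and query/family labels are hypothetical.

```python
from collections import Counter


def jury_vote(predictions, min_votes):
    """Keep (query, family) assignments reported by at least min_votes methods.

    predictions maps a method name to the set of assignments it reported.
    Assumption: each method contributes one equal, unweighted vote.
    """
    votes = Counter()
    for hits in predictions.values():
        votes.update(set(hits))  # one vote per method per assignment
    return {hit for hit, count in votes.items() if count >= min_votes}


# Hypothetical outputs from three detection methods.
predictions = {
    "method_a": {("q1", "fam1"), ("q2", "fam3")},
    "method_b": {("q1", "fam1"), ("q3", "fam2")},
    "method_c": {("q1", "fam1"), ("q3", "fam2"), ("q2", "fam4")},
}
consensus = jury_vote(predictions, min_votes=2)
print(sorted(consensus))
```

Lowering `min_votes` trades precision for coverage, which is one plausible way a combination could recover homologues that any single method misses.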
Affiliation(s)
- Adam James Reid
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
|
26
|
Scheeff ED, Bourne PE. Application of protein structure alignments to iterated hidden Markov model protocols for structure prediction. BMC Bioinformatics 2006; 7:410. [PMID: 16970830 PMCID: PMC1622756 DOI: 10.1186/1471-2105-7-410] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 04/18/2006] [Accepted: 09/14/2006] [Indexed: 11/30/2022] Open
Abstract
Background One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear. Results We explored several iterative protocols for the generation of profile hidden Markov models. These protocols were tailored to allow the inclusion of protein structure alignments in the process, and were used for large-scale creation and benchmarking of structure alignment-enhanced models. We found that models using structure alignments did not provide an overall improvement over sequence-only models for superfamily-level structure predictions. However, the results also revealed that the structure alignment-enhanced models were complementary to the sequence-only models, particularly at the edge of the "twilight zone". When the two sets of models were combined, they provided improved results over sequence-only models alone. In addition, we found that the beneficial effects of the structure alignment-enhanced models could not be realized if the structure-based alignments were replaced with sequence-based alignments. Our experiments with different iterative protocols for sequence-only models also suggested that simple protocol modifications were unable to yield equivalent improvements to those provided by the structure alignment-enhanced models.
Finally, we found that models using structure alignments provided fold-level structure assignments that were superior to those produced by sequence-only models. Conclusion When attempting to predict the structure of remote homologs, we advocate a combined approach in which both traditional models and models incorporating structure alignments are used.
Affiliation(s)
- Eric D Scheeff
- San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0537, USA
- Present address: Razavi-Newman Center for Bioinformatics, The Salk Institute for Biological Studies, 10010 North Torrey Pines Rd., La Jolla, CA 92037, USA
| | - Philip E Bourne
- San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093-0537, USA
- Department of Pharmacology, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
|
27
|
Li J, Wang W. Detailed assessment of homology detection using different substitution matrices. CHINESE SCIENCE BULLETIN-CHINESE 2006. [DOI: 10.1007/s11434-006-1538-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
28
|
Fleming K, Kelley LA, Islam SA, MacCallum RM, Muller A, Pazos F, Sternberg MJ. The proteome: structure, function and evolution. Philos Trans R Soc Lond B Biol Sci 2006; 361:441-51. [PMID: 16524832 PMCID: PMC1609342 DOI: 10.1098/rstb.2005.1802] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
This paper reports two studies to model the inter-relationships between protein sequence, structure and function. First, an automated pipeline to provide a structural annotation of proteomes in the major genomes is described. The results are stored in a database at Imperial College, London (3D-GENOMICS) that can be accessed at www.sbg.bio.ic.ac.uk. Analysis of the assignments to structural superfamilies provides evolutionary insights. 3D-GENOMICS is being integrated with related proteome annotation data at University College London and the European Bioinformatics Institute in a project known as e-protein (http://www.e-protein.org/). The second topic is motivated by the developments in structural genomics projects in which the structure of a protein is determined prior to knowledge of its function. We have developed a new approach PHUNCTIONER that uses the gene ontology (GO) classification to supervise the extraction of the sequence signal responsible for protein function from a structure-based sequence alignment. Using GO we can obtain profiles for a range of specificities described in the ontology. In the region of low sequence similarity (around 15%), our method is more accurate than assignment from the closest structural homologue. The method is also able to identify the specific residues associated with the function of the protein family.
Affiliation(s)
- Keiran Fleming
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
| | - Lawrence A Kelley
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
| | - Suhail A Islam
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
| | - Robert M MacCallum
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
| | - Arne Muller
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
| | - Florencio Pazos
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
| | - Michael J.E Sternberg
- Structural Bioinformatics Group, Centre for Bioinformatics, Division of Molecular Biosciences, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK
- Biomolecular Modelling Laboratory, Cancer Research UK, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
|
29
|
Jiménez JL. Does structural and chemical divergence play a role in precluding undesirable protein interactions? Proteins 2006; 59:757-64. [PMID: 15822102 DOI: 10.1002/prot.20448] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/08/2022]
Abstract
To understand the evolutionary forces establishing, maintaining, breaking, or precluding protein-protein interactions, a comprehensive data set of protein complexes has been analyzed to examine the overlap between protein interfaces and the most conserved or divergent protein surface areas. The most divergent areas tend to be found predominantly away from protein interfaces, although when found at interfaces, they are associated with specific lack of cross-reactivity between close homologues, like in antibody-antigen complexes. Moreover, the amino acid composition of highly variable regions is significantly different from any other protein surfaces. The variable regions present higher structural plasticity as a result of insertions and deletions, and favor charged over hydrophobic residues, a known strategy to minimize aggregation. This suggests that (1) a rapid rate of mutations at these regions might be continuously altering their properties, making difficult the coadaptation, in shape and chemical complementarity, to potential interacting partners; and (2) the existence of some form of selective pressure for variable areas away from interfaces to accumulate charged residues, perhaps as an evolutionary mechanism to increase solubility and minimize undesirable interactions within the crowded cellular environment. Finally, these results are placed into the context of the aberrant oligomerization of sickle-cell anemia hemoglobin and prion proteins.
Affiliation(s)
- José L Jiménez
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London, United Kingdom.
|
30
|
Abstract
We review fold usage on completed genomes to explore protein structure evolution. The patterns of presence or absence of folds on genomes give us insights into the relationships between folds, the age of different folds and how we have arrived at the set of folds we see today. We examine the relationships between different measures which describe protein fold usage, such as the number of copies of a fold per genome, the number of families per fold, and the number of genomes a fold occurs on. We obtained these measures of fold usage by searching for the structural domains on 157 completed genome sequences from all three kingdoms of life. In our comparisons of these measures we found that bacteria have relatively more distinct folds on their genomes than archaea. Eukaryotes were found to have many more copies of a fold on their genomes. If we separate out the different fold classes, the alpha/beta class has relatively fewer distinct folds on large genomes, more copies of a fold on bacteria and more folds occurring in all three kingdoms simultaneously. These results possibly indicate that most alpha/beta folds originated earlier than other folds. The expected power law distribution is observed for copies of a fold per genome and we found a similar distribution for the number of families per fold. However, a more complicated distribution appears for fold occurrence across genomes, which strongly depends on fold class and kingdom. We also show that there is not a clear relationship between the three measures of fold usage. A fold which occurs on many genomes does not necessarily have many copies on each genome. Similarly, folds with many copies do not necessarily have many families or vice versa.
Affiliation(s)
- Sanne Abeln
- Department of Statistics, University of Oxford, United Kingdom
|
31
|
Gowri VS, Krishnadev O, Swamy CS, Srinivasan N. MulPSSM: a database of multiple position-specific scoring matrices of protein domain families. Nucleic Acids Res 2006; 34:D243-6. [PMID: 16381855 PMCID: PMC1347406 DOI: 10.1093/nar/gkj043] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
Abstract
Representation of multiple sequence alignments of protein families in terms of position-specific scoring matrices (PSSMs) is commonly used in the detection of remote homologues. A PSSM is generated with respect to one of the sequences involved in the multiple sequence alignment as a reference. We have shown recently that the use of multiple PSSMs corresponding to an alignment, with several sequences in the family used as reference, improves the sensitivity of the remote homology detection dramatically. MulPSSM contains PSSMs for a large number of sequence and structural families of protein domains with multiple PSSMs for every family. The approach involves use of a clustering algorithm to identify the most distinct sequences corresponding to a family. With each one of the distinct sequences as reference, multiple PSSMs have been generated. The current release of MulPSSM contains ∼33 000 and ∼38 000 PSSMs corresponding to 7868 sequence and 2625 structural families, respectively. An RPS_BLAST interface allows sequence search against PSSMs of sequence or structural families or both. An analysis interface allows display and convenient navigation of alignments and domain hits. MulPSSM can be accessed at .
Affiliation(s)
- V. S. Gowri
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
| | - O. Krishnadev
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
| | - C. S. Swamy
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
- National Centre for Biological Sciences, GKVK Campus, Bangalore 560065, India
| | - N. Srinivasan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
- To whom correspondence should be addressed. Tel: +91 80 2293 2837; Fax: +91 80 2360 0535
|
32
|
Abstract
Protein kinases are central to regulation of cellular signaling in the eukaryotes. Well-conserved and lineage-specific protein kinases have previously been identified from various completely sequenced genomes of eukaryotes. The current work describes a genome-wide analysis for protein kinases encoded in the Plasmodium falciparum genome. Using a few different profile matching methods, we have identified 99 protein kinases or related proteins in the parasite genome. We have classified these kinases into subfamilies and analyzed them in the context of noncatalytic domains that occur in these catalytic kinase domain-containing proteins. Compared to most eukaryotic protein kinases, these sequences vary significantly in terms of their lengths, inserts in catalytic domains, and co-occurring domains. Catalytic and noncatalytic domains contain long stretches of repeats of positively charged and other polar amino acids. Various components of the cell cycle, including 4 cyclin-dependent kinase (CDK) homologues, 2 cyclins, 1 CDK regulatory subunit, and 1 kinase-associated phosphatase, are identified. Identification of putative mitogen-activated protein (MAP) Kinase and MAP Kinase Kinase of P. falciparum suggests a new paradigm in the highly conserved signaling pathway of eukaryotes. The calcium-dependent kinase family, well represented in P. falciparum, shows varying domain combinations with EF-hands and pleckstrin homology domains. The analysis reveals a new subfamily of protein kinases having limited sequence similarity with previously known subfamilies. A new transmembrane kinase with 6 membrane-spanning regions is identified. Putative apicoplast targeting sequences have been detected in some of these protein kinases, suggesting their export to the apicoplast.
|
33
|
Sandhya S, Chakrabarti S, Abhinandan KR, Sowdhamini R, Srinivasan N. Assessment of a rigorous transitive profile based search method to detect remotely similar proteins. J Biomol Struct Dyn 2005; 23:283-98. [PMID: 16218755 DOI: 10.1080/07391102.2005.10507066] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 10/28/2022]
Abstract
Profile-based sequence search procedures are commonly employed to detect remote relationships between proteins. We provide an assessment of a Cascade PSI-BLAST protocol that rigorously employs intermediate sequences in detecting remote relationships between proteins. In this approach we detect using PSI-BLAST, which involves multiple rounds of iteration, an initial set of homologues for a protein in a 'first generation' search by querying a database. We propagate a 'second generation' search in the database, involving multiple runs of PSI-BLAST using each of the homologues identified in the previous generation as queries to recognize homologues not detected earlier. This non-directed search process can be viewed as an iteration of iterations that is continued to detect further homologues until no new hits are detectable. We present an assessment of the coverage of this 'cascaded' intermediate sequence search on diverse folds and find that searches for up to three generations detect most known homologues of a query. Our assessments show that this approach appears to perform better than the traditional use of PSI-BLAST by detecting 15% more relationships within a family and 35% more relationships within a superfamily. We show that such searches can be performed on generalized sequence databases and non-trivial relationships between proteins can be detected effectively. Such a propagation of searches maximizes the chances of detecting distant homologies by effectively scanning protein "fold space".
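The cascaded search described in this abstract can be illustrated with a minimal Python sketch. It assumes a caller-supplied `search` function that stands in for one PSI-BLAST run and returns the identifiers of the hits for a given query; that function, the `max_generations` parameter name, and the toy data are hypothetical and not taken from the paper.

```python
from typing import Callable, Iterable, Set


def cascade_search(query: str,
                   search: Callable[[str], Iterable[str]],
                   max_generations: int = 3) -> Set[str]:
    """Collect homologues by re-querying with every newly found hit.

    `search` is a placeholder for a single profile search (for example one
    PSI-BLAST run); it is an assumption of this sketch, not a real API.
    """
    found: Set[str] = set()
    frontier = [query]                      # queries for the current generation
    for _ in range(max_generations):
        new_hits: Set[str] = set()
        for seq_id in frontier:
            for hit in search(seq_id):
                if hit != query and hit not in found:
                    new_hits.add(hit)
        if not new_hits:                    # no new hits: stop the cascade
            break
        found |= new_hits
        frontier = sorted(new_hits)         # next generation uses the new hits
    return found


# Toy stand-in for a database search: each ID maps to the hits it would return.
toy_hits = {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": []}


def toy_search(seq_id: str) -> Iterable[str]:
    return toy_hits.get(seq_id, [])


homologues = cascade_search("A", toy_search)
```

Capping the number of generations mirrors the abstract's observation that up to three generations detect most known homologues; a real run would also need E-value thresholds, which are left out here.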
Affiliation(s)
- S Sandhya
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
|
34
|
Chandonia JM, Kim SH, Brenner SE. Target selection and deselection at the Berkeley Structural Genomics Center. Proteins 2005; 62:356-70. [PMID: 16276528 DOI: 10.1002/prot.20674] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
At the Berkeley Structural Genomics Center (BSGC), our goal is to obtain a near-complete structural complement of proteins in the minimal organisms Mycoplasma genitalium and M. pneumoniae, two closely related pathogens. Current targets for structure determination have been selected in six major stages, starting with those predicted to be most tractable to high throughput study and likely to yield new structural information. We report on the process used to select these proteins, as well as our target deselection procedure. Target deselection reduces experimental effort by eliminating targets similar to those recently solved by the structural biology community or other centers. We measure the impact of the 69 structures solved at the BSGC as of July 2004 on structure prediction coverage of the M. pneumoniae and M. genitalium proteomes. The number of Mycoplasma proteins for which the fold could first be reliably assigned based on structures solved at the BSGC (24 M. pneumoniae and 21 M. genitalium) is approximately 25% of the total resulting from work at all structural genomics centers and the worldwide structural biology community (94 M. pneumoniae and 86 M. genitalium) during the same period. As the number of structures contributed by the BSGC during that period is less than 1% of the total worldwide output, the benefits of a focused target selection strategy are apparent. If the structures of all current targets were solved, the percentage of M. pneumoniae proteins for which folds could be reliably assigned would increase from approximately 57% (391 of 687) at present to around 80% (550 of 687), and the percentage of the proteome that could be accurately modeled would increase from around 37% (254 of 687) to about 64% (438 of 687). In M. genitalium, the percentage of the proteome that could be structurally annotated based on structures of our remaining targets would rise from 72% (348 of 486) to around 76% (371 of 486), and the percentage of accurately modeled proteins would rise from 50% (243 of 486) to 58% (283 of 486). Sequences and data on experimental progress on our targets are available in the public databases TargetDB and PEPCdb.
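The coverage figures quoted in this abstract can be rechecked from the counts given in parentheses. The short Python sketch below does only that arithmetic; the dictionary labels are wording chosen for this illustration, while the counts are copied from the abstract.

```python
# (numerator, denominator) pairs exactly as quoted in the abstract.
counts = {
    "M. pneumoniae, fold assigned, present": (391, 687),
    "M. pneumoniae, fold assigned, all targets solved": (550, 687),
    "M. pneumoniae, accurately modeled, present": (254, 687),
    "M. pneumoniae, accurately modeled, all targets solved": (438, 687),
    "M. genitalium, structurally annotated, present": (348, 486),
    "M. genitalium, structurally annotated, all targets solved": (371, 486),
    "M. genitalium, accurately modeled, present": (243, 486),
    "M. genitalium, accurately modeled, all targets solved": (283, 486),
    "BSGC share of first fold assignments, M. pneumoniae": (24, 94),
    "BSGC share of first fold assignments, M. genitalium": (21, 86),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


percentages = {label: percent(*pair) for label, pair in counts.items()}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The computed values agree with the rounded percentages in the text, including the "approximately 25%" share of first fold assignments attributed to the BSGC.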
Affiliation(s)
- John-Marc Chandonia
- Berkeley Structural Genomics Center, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
|
35
|
Abstract
With currently available sequence data, it is feasible to conduct extensive comparisons among large sets of protein sequences. It is still a much more challenging task to partition the protein space into structurally and functionally related families solely based on sequence comparisons. The ProtoNet system automatically generates a treelike classification of the whole protein space. It stands to reason that this classification reflects evolutionary relationships, both close and remote. In this article, we examine this hypothesis. We present a semiautomatic procedure that singles out certain inner nodes in the ProtoNet tree that should ideally correspond to structurally and functionally defined protein families. We compare the performance of this method against several expert systems. Some of the competing methods incorporate additional extraneous information on protein structure or on enzymatic activities. The ProtoNet-based method performs at least as well as any of the methods with which it was compared. This article illustrates the ProtoNet-based method on several evolutionarily diverse families. Using this new method, an evolutionary divergence scheme can be proposed for a large number of structurally and functionally related superfamilies.
Affiliation(s)
- Ori Shachar
- School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel
|
36
|
Scheeff ED, Bourne PE. Structural evolution of the protein kinase-like superfamily. PLoS Comput Biol 2005; 1:e49. [PMID: 16244704 PMCID: PMC1261164 DOI: 10.1371/journal.pcbi.0010049] [Citation(s) in RCA: 189] [Impact Index Per Article: 9.9] [Received: 03/17/2005] [Accepted: 09/08/2005] [Indexed: 11/19/2022] Open
Abstract
The protein kinase family is large and important, but it is only one family in a larger superfamily of homologous kinases that phosphorylate a variety of substrates and play important roles in all three superkingdoms of life. We used a carefully constructed structural alignment of selected kinases as the basis for a study of the structural evolution of the protein kinase-like superfamily. The comparison of structures revealed a "universal core" domain consisting only of regions required for ATP binding and the phosphotransfer reaction. Remarkably, even within the universal core some kinase structures display notable changes, while still retaining essential activity. Hence, the protein kinase-like superfamily has undergone substantial structural and sequence revision over long evolutionary timescales. We constructed a phylogenetic tree for the superfamily using a novel approach that allowed for the combination of sequence and structure information into a unified quantitative analysis. When considered against the backdrop of species distribution and other metrics, our tree provides a compelling scenario for the development of the various kinase families from a shared common ancestor. We propose that most of the so-called "atypical kinases" are not intermittently derived from protein kinases, but rather diverged early in evolution to form a distinct phyletic group. Within the atypical kinases, the aminoglycoside and choline kinase families appear to share the closest relationship. These two families in turn appear to be the most closely related to the protein kinase family. In addition, our analysis suggests that the actin-fragmin kinase, an atypical protein kinase, is more closely related to the phosphoinositide-3 kinase family than to the protein kinase family. The two most divergent families, alpha-kinases and phosphatidylinositol phosphate kinases (PIPKs), appear to have distinct evolutionary histories. While the PIPKs probably have an evolutionary relationship with the rest of the kinase superfamily, the relationship appears to be very distant (and perhaps indirect). Conversely, the alpha-kinases appear to be an exception to the scenario of early divergence for the atypical kinases: they apparently arose relatively recently in eukaryotes. We present possible scenarios for the derivation of the alpha-kinases from an extant kinase fold.
Affiliation(s)
- Eric D Scheeff
- San Diego Supercomputer Center, University of California, San Diego, California, United States of America.
|
37
|
Krupa A, Srinivasan N. Diversity in domain architectures of Ser/Thr kinases and their homologues in prokaryotes. BMC Genomics 2005; 6:129. [PMID: 16171520 PMCID: PMC1262709 DOI: 10.1186/1471-2164-6-129] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Received: 10/02/2004] [Accepted: 09/19/2005] [Indexed: 11/17/2022] Open
Abstract
Background Ser/Thr/Tyr kinases (STYKs) commonly found in eukaryotes have been recently reported in many bacterial species. Recent studies elucidating their cellular functions have established their roles in bacterial growth and development. However, the functions of a large number of bacterial STYKs still remain elusive. The organisation of domains in a large dataset of bacterial STYKs has been investigated here in order to recognise variety in domain combinations which determine functions of bacterial STYKs. Results Using sensitive sequence and profile search methods, the domain organisation of over 600 STYKs from 125 prokaryotic genomes has been examined. Kinase catalytic domains of STYKs tethered to a wide range of enzymatic domains such as phosphatases, HSP70, peptidyl prolyl isomerases, pectin esterases and glycoproteases have been identified. Such distinct preferences for domain combinations are not known to be present in either the Histidine kinase or the eukaryotic STYK families. Domain organisation of STYKs specific to certain groups of bacteria has also been noted in the current analysis. Examples include hydrophobin-like domains in mycobacterial STYKs, penicillin-binding domains in a few STYKs of Gram-positive organisms, and FHA domains in cyanobacterial STYKs. Homologues of characterised substrates of prokaryotic STYKs have also been identified. Conclusion The domains and domain architectures of most of the bacterial STYKs identified are very different from the known domain organisation in STYKs of eukaryotes. This observation highlights distinct biological roles of bacterial STYKs compared to eukaryotic STYKs. Bacterial STYKs reveal high diversity in domain organisation. Some of the modular organisations conserved across diverse bacterial species suggest their central role in bacterial physiology. Unique domain architectures of a few other groups of STYKs reveal recruitment of functions specific to the species.
Affiliation(s)
- A Krupa
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
- Cell Cycle Control Laboratory, London Research Institute, Cancer Research – UK, South Mimms, Hertfordshire, EN6 3LD UK
| | - N Srinivasan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
|
38
|
Johnston CR, Shields DC. A sequence sub-sampling algorithm increases the power to detect distant homologues. Nucleic Acids Res 2005; 33:3772-8. [PMID: 16006623 PMCID: PMC1174907 DOI: 10.1093/nar/gki687] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022] Open
Abstract
Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors.
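The sub-sampling idea in this abstract (search with many random three-sequence subsets of a parent alignment, then combine the results) might be sketched in Python as follows. The abstract does not say how results are combined, so keeping the lowest E-value per target is an assumption of this sketch, as are the `search` callable (a stand-in for building a profile from the subset and running a HMMER search) and all parameter names.

```python
import random
from typing import Callable, Dict, List, Sequence


def rand_subset_search(alignment: Sequence[str],
                       search: Callable[[List[str]], Dict[str, float]],
                       n_subsets: int = 10,
                       subset_size: int = 3,
                       seed: int = 0) -> Dict[str, float]:
    """Combine search results from random subsets of an alignment.

    `search` takes a list of aligned sequences and returns {target: E-value}.
    Combination rule (assumed here): keep the best (lowest) E-value per target.
    """
    rng = random.Random(seed)               # seeded for reproducible subsets
    combined: Dict[str, float] = {}
    for _ in range(n_subsets):
        subset = rng.sample(list(alignment), subset_size)
        for target, evalue in search(subset).items():
            if target not in combined or evalue < combined[target]:
                combined[target] = evalue
    return combined


# Hypothetical usage with a fake search that records the subsets it was given.
calls: List[List[str]] = []


def fake_search(subset: List[str]) -> Dict[str, float]:
    calls.append(subset)
    return {"target_1": 1.0 / len(calls)}   # E-value shrinks with each call


result = rand_subset_search(["s1", "s2", "s3", "s4"], fake_search, n_subsets=4)
```

Other combination rules (for example counting how many subsets recover a target) would fit the same loop; the choice here is only for illustration.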
Affiliation(s)
- Catrióna R Johnston
- Department of Clinical Pharmacology, Bioinformatics Group, Royal College of Surgeons in Ireland, 123 St Stephens Green, Dublin 2, Ireland.
|
39
|
Sillitoe I, Dibley M, Bray J, Addou S, Orengo C. Assessing strategies for improved superfamily recognition. Protein Sci 2005; 14:1800-10. [PMID: 15937274 PMCID: PMC2253352 DOI: 10.1110/ps.041056105] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 10/25/2022]
Abstract
There are more than 200 completed genomes and over 1 million nonredundant sequences in public repositories. Although the structural data are more sparse (approximately 13,000 nonredundant structures solved to date), several powerful sequence-based methodologies now allow these structures to be mapped onto related regions in a significant proportion of genome sequences. We review a number of publicly available strategies for providing structural annotations for genome sequences, and we describe the protocol adopted to provide CATH structural annotations for completed genomes. In particular, we assess the performance of several sequence-based protocols employing Hidden Markov model (HMM) technologies for superfamily recognition, including a new approach (SAMOSA [sequence augmented models of structure alignments]) that exploits multiple structural alignments from the CATH domain structure database when building the models. Using a data set of remote homologs detected by structure comparison and manually validated in CATH, a single-seed HMM library was able to recognize 76% of the data set. Including the SAMOSA models in the HMM library showed little gain in homolog recognition, although a slight improvement in alignment quality was observed for very remote homologs. However, using an expanded 1D-HMM library, CATH-ISL increased the coverage to 86%. The single-seed HMM library has been used to annotate the protein sequences of 120 genomes from all three major kingdoms, allowing up to 70% of the genes or partial genes to be assigned to CATH superfamilies. It has also been used to recruit sequences from Swiss-Prot and TrEMBL into CATH domain superfamilies, expanding the CATH database eightfold.
Affiliation(s)
- Ian Sillitoe
- Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, UK
|
40
|
Anand B, Gowri VS, Srinivasan N. Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues. Bioinformatics 2005; 21:2821-6. [PMID: 15817691 DOI: 10.1093/bioinformatics/bti432] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Position specific scoring matrices (PSSMs) corresponding to aligned sequences of homologous proteins are commonly used in homology detection. A PSSM is generated on the basis of one of the homologues as a reference sequence, which is the query in the case of PSI-BLAST searches. The reference sequence is chosen arbitrarily while generating PSSMs for reverse BLAST searches. In this work we demonstrate that the use of multiple PSSMs corresponding to a given alignment and variable reference sequences is more effective than using traditional single PSSMs and hidden Markov models. RESULTS Searches for proteins with known 3-D structures have been made against three databases of protein family profiles corresponding to known structures: (1) One PSSM per family; (2) multiple PSSMs corresponding to an alignment and variable reference sequences for every family; and (3) hidden Markov models. A comparison of the performances of these three approaches suggests that the use of multiple PSSMs is most effective. CONTACT ns@mbu.iisc.ernet.in.
Affiliation(s)
- B Anand
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
|
41
|
Pallen MJ, Beatson SA, Bailey CM. Bioinformatics analysis of the locus for enterocyte effacement provides novel insights into type-III secretion. BMC Microbiol 2005; 5:9. [PMID: 15757514 PMCID: PMC1084347 DOI: 10.1186/1471-2180-5-9] [Citation(s) in RCA: 98] [Impact Index Per Article: 5.2] [Received: 12/21/2004] [Accepted: 03/09/2005] [Indexed: 12/17/2022] Open
Abstract
Background Like many other pathogens, enterohaemorrhagic and enteropathogenic strains of Escherichia coli employ a type-III secretion system to translocate bacterial effector proteins into host cells, where they then disrupt a range of cellular functions. This system is encoded by the locus for enterocyte effacement. Many of the genes within this locus have been assigned names and functions through homology with the better characterised Ysc-Yop system from Yersinia spp. However, the functions and homologies of many LEE genes remain obscure. Results We have performed a fresh bioinformatics analysis of the LEE. Using PSI-BLAST we have been able to identify several novel homologies between LEE-encoded and Ysc-Yop-associated proteins: Orf2/YscE, Orf5/YscL, rORF8/EscI, SepQ/YscQ, SepL/YopN-TyeA, CesD2/LcrR. In addition, we highlight homology between EspA and flagellin, and report many new homologues of the chaperone CesT. Conclusion We conclude that the vast majority of LEE-encoded proteins do indeed possess homologues and that homology data can be used in combination with experimental data to make fresh functional predictions.
Affiliation(s)
- Mark J Pallen
- Bacterial Pathogenesis and Genomics Unit, Division of Immunity and Infection, Medical School, University of Birmingham, Birmingham, B15 2TT, UK
| | - Scott A Beatson
- Bacterial Pathogenesis and Genomics Unit, Division of Immunity and Infection, Medical School, University of Birmingham, Birmingham, B15 2TT, UK
| | - Christopher M Bailey
- Bacterial Pathogenesis and Genomics Unit, Division of Immunity and Infection, Medical School, University of Birmingham, Birmingham, B15 2TT, UK
|
42
|
Kaplan N, Linial M. Automatic detection of false annotations via binary property clustering. BMC Bioinformatics 2005; 6:46. [PMID: 15755318 PMCID: PMC555558 DOI: 10.1186/1471-2105-6-46] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/05/2004] [Accepted: 03/08/2005] [Indexed: 11/10/2022] Open
Abstract
Background Computational protein annotation methods occasionally introduce errors. False-positive (FP) errors are annotations that are mistakenly associated with a protein. Such false annotations introduce errors that may spread into databases through similarity with other proteins. Generally, methods used to minimize the chance for FPs result in decreased sensitivity or low throughput. We present a novel protein-clustering method that enables automatic separation of FP from true hits. The method quantifies the biological similarity between pairs of proteins by examining each protein's annotations, and then proceeds by clustering sets of proteins that received similar annotation into biological groups. Results Using a test set of all PROSITE signatures that are marked as FPs, we show that the method successfully separates FPs in 69% of the 327 test cases supplied by PROSITE. Furthermore, we constructed an extensive random FP simulation test and show a high degree of success in detecting FP, indicating that the method is not specifically tuned for PROSITE and performs well on larger scales. We also suggest some means of predicting in which cases this approach would be successful. Conclusion Automatic detection of FPs may greatly facilitate the manual validation process and increase annotation sensitivity. With the increasing number of automatic annotations, the tendency of biological properties to be clustered, once a biological similarity measure is introduced, may become exceedingly helpful in the development of such automatic methods.
Affiliation(s)
- Noam Kaplan
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
| | - Michal Linial
- Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
- Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA
|
43
|
Namboori S, Mhatre N, Sujatha S, Srinivasan N, Pandit SB. Enhanced functional and structural domain assignments using remote similarity detection procedures for proteins encoded in the genome of Mycobacterium tuberculosis H37Rv. J Biosci 2005; 29:245-59. [PMID: 15381846 DOI: 10.1007/bf02702607] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has facilitated deeper insights into the biology of MTB, yet the functions of many MTB proteins are unknown. We have used sensitive profile-based search procedures to assign functional and structural domains to infer functions of gene products encoded in MTB. These domain assignments have been made using a compendium of sequence and structural domain families. Functions are predicted for 78% of the encoded gene products. For 69% of these, functions can be inferred by domain assignments. The functions for the rest are deduced from their homology to proteins of known function. Superfamily relationships between families of unknown and known structures have increased structural information by approximately 11%. Remote similarity detection methods have enabled domain assignments for 1325 'hypothetical proteins'. The most populated families in MTB are involved in lipid metabolism and in entry and survival of the bacillus in the host. Interestingly, for 353 proteins, which we refer to as MTB-specific, no homologues have been identified. Numerous previously unannotated hypothetical proteins have been assigned domains, and some of these could perhaps be chemotherapeutic targets. MTB-specific proteins might include factors responsible for virulence. Importantly, these assignments could be valuable for experimental endeavors. The detailed results are publicly available at http://hodgkin.mbu.iisc.ernet.in/~dots.
Affiliation(s)
- Seema Namboori
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012
|
44
|
Abstract
Here, we report a novel protein sequence descriptor-based remote homology identification method, able to infer fold relationships without the explicit knowledge of structure. In a first phase, we have individually benchmarked 13 different descriptor types in fold identification experiments in a highly diverse set of protein sequences. The relevant descriptors were related to the fold class membership by using simple similarity measures in the descriptor spaces, such as the cosine angle. Our results revealed that the three best-performing sets of descriptors were the sequence-alignment-based descriptor using PSI-BLAST e-values, the descriptors based on the alignment of secondary structural elements (SSEA), and the descriptors based on the occurrence of PROSITE functional motifs. In a second phase, the three top-performing descriptors were combined to obtain a final method with improved performance, which we named DescFold. Class membership was predicted by Support Vector Machine (SVM) learning. In comparison with the individual PSI-BLAST-based descriptor, the rate of remote homology identification increased from 33.7% to 46.3%. We found out that the composite set of descriptors was able to identify the true remote homolog for nearly every sixth sequence at the 95% confidence level, or some 10% more than a single PSI-BLAST search. We have benchmarked the DescFold method against several other state-of-the-art fold recognition algorithms for the 172 LiveBench-8 targets, and we concluded that it was able to add value to the existing techniques by providing a confident hit for at least 10% of the sequences not identifiable by the previously known methods.
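The first-phase idea of relating descriptors to fold class through a simple similarity measure such as the cosine angle can be sketched as follows. This is only an illustrative nearest-neighbour assignment under assumed inputs; the function names, the toy descriptor vectors, and the fold labels are hypothetical and are not taken from DescFold.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length descriptor vectors."""
    if len(u) != len(v):
        raise ValueError("descriptor vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero descriptors
    return dot / (norm_u * norm_v)


def nearest_fold(query, references):
    """Return (fold_label, score) of the reference most similar to the query.

    `references` is a list of (fold_label, descriptor_vector) pairs
    (a hypothetical data layout used only for this sketch).
    """
    best_label, best_score = None, float("-inf")
    for label, vector in references:
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Toy usage with made-up three-dimensional descriptors.
references = [("fold_A", [1.0, 0.0, 0.2]), ("fold_B", [0.0, 1.0, 0.1])]
label, score = nearest_fold([0.9, 0.1, 0.2], references)
```

The abstract states that the final method predicts class membership with an SVM over the combined descriptors; the nearest-neighbour rule above only stands in for the simpler per-descriptor benchmarking step.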
Affiliation(s)
- Ziding Zhang
- Nestlé Research Center, BioAnalytical Science, CH-1000 Lausanne 26, Switzerland. Ziding.
|
45
|
Stevens FJ. Efficient recognition of protein fold at low sequence identity by conservative application of Psi-BLAST: validation. J Mol Recognit 2005; 18:139-49. [PMID: 15558595 DOI: 10.1002/jmr.721] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
A substantial fraction of protein sequences derived from genomic analyses is currently classified as representing 'hypothetical proteins of unknown function'. In part, this reflects the limitations of methods for comparison of sequences with very low identity. We evaluated the effectiveness of a Psi-BLAST search strategy to identify proteins of similar fold at low sequence identity. Psi-BLAST searches for structurally characterized low-sequence-identity matches were carried out on a set of over 300 proteins of known structure. Searches were conducted in NCBI's non-redundant database and were limited to three rounds. Some 614 potential homologs with 25% or lower sequence identity to 166 members of the search set were obtained. Disregarding the expect value, level of sequence identity and span of alignment, correspondence of fold between the target and potential homolog was found in more than 95% of the Psi-BLAST matches. Restrictions on expect value or span of alignment improved the false positive rate at the expense of eliminating many true homologs. Approximately three-quarters of the putative homologs obtained by three rounds of Psi-BLAST revealed no significant sequence similarity to the target protein upon direct sequence comparison by BLAST, and therefore could not be found by a conventional search. Although three rounds of Psi-BLAST identified many more homologs than a standard BLAST search, most homologs were undetected. It appears that more than 80% of all homologs to a target protein may be characterized by a lack of significant sequence similarity. We suggest that conservative use of Psi-BLAST has the potential to propose experimentally testable functions for the majority of proteins currently annotated as 'hypothetical proteins of unknown function'.
Affiliation(s)
- F J Stevens
- Biosciences Division, Argonne National Laboratory, Argonne, IL 60439, USA.
|
46
|
Abstract
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
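The accuracy measure quoted above, the fraction of correctly aligned residues relative to a structure-based reference alignment, can be illustrated with a short sketch. It assumes one common reading of that measure (the share of reference residue pairs that the test alignment reproduces); the function names and the gapped-string representation are illustrative choices, not the SALIGN or MODELLER interface.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs aligned in two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def alignment_accuracy(test, reference):
    """Fraction of reference-aligned residue pairs also present in the test alignment.

    Both arguments are (row_a, row_b) tuples of gapped strings for the same
    two sequences.
    """
    reference_pairs = aligned_pairs(*reference)
    if not reference_pairs:
        return 0.0
    test_pairs = aligned_pairs(*test)
    return len(test_pairs & reference_pairs) / len(reference_pairs)


# Toy example: the test alignment misplaces one of four reference pairs.
reference = ("ACDE", "ACDE")
test = ("AC-DE", "ACD-E")
accuracy = alignment_accuracy(test, reference)
```

In this toy case three of the four reference pairs are recovered, so the accuracy is 0.75.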
Affiliation(s)
- Marc A Marti-Renom
- Mission Bay Genentech Hall, University of California, San Francisco, San Francisco, CA 94143, USA.
|
47
|
Mayor LR, Fleming KP, Müller A, Balding DJ, Sternberg MJE. Clustering of protein domains in the human genome. J Mol Biol 2004; 340:991-1004. [PMID: 15236962 DOI: 10.1016/j.jmb.2004.05.036] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 11/07/2003] [Revised: 03/30/2004] [Accepted: 05/17/2004] [Indexed: 11/30/2022]
Abstract
We present a systematic study of the clustering of genes within the human genome based on homology inferred from both sequence and structural similarity. The 3D-Genomics automated proteome annotation pipeline was utilised to infer homology for each protein domain in the genome, for the 26 superfamilies most highly represented in the Structural Classification Of Proteins (SCOP) database. This approach enabled us to identify homologues that could not be detected by sequence-based methods alone. For each superfamily, we investigated the distribution, both within and among chromosomes, of genes encoding at least one domain within the superfamily. The results indicate a diversity of clustering behaviours: some superfamilies showed no evidence of any clustering, and others displayed significant clustering either within or among chromosomes, or both. Removal of tandem repeats reduced the levels of clustering observed, but some superfamilies still displayed highly significant clustering. Thus, our study suggests that either the process of gene duplication, or the evolution of the resulting clusters, differs between structural superfamilies.
Affiliation(s)
- Lianne R Mayor
- Department of Epidemiology and Public Health, Imperial College, St Mary's Campus, London W2 1PG, UK
|
48
|
Kihara D, Skolnick J. Microbial genomes have over 72% structure assignment by the threading algorithm PROSPECTOR_Q. Proteins 2004; 55:464-73. [PMID: 15048836 DOI: 10.1002/prot.20044] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Indexed: 12/26/2022]
Abstract
The genome scale threading of five complete microbial genomes is revisited using our state-of-the-art threading algorithm, PROSPECTOR_Q. Considering that structure assignment to an ORF could be useful for predicting biochemical function as well as for analyzing pathways, it is important to assess the current status of genome scale threading. The fraction of ORFs in each genome sequence to which we could assign protein structures with a reasonably good confidence level is over 72%, which is significantly higher than in earlier studies. Using the assigned structures, we have predicted the function of several ORFs through "single-function" template structures, obtained from an analysis of the relationship between protein fold and function. The fold distribution of the genomes and the effect of the number of homologous sequences on structure assignment are also discussed.
Affiliation(s)
- Daisuke Kihara
- UB Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York 14215, USA
|
49
|
Chakhaiyar P, Hasnain SE. Defining the mandate of tuberculosis research in a postgenomic era. Med Princ Pract 2004; 13:177-84. [PMID: 15181320 DOI: 10.1159/000078312] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Received: 09/15/2003] [Accepted: 02/07/2004] [Indexed: 11/19/2022] Open
Abstract
The identification of Mycobacterium tuberculosis by Robert Koch in 1882 as the causative agent of tuberculosis, the release of the drug rifampicin in 1970 and the sequencing of the M. tuberculosis genome in 1998 are three major events that have revolutionized tuberculosis research. In spite of these breakthroughs, the continued status of tuberculosis as the largest killer amongst infectious diseases is an issue of major concern. Although directly observed short course chemotherapy exists to treat the disease, the emergence of drug-resistant strains has severely threatened the efficacy of the treatment. The recent sequencing of the M. tuberculosis genome holds promise for the development of new vaccines and the design of new drugs. This is all the more possible when the information from the genome sequence is combined with proteomics and structural and functional genomics. Such an integrated approach has led to the birth of a new field of research christened 'postgenomics' that holds substantial promise for the identification of novel drug targets and the potential to aid the development of new chemotherapeutic compounds to treat tuberculosis. The challenge before the scientific community therefore lies in elucidation of the wealth of information provided by the genome sequence and its translation into the design of novel therapies for the disease. All the major developments in the field of tuberculosis research after the sequencing of the M. tuberculosis genome will be discussed in this review.
Affiliation(s)
- Prachee Chakhaiyar
- Laboratory of Molecular and Cellular Biology, Centre for DNA Fingerprinting and Diagnostics, Nacharam, Hyderabad 500 076, India
|
50
|
Ranea JAG, Buchan DWA, Thornton JM, Orengo CA. Evolution of protein superfamilies and bacterial genome size. J Mol Biol 2004; 336:871-87. [PMID: 15095866 DOI: 10.1016/j.jmb.2003.12.044] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.4] [Received: 09/24/2003] [Revised: 12/11/2003] [Accepted: 12/12/2003] [Indexed: 10/26/2022]
Abstract
We present the structural annotation of 56 different bacterial species based on the assignment of genes to 816 evolutionary superfamilies in the CATH domain structure database. These assignments have enabled us to analyse the recurrence of specific superfamilies within and across the genomes. We have selected the superfamilies that have a very broad representation and therefore appear to be universally distributed in a significant number of bacterial lineages. Occurrence profiles of these universally distributed superfamilies are compared with genome size in order to estimate the correlation between superfamily duplication and the increase in proteome size. This distinguishes between those size-dependent superfamilies where frequency of occurrence is highly correlated with increase in genome size, and size-independent superfamilies where no correlation is observed. Consideration of the size correlation and the ratio between the mean and the standard deviations for all the superfamily profiles allows more detailed subdivisions and classification of superfamilies. For example, within the size-independent superfamilies, we distinguished a group that are distributed evenly amongst all the genomes. Within the size-dependent superfamilies we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COG database was performed for all superfamilies in each of these groups, and this revealed significant differences amongst the three sets of superfamilies. Evenly distributed, size-independent domains are shown to be involved primarily in protein translation and biosynthesis. For the size-dependent superfamilies, linearly distributed superfamilies are involved mainly in metabolism, and non-linearly distributed superfamily domains are involved principally in gene regulation.
Affiliation(s)
- Juan A G Ranea
- Biomolecular Structure and Modelling Group, Department of Biochemistry and Molecular Biology, University College London, London WC1E 6BT, UK.
|