1
Kotb HM, Davey NE. xProtCAS: A Toolkit for Extracting Conserved Accessible Surfaces from Protein Structures. Biomolecules 2023; 13:906. [PMID: 37371487 DOI: 10.3390/biom13060906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 05/23/2023] [Accepted: 05/24/2023] [Indexed: 06/29/2023]
Abstract
The identification of protein surfaces required for interaction with other biomolecules broadens our understanding of protein function, their regulation by post-translational modification, and the deleterious effect of disease mutations. Protein interaction interfaces are often identifiable as patches of conserved residues on a protein's surface. However, finding conserved accessible surfaces on folded regions requires an understanding of the protein structure to discriminate between functional and structural constraints on residue conservation. With the emergence of deep learning methods for protein structure prediction, high-quality structural models are now available for any protein. In this study, we introduce tools to identify conserved surfaces on AlphaFold2 structural models. We define autonomous structural modules from the structural models and convert these modules to a graph encoding residue topology, accessibility, and conservation. Conserved surfaces are then extracted using a novel eigenvector centrality-based approach. We apply the tool to the human proteome identifying hundreds of uncharacterised yet highly conserved surfaces, many of which contain clinically significant mutations. The xProtCAS tool is available as open-source Python software and an interactive web server.
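The abstract names eigenvector centrality as the core of the surface-extraction step. The following is a minimal sketch of that idea under stated assumptions, not the xProtCAS implementation:

- Residues are nodes and structural contacts are edges.
- Each edge is weighted by the product of the two residues' conservation × accessibility scores. This weighting is an assumption of the sketch.
- The scores and contacts are hypothetical toy values.
- Centrality is computed by plain power iteration.

```python
import math


def eigenvector_centrality(edges, max_iter=1000, tol=1e-10):
    """Eigenvector centrality of an undirected weighted graph by power iteration.

    edges: dict mapping (u, v) -> non-negative weight, each undirected edge once.
    Returns a dict node -> centrality, normalised to unit Euclidean length.
    """
    nodes = sorted({u for u, _ in edges} | {v for _, v in edges})
    neighbours = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        neighbours[u].append((v, w))
        neighbours[v].append((u, w))

    x = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Multiply by (A + I): same leading eigenvector as A, but avoids the
        # oscillation plain power iteration shows on bipartite graphs.
        new = {n: x[n] + sum(w * x[m] for m, w in neighbours[n]) for n in nodes}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {n: val / norm for n, val in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    return x


# Hypothetical per-residue scores (0-1): conservation and relative accessibility.
conservation = {1: 0.90, 2: 0.95, 3: 0.90, 4: 0.20, 5: 0.10}
accessibility = {1: 0.80, 2: 0.90, 3: 0.70, 4: 0.90, 5: 0.80}
contacts = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]  # hypothetical residue contacts

# Assumed edge weight: product of both residues' conservation x accessibility.
edges = {
    (i, j): conservation[i] * accessibility[i] * conservation[j] * accessibility[j]
    for i, j in contacts
}
scores = eigenvector_centrality(edges)
for residue in sorted(scores, key=scores.get, reverse=True):
    print(residue, round(scores[residue], 3))
```

In this toy graph, the three conserved, exposed residues that contact one another receive the highest centralities. A real pipeline would still need a rule for thresholding or clustering these scores into a surface patch, which is not shown here.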
Affiliation(s)
- Hazem M Kotb
- Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Norman E Davey
- Division of Cancer Biology, The Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
2
Sinha M, Jagadeesan R, Kumar N, Saha S, Kothandan G, Kumar D. In-silico studies on Myo inositol-1-phosphate synthase of Leishmania donovani in search of anti-leishmaniasis. J Biomol Struct Dyn 2020; 40:3371-3384. [PMID: 33200690 DOI: 10.1080/07391102.2020.1847194] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
Myo-inositol is one of the vital nutritional requirements for the survival and virulence of Leishmania parasites in the mammalian host. Myo-inositol-1-phosphate synthase (MIPS) is responsible for the synthesis of myo-inositol in Leishmania, which plays a vital role in Leishmania's virulence to mammalian hosts. Earlier studies suggest MIP synthase as a potential drug target, against which valproate has been used as a drug. MIP synthase is therefore a candidate target for anti-leishmanial drugs, and its inhibition may help prevent leishmaniasis. The present study aims to identify potent valproate analogs as drugs against L. donovani MIP synthase (Ld-MIPS) with minimal side effects and toxicity to the host. In this study, the three-dimensional structure of Ld-MIPS was built, followed by active site prediction. Ligand-based virtual screening was done using hybrid similarity recognition methods. The best 123 valproate analogs were filtered based on their quantitative structure-activity relationship (QSAR) properties and docked against Ld-MIPS using FlexX, PyRx and iGEMDOCK. Based on the docking scores, the top five ligands were selected for molecular dynamics simulation and pharmacokinetic analysis. Simulations of up to 30 ns revealed that all five lead molecules remained bound to Ld-MIPS throughout the MD simulation, with no variation in their backbone. All the chosen inhibitors exhibited good pharmacokinetic/ADMET predictions, with excellent absorption, metabolism, oral bioavailability, solubility and excretion profiles and minimal toxicity. This suggests that these inhibitors may be developed further as anti-leishmaniasis drugs to prevent the spread of leishmaniasis. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Mousumi Sinha
- Department of Microbiology, Assam University, Silchar, Assam, India
- Rahul Jagadeesan
- CAS in Crystallography and Biophysics, Guindy Campus, University of Madras, Chennai, Tamil Nadu, India
- Neeraj Kumar
- Functional Genomics & Complex System Lab, Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology, Palampur, Himachal Pradesh, India
- Satabdi Saha
- Department of Microbiology, Assam University, Silchar, Assam, India
- Gugan Kothandan
- CAS in Crystallography and Biophysics, Guindy Campus, University of Madras, Chennai, Tamil Nadu, India
- Diwakar Kumar
- Department of Microbiology, Assam University, Silchar, Assam, India
3
Bhowmik D, Jagadeesan R, Rai P, Nandi R, Gugan K, Kumar D. Evaluation of potential drugs against leishmaniasis targeting catalytic subunit of Leishmania donovani nuclear DNA primase using ligand based virtual screening, docking and molecular dynamics approaches. J Biomol Struct Dyn 2020; 39:1838-1852. [PMID: 32141397 DOI: 10.1080/07391102.2020.1739557] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/11/2022]
Abstract
Leishmania donovani causes leishmaniasis, a global health problem that puts the populations of around 89 countries at risk. Replication initiation events are instrumental in regulating DNA duplication. Because the small subunit of L. donovani nuclear DNA primase (Ld-PriS) contains the catalytic site, it plays a vital role in DNA replication. In this study, we have targeted Ld-PriS for the first time as a prospective drug target against the Leishmania parasite. 3-D structures of Ld-PriS were built, and ligand-based virtual screening was performed using hybrid similarity recognition techniques. Ligands from the ZINC database were screened using the known DNA primase inhibitor sphingosine as the query. The top 150 ligands were taken forward for molecular docking against the query protein (Ld-PriS) using PyRx and iGEMDOCK. The five compounds with the best docking scores were selected for pharmacokinetic investigation and molecular dynamics simulation. These five screened inhibitors showed very poor binding affinity toward the catalytic subunit of human primase, indicating their safety toward the host's normal replication machinery. The top five compounds showed good pharmacokinetic profiles. ADMET predictions revealed good absorption, solubility, permeability, uniform distribution, proper metabolism, minimal toxicity and good bioavailability. Simulation studies of up to 50 ns revealed that three leads, ZINC000009219046, ZINC000025998119 and ZINC000004677901, bind Ld-PriS throughout the simulation without large variations in their backbone. This suggests that these three may serve as potential lead compounds for developing new drugs against leishmaniasis. Communicated by Ramaswamy H. Sarma.
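The abstract says ZINC ligands were ranked against sphingosine using "hybrid similarity recognition techniques" but gives no further detail. The following is a minimal sketch of the general ligand-based idea, with these substitutions and assumptions:

- It uses the simplest widely used measure, Tanimoto similarity between binary fingerprints. This is not the authors' hybrid method.
- The fingerprints are toy bit sets, not real ZINC fingerprints.
- The compound IDs are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, library, top_n):
    """Return the top_n library entries most similar to the query fingerprint.

    library: dict compound_id -> set of on-bits. Ties are broken by compound id.
    """
    scored = sorted(
        ((tanimoto(query, fp), cid) for cid, fp in library.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [(cid, round(score, 3)) for score, cid in scored[:top_n]]


# Toy fingerprints: the bit indices are invented for illustration only.
query_fp = {1, 2, 3, 4, 5, 8}          # stands in for the query inhibitor
library = {
    "cmpd_A": {1, 2, 3, 4, 5, 9},
    "cmpd_B": {1, 2, 7},
    "cmpd_C": {10, 11},
}
print(rank_by_similarity(query_fp, library, top_n=2))
# [('cmpd_A', 0.714), ('cmpd_B', 0.286)]
```

In practice, the fingerprints would come from a cheminformatics toolkit, and the top-ranked compounds would then go on to docking, as described in the abstract.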
Affiliation(s)
- Deep Bhowmik
- Department of Microbiology, Assam University, Silchar, Assam, India
- Rahul Jagadeesan
- CAS in Crystallography and Biophysics, Guindy Campus, University of Madras, Chennai, India
- Praveen Rai
- Department of Biotechnology, Central University of Rajasthan, Bandarsindri, India
- Rajat Nandi
- Department of Microbiology, Assam University, Silchar, Assam, India
- Kothandan Gugan
- CAS in Crystallography and Biophysics, Guindy Campus, University of Madras, Chennai, India
- Diwakar Kumar
- Department of Microbiology, Assam University, Silchar, Assam, India
4
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors, and stop codons by tRNAs; in rare cases, this also contributes to codon usage. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index of codon adaptation that accounts for background mutation bias (Index of Translation Elongation) is presented. It is contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both indices are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production; the reanalysis disproves that claim.
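The following is a minimal sketch of the codon adaptation index (CAI), the baseline the chapter contrasts with its new index. It follows the standard definition:

- Each codon's relative adaptiveness w is its count in a reference set of highly expressed genes, divided by the count of the most-used synonymous codon.
- CAI is the geometric mean of w over a gene's codons.

The following are assumptions of this sketch:

- The genetic code is a partial illustrative subset.
- A pseudocount is added to avoid log(0).
- The reference "genes" are toy sequences.

The mutation-bias-aware Index of Translation Elongation is not shown.

```python
import math
from collections import Counter

# Partial genetic code for illustration only (codon -> amino acid).
# A real analysis needs all 61 sense codons.
GENETIC_CODE = {
    "AAA": "K", "AAG": "K",
    "TTT": "F", "TTC": "F",
    "GAA": "E", "GAG": "E",
    "ATG": "M",
}


def split_codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w = count(codon) / count(most-used synonym) in the reference gene set.

    The pseudocount (an assumption of this sketch) keeps unused codons from
    getting w = 0, which would make log(w) undefined.
    """
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    families = {}
    for codon, aa in GENETIC_CODE.items():
        families.setdefault(aa, []).append(codon)
    w = {}
    for codons in families.values():
        if len(codons) < 2:  # single-codon amino acids carry no usage signal
            continue
        best = max(counts[c] + pseudocount for c in codons)
        for c in codons:
            w[c] = (counts[c] + pseudocount) / best
    return w


def cai(gene, w):
    """Geometric mean of w over the gene's codons that have a w value."""
    logs = [math.log(w[c]) for c in split_codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set of highly expressed genes.
w = relative_adaptiveness(["AAAAAAAAGTTC"])
print(round(cai("ATGAAATTC", w), 3))  # 1.0: only the preferred codons are used
print(round(cai("AAGTTT", w), 3))     # 0.447: sqrt(0.6 * 1/3)
```

Because w is estimated only from codon counts in the reference set, CAI cannot separate selection from background mutation bias. That limitation is what the chapter's alternative index addresses.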
5
Meng F, Murray GF, Kurgan L, Donahue HJ. Functional and structural characterization of osteocytic MLO-Y4 cell proteins encoded by genes differentially expressed in response to mechanical signals in vitro. Sci Rep 2018; 8:6716. [PMID: 29712973 PMCID: PMC5928037 DOI: 10.1038/s41598-018-25113-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/15/2017] [Accepted: 04/09/2018] [Indexed: 12/29/2022]
Abstract
The anabolic response of bone to mechanical load is partially the result of osteocyte response to fluid flow-induced shear stress. Understanding signaling pathways activated in osteocytes exposed to fluid flow could identify novel signaling pathways involved in the response of bone to mechanical load. Bioinformatics allows for a unique perspective and provides key first steps in understanding these signaling pathways. We examined proteins encoded by genes differentially expressed in response to fluid flow in murine osteocytic MLO-Y4 cells. We considered structural and functional characteristics including putative intrinsic disorder, evolutionary conservation, interconnectedness in protein-protein interaction networks, and cellular localization. Our analysis suggests that proteins encoded by fluid flow-activated genes have lower-than-expected conservation, are depleted in intrinsic disorder, maintain typical levels of connectivity for the murine proteome, and are found in the cytoplasm and extracellular space. Pathway analyses reveal that these proteins are associated with cellular response to stress, chemokine and cytokine activity, enzyme binding, and osteoclast differentiation. The lower-than-expected disorder of proteins encoded by flow-activated genes suggests they are relatively specialized.
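The abstract reports that flow-responsive proteins are depleted in intrinsic disorder relative to the murine proteome, but it does not name the statistic used. One standard way to test such a depletion is a one-sided hypergeometric test. It asks how likely it is to observe so few disordered proteins in a random draw of the same size from the proteome. The following is a minimal sketch; all counts below are hypothetical, not taken from the study.

```python
from math import comb


def hypergeom_pmf(i, N, K, n):
    """P(X = i) when drawing n items without replacement from N items, K of which are hits."""
    return comb(K, i) * comb(N - K, n - i) / comb(N, n)


def depletion_p(k, N, K, n):
    """One-sided P(X <= k): evidence that the gene set has fewer hits than expected."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))


def enrichment_p(k, N, K, n):
    """One-sided P(X >= k): evidence that the gene set has more hits than expected."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))


# Hypothetical counts (not from the study): a proteome of 20,000 proteins, 6,000 of
# them predicted to be highly disordered; 120 flow-responsive proteins, 20 disordered.
N, K, n, k = 20_000, 6_000, 120, 20
print(f"expected {n * K / N:.1f}, observed {k}, depletion p = {depletion_p(k, N, K, n):.2e}")
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the only rounding happens in the final division of each term.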
Affiliation(s)
- Fanchi Meng
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada
- Graeme F Murray
- Bone Engineering, Science and Technology (BEST) Laboratory, Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, United States of America
- Henry J Donahue
- Bone Engineering, Science and Technology (BEST) Laboratory, Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, Virginia, United States of America
6
Ma L, Wang DD, Zou B, Yan H. An Eigen-Binding Site Based Method for the Analysis of Anti-EGFR Drug Resistance in Lung Cancer Treatment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1187-1194. [PMID: 27187970 DOI: 10.1109/tcbb.2016.2568184] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
We explore the drug resistance mechanism in non-small cell lung cancer treatment by characterizing the drug-binding site of a protein mutant based on local surface and energy features. These features are transformed to an eigen-binding site space and used for drug resistance level prediction and analysis.
7
Moll M, Finn PW, Kavraki LE. Structure-guided selection of specificity determining positions in the human Kinome. BMC Genomics 2016; 17 Suppl 4:431. [PMID: 27556159 PMCID: PMC5001202 DOI: 10.1186/s12864-016-2790-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022]
Abstract
Background: The human kinome contains many important drug targets. It is well known that inhibitors of protein kinases bind with very different selectivity profiles, as do inhibitors of many other protein families. The increased availability of protein 3D structures has provided much information on the structural variation within a given protein family. However, the relationship between structural variation and binding specificity is complex and incompletely understood. We have developed a structural bioinformatics approach that analyzes key determinants of binding selectivity, as a tool to enhance the rational design of drugs with a specific selectivity profile. Results: We propose a greedy algorithm that computes a subset of residue positions in a multiple sequence alignment such that structural and chemical variation at those positions helps explain known binding affinities. The main purpose of the algorithm is to give experimentalists possible insights into how the selectivity profile of certain inhibitors is achieved, which is useful for lead optimization. In addition, the algorithm can be used to predict binding affinities for structures whose affinity for a given inhibitor is unknown. The algorithm's performance is demonstrated using an extensive dataset for the human kinome. Conclusion: We show that the binding affinity of 38 different kinase inhibitors can be explained with consistently high precision and accuracy using the variation of at most six residue positions in the kinome binding site. For several inhibitors, we show that we are able to identify residues that are known to be functionally important.
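The Results section describes a greedy algorithm that selects alignment positions whose variation explains known binding affinities. The following is a minimal sketch of that greedy skeleton only. Its objective is a stand-in chosen for illustration, not the published scoring, which uses structural and chemical variation:

- Sequences that share the same residues at the chosen positions are grouped together.
- At each step, the algorithm adds the position that most reduces the within-group sum of squared affinity deviations.
- The default limit of six positions mirrors the "at most six residue positions" in the Conclusion.
- The alignment and pKd values are toy data.

```python
def within_group_sse(seqs, affinities, positions):
    """Sum of squared affinity deviations within groups of sequences that carry
    identical residues at every selected alignment position."""
    groups = {}
    for seq, aff in zip(seqs, affinities):
        key = tuple(seq[p] for p in positions)
        groups.setdefault(key, []).append(aff)
    sse = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        sse += sum((v - mean) ** 2 for v in values)
    return sse


def greedy_positions(seqs, affinities, max_positions=6):
    """Greedily add the alignment position that most reduces within-group SSE."""
    chosen = []
    current = within_group_sse(seqs, affinities, chosen)
    for _ in range(max_positions):
        candidates = [p for p in range(len(seqs[0])) if p not in chosen]
        if not candidates:
            break
        scores = {p: within_group_sse(seqs, affinities, chosen + [p]) for p in candidates}
        best = min(candidates, key=scores.get)
        if scores[best] >= current:  # stop once no position improves the fit
            break
        chosen.append(best)
        current = scores[best]
    return chosen


# Toy alignment of four binding-site fragments and made-up pKd values.
seqs = ["AGLK", "AGMK", "SGLK", "SGMR"]
affinities = [8.0, 8.1, 5.0, 5.2]
print(greedy_positions(seqs, affinities))  # [0, 2]
```

With many sequences and few affinity measurements, this kind of grouping overfits quickly. Any practical version would need cross-validation or a penalty on the number of positions.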
Affiliation(s)
- Mark Moll
- Department of Computer Science, Rice University, PO Box 1892, Houston, 77251, TX, USA
- Paul W Finn
- University of Buckingham, Hunter St, Buckingham, UK
- Lydia E Kavraki
- Department of Computer Science, Rice University, PO Box 1892, Houston, 77251, TX, USA
8
Melamed D, Young DL, Miller CR, Fields S. Combining natural sequence variation with high throughput mutational data to reveal protein interaction sites. PLoS Genet 2015; 11:e1004918. [PMID: 25671604 PMCID: PMC4335499 DOI: 10.1371/journal.pgen.1004918] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 06/04/2014] [Accepted: 11/24/2014] [Indexed: 12/29/2022]
Abstract
Many protein interactions are conserved among organisms despite changes in the amino acid sequences that comprise their contact sites, a property that has been used to infer the location of these sites from protein homology. In an inter-species complementation experiment, a sequence present in a homologue is substituted into a protein and tested for its ability to support function. Therefore, substitutions that inhibit function can identify interaction sites that changed over evolution. However, most of the sequence differences within a protein family remain unexplored because of the small-scale nature of these complementation approaches. Here we use existing high throughput mutational data on the in vivo function of the RRM2 domain of the Saccharomyces cerevisiae poly(A)-binding protein, Pab1, to analyze its sites of interaction. Of 197 single amino acid differences in 52 Pab1 homologues, 17 reduce the function of Pab1 when substituted into the yeast protein. The majority of these deleterious mutations interfere with the binding of the RRM2 domain to eIF4G1 and eIF4G2, isoforms of a translation initiation factor. A large-scale mutational analysis of the RRM2 domain in a two-hybrid assay for eIF4G1 binding supports these findings and identifies peripheral residues that make a smaller contribution to eIF4G1 binding. Three single amino acid substitutions in yeast Pab1 corresponding to residues from the human orthologue are deleterious and eliminate binding to the yeast eIF4G isoforms. We create a triple mutant that carries these substitutions and other humanizing substitutions that collectively support a switch in binding specificity of RRM2 from the yeast eIF4G1 to its human orthologue. Finally, we map other deleterious substitutions in Pab1 to inter-domain (RRM2–RRM1) or protein-RNA (RRM2–poly(A)) interaction sites. 
Thus, the combined approach of large-scale mutational data and evolutionary conservation can be used to characterize interaction sites at single amino acid resolution. The interactions of proteins with each other are essential for almost all biological processes. Many of the sites of protein contact have evolved to maintain these interactions, but use different sets of amino acid residues. As a result, the residues at a contact site in a protein from one species might not allow a protein interaction when they are tested in a second species. This property underlies the idea of inter-species complementation assays, which test the effect of replacing protein segments from one species by their equivalents from another species. However, this approach has been highly limited in the number of changes that could be analyzed in a single study. Here, we present a novel approach that combines a high-throughput analysis of mutations in a single protein with the set of natural sequences corresponding to evolutionarily divergent variants of this protein. This integration step allows us to map at high resolution both sites of inter-protein interaction and sites of intra-protein interaction. Our approach can be used with proteins that have limited functional and structural data, and it can be applied to improve the performance of computational tools that use sequence homology to predict function.
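The core of the combined approach is a join between two data sets:

- The single amino acid differences that separate natural homologues from the reference (yeast) protein.
- The measured effect of each of those substitutions in the high-throughput mutational scan.

The following is a minimal sketch of that join, assuming the homologues are already aligned to the reference. The sequences, fitness scores and the 0.5 cutoff are all hypothetical.

```python
def natural_substitutions(reference, homologues):
    """Single-residue differences between aligned homologues and the reference.

    Sequences must be pre-aligned to the reference's length; gap characters ('-')
    in a homologue are skipped. Positions are reported 1-based.
    """
    subs = set()
    for hom in homologues:
        if len(hom) != len(reference):
            raise ValueError("homologue is not aligned to the reference length")
        for i, (wt, aa) in enumerate(zip(reference, hom), start=1):
            if aa != wt and aa != "-":
                subs.add((i, wt, aa))
    return subs


def flag_deleterious(subs, fitness, threshold):
    """Split natural substitutions into deleterious (scan score < threshold) and
    untested (absent from the mutational scan)."""
    deleterious, untested = [], []
    for pos, wt, aa in sorted(subs):
        score = fitness.get((pos, aa))
        if score is None:
            untested.append((pos, wt, aa))
        elif score < threshold:
            deleterious.append((pos, wt, aa, score))
    return deleterious, untested


reference = "MKVLE"                       # toy stand-in for the yeast domain
homologues = ["MRVLE", "MKVIE", "MKALD"]  # toy aligned homologues
fitness = {(2, "R"): 0.95, (3, "A"): 1.02, (4, "I"): 0.30}  # toy scan scores (1.0 = wild type)
subs = natural_substitutions(reference, homologues)
print(flag_deleterious(subs, fitness, threshold=0.5))
# ([(4, 'L', 'I', 0.3)], [(5, 'E', 'D')])
```

Substitutions that a natural homologue tolerates but that are deleterious in the reference background are the candidates for interaction sites that changed during evolution. Reporting untested substitutions separately keeps gaps in the scan visible.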
Affiliation(s)
- Daniel Melamed
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- David L. Young
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Christina R. Miller
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Stanley Fields
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, United States of America
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Department of Medicine, University of Washington, Seattle, Washington, United States of America
9
Comparative modeling and virtual screening for the identification of novel inhibitors for myo-inositol-1-phosphate synthase. Mol Biol Rep 2014; 41:5039-52. [PMID: 24752405 DOI: 10.1007/s11033-014-3370-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/24/2013] [Accepted: 04/05/2014] [Indexed: 12/20/2022]
Abstract
Myo-inositol-1-phosphate (MIP) synthase is a key enzyme in the myo-inositol biosynthesis pathway. Disruption of the inositol signaling pathway is associated with bipolar disorder. Previous work suggested that MIP synthase could be an attractive target for the development of anti-bipolar drugs, and inhibiting this enzyme might help reduce the risk of disease in patients. With this objective, the three-dimensional structure of the protein was modeled, followed by active site prediction. For the first time, computational studies were carried out to obtain structural insights into the interactions of this enzyme with ligands. Virtual screening was carried out using the FILTER, ROCS and EON modules of the OpenEye Scientific software. Natural products from the ZINC database were used for the screening process. The resulting compounds were docked into the active site of the target protein using the FRED (Fast Rigid Exhaustive Docking) and GOLD (Genetic Optimization for Ligand Docking) programs. The analysis indicated an extensive hydrogen-bonding network and hydrophobic interactions that play a significant role in ligand binding. Four compounds were shortlisted, and their binding assay analysis is underway.
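The abstract names OpenEye's FILTER module for the first screening step but does not say which rules were applied. As a stand-in, the sketch below applies Lipinski's rule of five, a common drug-likeness filter:

- Molecular weight ≤ 500 Da.
- logP ≤ 5.
- At most 5 hydrogen-bond donors.
- At most 10 hydrogen-bond acceptors.
- At most one of these four criteria may be violated.

This is not FILTER's actual rule set, and the compound names and properties are hypothetical. In practice, the properties would be computed by a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient
    h_donors: int
    h_acceptors: int


def lipinski_violations(c: Compound) -> int:
    """Count how many of Lipinski's four rule-of-five criteria a compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def drug_like(compounds, max_violations=1):
    """Names of compounds with at most max_violations rule-of-five violations."""
    return [c.name for c in compounds if lipinski_violations(c) <= max_violations]


library = [
    Compound("natural_product_1", 350.4, 2.1, 2, 5),
    Compound("natural_product_2", 720.9, 6.3, 4, 12),
    Compound("natural_product_3", 512.6, 4.2, 3, 8),
]
print(drug_like(library))  # ['natural_product_1', 'natural_product_3']
```

Allowing one violation is the usual convention for the rule of five. Natural-product libraries often break it, so the threshold is worth making explicit rather than hard-coding.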
10
Lavanya P, Ramaiah S, Anbarasu A. Influence of C-H...O interactions on the structural stability of β-lactamases. J Biol Phys 2013; 39:649-63. [PMID: 23996409 DOI: 10.1007/s10867-013-9324-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/06/2013] [Accepted: 05/26/2013] [Indexed: 01/31/2023]
Abstract
β-Lactamases produced by pathogenic bacteria cleave β-lactam antibiotics and render them ineffective. Understanding the principles that govern the structural stability of β-lactamases requires elucidation of the nature of the interactions that are involved in stabilization. In the present study, we systematically analyze the influence of CH...O interactions on determining the specificity and stability of β-lactamases in relation to environmental preferences. It is interesting to note that all the residues located in the active site of β-lactamases are involved in CH...O interactions. A significant percentage of CH...O interactions have a higher conservation score and short-range interactions are the predominant type of interactions in β-lactamases. These results will be useful in understanding the stability patterns of β-lactamases.
Affiliation(s)
- P Lavanya
- Medical & Biological Computing Laboratory, School of Biosciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
11
Xia X. Position weight matrix, gibbs sampler, and the associated significance tests in motif characterization and prediction. SCIENTIFICA 2012; 2012:917540. [PMID: 24278755 PMCID: PMC3820676 DOI: 10.6064/2012/917540] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 08/22/2012] [Accepted: 10/11/2012] [Indexed: 05/31/2023]
Abstract
Position weight matrix (PWM) is not only one of the most widely used bioinformatic methods, but also a key component in more advanced computational algorithms (e.g., Gibbs sampler) for characterizing and discovering motifs in nucleotide or amino acid sequences. However, few generally applicable statistical tests are available for evaluating the significance of site patterns, PWM, and PWM scores (PWMS) of putative motifs. Statistical significance tests of the PWM output, that is, site-specific frequencies, PWM itself, and PWMS, are in disparate sources and have never been collected in a single paper, with the consequence that many implementations of PWM do not include any significance test. Here I review PWM-based methods used in motif characterization and prediction (including a detailed illustration of the Gibbs sampler for de novo motif discovery), present statistical and probabilistic rationales behind statistical significance tests relevant to PWM, and illustrate their application with real data. The multiple comparison problem associated with the test of site-specific frequencies is best handled by false discovery rate methods. The test of PWM, due to the use of pseudocounts, is best done by resampling methods. The test of individual PWMS for each sequence segment should be based on the extreme value distribution.
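The following is a minimal sketch of two pieces the review discusses:

- Building a log-odds PWM from aligned sites with pseudocounts, then scoring a sequence segment (its PWMS).
- Controlling the false discovery rate across many site-specific tests with the Benjamini-Hochberg step-up procedure.

The uniform background, the pseudocount of 1 per base and the toy sites are assumptions of this sketch. The resampling test for the PWM itself and the extreme-value test for individual PWM scores are not shown.

```python
import math

ALPHABET = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Log2-odds position weight matrix from equal-length aligned sites.

    Each entry: log2(((count + pseudocount) / (N + 4 * pseudocount)) / background).
    """
    background = background or {b: 0.25 for b in ALPHABET}
    total = len(sites) + pseudocount * len(ALPHABET)
    pwm = []
    for j in range(len(sites[0])):
        column = [s[j] for s in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background[b])
            for b in ALPHABET
        })
    return pwm


def pwm_score(pwm, segment):
    """PWM score (PWMS) of a segment: sum of per-position log-odds."""
    return sum(col[base] for col, base in zip(pwm, segment))


def benjamini_hochberg(pvalues, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank  # largest rank whose p-value passes its threshold
    return sorted(order[:k_max])


sites = ["ACGT", "ACGT", "ACGA", "ACGT"]  # toy aligned binding sites
pwm = build_pwm(sites)
print(round(pwm_score(pwm, "ACGT"), 3), round(pwm_score(pwm, "TGCA"), 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # [0]
```

The pseudocounts are what make an exact analytical test of the PWM awkward, which is consistent with the review's recommendation of resampling for that step.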
Affiliation(s)
- Xuhua Xia
- Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON, Canada K1N 6N5
12
Glembo TJ, Farrell DW, Gerek ZN, Thorpe MF, Ozkan SB. Collective dynamics differentiates functional divergence in protein evolution. PLoS Comput Biol 2012; 8:e1002428. [PMID: 22479170 PMCID: PMC3315450 DOI: 10.1371/journal.pcbi.1002428] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 10/11/2011] [Accepted: 01/30/2012] [Indexed: 12/29/2022]
Abstract
Protein evolution is most commonly studied by analyzing related protein sequences and generating ancestral sequences through Bayesian and Maximum Likelihood methods, and/or by resurrecting ancestral proteins in the lab and performing ligand binding studies to determine function. Structural and dynamic evolution have largely been left out of molecular evolution studies. Here we incorporate both structure and dynamics to elucidate the molecular principles behind the divergence in the evolutionary path of the steroid receptor proteins. We determine the likely structure of three evolutionarily diverged ancestral steroid receptor proteins using the Zipping and Assembly Method with FRODA (ZAMF). Our predictions are within ∼2.7 Å all-atom RMSD of the respective crystal structures of the ancestral steroid receptors. Beyond static structure prediction, a particular feature of ZAMF is that it generates protein dynamics information. We investigate the differences in conformational dynamics of diverged proteins by obtaining the most collective motion through essential dynamics. Strikingly, our analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace. Those sharing the same function cluster together, distant from those that have functionally diverged. Dynamic analysis also enables identification of the mutations that most affect dynamics. It correctly predicts all mutations (functional and permissive) necessary to evolve new function and ∼60% of the permissive mutations necessary to recover ancestral function. Proteins are remarkable machines of living systems that perform diverse biochemical functions. Biochemical diversity has grown over time via molecular evolution. To understand how this diversity arose, it is fundamental to understand how the earliest proteins evolved and served as templates for the present diverse proteome.
The one sequence - one structure - one function paradigm is being extended to a new view: an ensemble of different conformations in equilibrium can evolve new function. The analysis of inherent structural dynamics is therefore crucial for a more complete understanding of protein evolution. We aim to bring structural dynamics into protein evolution through our Zipping and Assembly Method with FRODA (ZAMF). We apply ZAMF to simultaneously obtain structures and structural dynamics of three ancestral sequences of steroid receptor proteins. By comparative dynamics analysis among the three ancestral steroid hormone receptors, (i) we show that changes in structural dynamics indicate functional divergence and (ii) we identify all functionally critical and most of the permissive mutations necessary to evolve new function. Overall, these findings suggest that conformational dynamics may play an important role when new functions evolve through novel molecular interactions.
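Essential dynamics extracts the most collective motion as the leading eigenvector of the covariance matrix of atomic positional fluctuations. The following is a dependency-free sketch of that eigen-decomposition step by power iteration. It assumes the conformations are already superimposed on a common reference, and the toy trajectory is invented. It is not ZAMF itself. Comparing dynamic subspaces between proteins, as the study does, would also need several modes per protein and a subspace-overlap measure, which are not shown.

```python
import math


def essential_mode(frames, max_iter=1000, tol=1e-12):
    """Leading eigenpair of the positional covariance matrix (first essential mode).

    frames: conformations as flat lists [x1, y1, z1, x2, y2, z2, ...], assumed to
    be superimposed on a common reference already (no fitting is done here).
    Returns (variance along the mode, unit eigenvector); the sign is arbitrary.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    dev = [[f[k] - mean[k] for k in range(dim)] for f in frames]
    cov = [[sum(d[i] * d[j] for d in dev) / (n - 1) for j in range(dim)]
           for i in range(dim)]

    # Power iteration from an arbitrary, non-uniform start vector.
    v = [1.0 + 0.1 * k for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        lam = math.sqrt(sum(x * x for x in w))  # ||C v|| converges to the top eigenvalue
        if lam == 0.0:  # no fluctuations at all
            return 0.0, v
        w = [x / lam for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return lam, v


# Toy trajectory of two atoms: atom 1 moves along +x while atom 2 moves along -y.
frames = [[t, 0.0, 0.0, 5.0, -t, 0.0] for t in (-1.0, 0.0, 1.0)]
variance, mode = essential_mode(frames)
print(round(variance, 3), [round(x, 3) for x in mode])
```

In the toy trajectory, the two atoms move in a perfectly anti-correlated way. The leading mode therefore mixes atom 1's x coordinate and atom 2's y coordinate with equal weight and opposite sign.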
Affiliation(s)
- Tyler J. Glembo
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
- Daniel W. Farrell
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America
- Z. Nevin Gerek
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
- M. F. Thorpe
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
- S. Banu Ozkan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
13
Adams R, Worth CL, Guenther S, Dunkel M, Lehmann R, Preissner R. Binding sites in membrane proteins--diversity, druggability and prospects. Eur J Cell Biol 2011; 91:326-39. [PMID: 21872966 DOI: 10.1016/j.ejcb.2011.06.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/27/2010] [Revised: 06/22/2011] [Accepted: 06/22/2011] [Indexed: 11/27/2022]
Abstract
The identification of novel drug targets is one of the major challenges in proteomics. Computational methods developed over the last decade have enhanced the process of drug design in terms of both time and quality. The main task is the design of selective compounds that bind their targets with the specificity required by the desired mode of action of the particular drug. This calls either for compounds that act on a single protein, to exclude undesired cross-reactivity, or for less selective drugs that exploit the advantage of targeting numerous proteins and therefore act on whole protein classes. The main aspects in assigning interactions between ligands and putative targets are the amino acid composition of the binding site, evolutionary conservation, and similarity in sequence and structure to known targets. Similarities or differences within classified protein families can be the key to their function and give first hints for functional drug design. Here, binding site-based classification has an advantage over sequence-based classification, since similar binding sites can also be found in more distantly related proteins. Membrane proteins are 'difficult targets' because of their special physicochemical characteristics and the general lack of structural information. Here, we describe recent advances in modeling methods dedicated to membrane proteins. Descriptors of the similarity between compounds and between binding sites are under development and address important aspects such as dynamics and entropy. The importance of computational drug design is indisputable. Nevertheless, the design process is complicated by increasing complexity. This underlines the importance of accurate knowledge about the addressed target class(es) and particularly their binding sites.
One main objective in considering these topics is to predict putative side effects and errant functions (off-target effects) of novel drugs, which requires a holistic (systems biology) view of drug-target-pathway relations. In the following, we briefly summarize the recent discussion on drug-target interactions, with emphasis on membrane proteins.
Affiliation(s)
- Robert Adams
- Charité-Universitätsklinikum Berlin, Structural Bioinformatics Group, Lindenberger Weg 80, 13125 Berlin, Germany
14
Nilsson J, Grahn M, Wright APH. Proteome-wide evidence for enhanced positive Darwinian selection within intrinsically disordered regions in proteins. Genome Biol 2011; 12:R65. [PMID: 21771306 PMCID: PMC3218827 DOI: 10.1186/gb-2011-12-7-r65] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 01/26/2011] [Revised: 05/31/2011] [Accepted: 07/19/2011] [Indexed: 12/30/2022]
Abstract
BACKGROUND Understanding the adaptive changes that alter the function of proteins during evolution is an important question for biology and medicine. The increasing number of completely sequenced genomes from closely related organisms, as well as individuals within species, facilitates systematic detection of recent selection events by means of comparative genomics. RESULTS We have used genome-wide strain-specific single nucleotide polymorphism data from 64 strains of budding yeast (Saccharomyces cerevisiae or Saccharomyces paradoxus) to determine whether adaptive positive selection is correlated with protein regions showing propensity for different classes of structure conformation. Data from phylogenetic and population genetic analysis of 3,746 gene alignments consistently shows a significantly higher degree of positive Darwinian selection in intrinsically disordered regions of proteins compared to regions of alpha helix, beta sheet or tertiary structure. Evidence of positive selection is significantly enriched in classes of proteins whose functions and molecular mechanisms can be coupled to adaptive processes and these classes tend to have a higher average content of intrinsically unstructured protein regions. CONCLUSIONS We suggest that intrinsically disordered protein regions may be important for the production and maintenance of genetic variation with adaptive potential and that they may thus be of central significance for the evolvability of the organism or cell in which they occur.
Affiliation(s)
- Johan Nilsson
- School of Life Sciences, Södertörn University, SE-141 89 Huddinge, Sweden.
15
Structural bioinformatics: deriving biological insights from protein structures. Interdiscip Sci 2010; 2:347-66. [PMID: 21153779 DOI: 10.1007/s12539-010-0045-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/23/2010] [Revised: 06/18/2010] [Accepted: 06/21/2010] [Indexed: 12/27/2022]
Abstract
Structural bioinformatics can be described as an approach that will help decipher biological insights from protein structures. As an important component of structural biology, this area promises to provide a high resolution understanding of biology by assisting comprehension and interpretation of a large amount of structural data. Biological function of protein molecules can be inferred from their three-dimensional structures by comparing structures, classifying them and transferring function from a related protein or family. It is well known now that the structure space of protein molecules is more conserved than the sequence space, making it important to seek functional associations at the structural level. An added advantage of structural bioinformatics over simpler sequence-based methods is that the former also provides ultimate insights into the mechanisms by which various biological events take place. A bird's eye-view of the different aspects of structural bioinformatics is given here along with various recent advances in the area including how knowledge obtained from structural bioinformatics can be applied in drug discovery.
16
Moll M, Bryant DH, Kavraki LE. The LabelHash algorithm for substructure matching. BMC Bioinformatics 2010; 11:555. [PMID: 21070651 PMCID: PMC2996407 DOI: 10.1186/1471-2105-11-555] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Received: 07/08/2010] [Accepted: 11/11/2010] [Indexed: 08/30/2023]
Abstract
Background There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Results We present LabelHash, a novel algorithm for matching substructural motifs to large collections of protein structures. The algorithm consists of two phases. In the first phase the proteins are preprocessed in a fashion that allows for instant lookup of partial matches to any motif. In the second phase, partial matches for a given motif are expanded to complete matches. The general applicability of the algorithm is demonstrated with three different case studies. First, we show that we can accurately identify members of the enolase superfamily with a single motif. Next, we demonstrate how LabelHash can complement SOIPPA, an algorithm for motif identification and pairwise substructure alignment. Finally, a large collection of Catalytic Site Atlas motifs is used to benchmark the performance of the algorithm. LabelHash runs very efficiently in parallel; matching a motif against all proteins in the 95% sequence identity filtered non-redundant Protein Data Bank typically takes no more than a few minutes. The LabelHash algorithm is available through a web server and as a suite of standalone programs at http://labelhash.kavrakilab.org. The output of the LabelHash algorithm can be further analyzed with Chimera through a plugin that we developed for this purpose. Conclusions LabelHash is an efficient, versatile algorithm for large-scale substructure matching. When LabelHash is running in parallel, motifs can typically be matched against the entire PDB on the order of minutes. 
The algorithm is able to identify functional homologs beyond the twilight zone of sequence identity and even beyond fold similarity. The three case studies presented in this paper illustrate the versatility of the algorithm.
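The two-phase idea described above (preprocess structures so that partial matches can be looked up instantly, then expand partial matches into complete ones) can be illustrated with a toy Python sketch. This is not the LabelHash implementation: the pair-of-labels hash key, the `cutoff` and `tol` parameters, and the structure format (lists of `(label, xyz)` tuples) are hypothetical simplifications chosen for clarity.

```python
import itertools
import math
from collections import defaultdict


def build_index(structures, cutoff=8.0):
    """Phase 1 (preprocessing): map each ordered pair of residue labels to
    every residue pair, in every structure, lying within `cutoff` of each other."""
    index = defaultdict(list)
    for sid, residues in structures.items():
        for i, j in itertools.combinations(range(len(residues)), 2):
            (li, ci), (lj, cj) = residues[i], residues[j]
            if math.dist(ci, cj) <= cutoff:
                index[(li, lj)].append((sid, i, j))
                index[(lj, li)].append((sid, j, i))
    return index


def match_motif(motif, structures, index, tol=1.0):
    """Phase 2: look up seed pairs for the motif's first two residues, then
    extend each seed by backtracking, requiring every pairwise distance to
    agree with the motif's distances within `tol`."""
    (l0, c0), (l1, c1) = motif[0], motif[1]
    hits = []

    def extend(sid, assigned):
        residues = structures[sid]
        k = len(assigned)
        if k == len(motif):
            hits.append((sid, tuple(assigned)))
            return
        label, coord = motif[k]
        for r, (lr, cr) in enumerate(residues):
            if r in assigned or lr != label:
                continue
            if all(abs(math.dist(cr, residues[a][1]) - math.dist(coord, motif[m][1])) <= tol
                   for m, a in enumerate(assigned)):
                extend(sid, assigned + [r])

    for sid, i, j in index.get((l0, l1), []):
        residues = structures[sid]
        if abs(math.dist(residues[i][1], residues[j][1]) - math.dist(c0, c1)) <= tol:
            extend(sid, [i, j])
    return hits
```

Because the index is built once, each new motif only pays for the hash lookup and the (usually small) extension step, which is the property that makes this style of search fast across many structures.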
Affiliation(s)
- Mark Moll
- Department of Computer Science, Rice University, Houston, TX 77005, USA.
17
Slama P, Geman D. Identification of family-determining residues in PHD fingers. Nucleic Acids Res 2010; 39:1666-79. [PMID: 21059680 PMCID: PMC3061080 DOI: 10.1093/nar/gkq947] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]
Abstract
Histone modifications are fundamental to chromatin structure and transcriptional regulation, and are recognized by a limited number of protein folds. Among these folds are PHD fingers, which are present in most chromatin modification complexes. To date, about 15 PHD finger domains have been structurally characterized, whereas hundreds of different sequences have been identified. Consequently, an important open problem is to predict structural features of a PHD finger knowing only its sequence. Here, we classify PHD fingers into different groups based on the analysis of residue–residue co-evolution in their sequences. We measure the degree to which fixing the amino acid type at one position modifies the frequencies of amino acids at other positions. We then detect those position/amino acid combinations, or ‘conditions’, which have the strongest impact on other sequence positions. Clustering these strong conditions yields four families, providing informative labels for PHD finger sequences. Existing experimental results, as well as docking calculations performed here, reveal that these families indeed show discrepancies at the functional level. Our method should facilitate the functional characterization of new PHD fingers, as well as other protein families, solely based on sequence information.
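The notion of a 'condition' (fixing the amino acid at one alignment position and measuring how the frequencies shift at the other positions) can be sketched in Python. The abstract does not give the exact statistic, so summing the total variation distance over the other columns, and the `min_count` filter, are assumptions made only for illustration.

```python
from collections import Counter


def column_freqs(seqs, j):
    """Amino-acid frequencies at alignment column j."""
    counts = Counter(s[j] for s in seqs)
    n = len(seqs)
    return {aa: c / n for aa, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two frequency dictionaries."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def condition_impact(alignment, i, aa):
    """Summed frequency shift at all other columns when the alignment is
    restricted to sequences carrying `aa` at column `i`."""
    subset = [s for s in alignment if s[i] == aa]
    if not subset:
        return 0.0
    return sum(
        total_variation(column_freqs(subset, j), column_freqs(alignment, j))
        for j in range(len(alignment[0]))
        if j != i
    )


def rank_conditions(alignment, min_count=2):
    """Score every (column, amino acid) condition seen at least `min_count` times,
    strongest first."""
    scores = {}
    for i in range(len(alignment[0])):
        for aa, c in Counter(s[i] for s in alignment).items():
            if c >= min_count:
                scores[(i, aa)] = condition_impact(alignment, i, aa)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In a toy alignment such as `["ACD", "ACE", "GKD", "GKE"]`, fixing `A` at column 0 completely determines column 1, so that condition scores higher than fixing `D` at column 2, which leaves the other columns unchanged.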
Affiliation(s)
- Patrick Slama
- Institute for Computational Medicine and Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA.
18
Harms MJ, Thornton JW. Analyzing protein structure and function using ancestral gene reconstruction. Curr Opin Struct Biol 2010; 20:360-6. [PMID: 20413295 DOI: 10.1016/j.sbi.2010.03.005] [Citation(s) in RCA: 156] [Impact Index Per Article: 11.1] [Received: 02/12/2010] [Accepted: 03/22/2010] [Indexed: 01/06/2023]
Abstract
Protein families with functionally diverse members can illuminate the structural determinants of protein function and the process by which protein structure and function evolve. To identify the key amino acid changes that differentiate one family member from another, most studies have taken a horizontal approach, swapping candidate residues between present-day family members. This approach has often been stymied, however, by the fact that shifts in function often require multiple interacting mutations; chimeric proteins are often nonfunctional, either because one lineage has amassed mutations that are incompatible with key residues that conferred a new function on other lineages, or because it lacks mutations required to support those key residues. These difficulties can be overcome by using a vertical strategy, which reconstructs ancestral genes and uses them as the appropriate background in which to study the effects of historical mutations on functional diversification. In this review, we discuss the advantages of the vertical strategy and highlight several exemplary studies that have used ancestral gene reconstruction to reveal the molecular underpinnings of protein structure, function, and evolution.
Affiliation(s)
- Michael J Harms
- Howard Hughes Medical Institute, Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, OR 97403, USA.
19
Procter JB, Thompson J, Letunic I, Creevey C, Jossinet F, Barton GJ. Visualization of multiple alignments, phylogenies and gene family evolution. Nat Methods 2010; 7:S16-25. [PMID: 20195253 DOI: 10.1038/nmeth.1434] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 01/09/2023]
Abstract
Software tools for visualizing sequence alignments and trees are essential for life scientists. In this review, we describe the major features and capabilities of a selection of stand-alone and web-based applications useful for investigating the function and evolution of a gene family. These range from simple viewers to systems that provide sophisticated editing and analysis functions. We conclude with a discussion of the challenges that these tools now face due to the flood of next-generation sequence data and the increasingly complex network of bioinformatics information sources.
20
Field SF, Matz MV. Retracing evolution of red fluorescence in GFP-like proteins from Faviina corals. Mol Biol Evol 2010; 27:225-33. [PMID: 19793832 PMCID: PMC2877551 DOI: 10.1093/molbev/msp230] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 11/14/2022]
Abstract
Proteins of the green fluorescent protein family represent a convenient experimental model to study evolution of novelty at the molecular level. Here, we focus on the origin of Kaede-like red fluorescent proteins characteristic of the corals of the Faviina suborder. We demonstrate, using an original approach involving resurrection and analysis of the library of possible evolutionary intermediates, that it takes on the order of 12 mutations, some of which strongly interact epistatically, to fully recapitulate the evolution of a red fluorescent phenotype from the ancestral green. Five of the identified mutations would not have been found without the help of ancestral reconstruction, because the corresponding site states are shared between extant red and green proteins due to their recent descent from a dual-function common ancestor. Seven of the 12 mutations affect residues that are not in close contact with the chromophore and thus must exert their effect indirectly through adjustments of the overall protein fold; the relevance of these mutations could not have been anticipated from the purely theoretical analysis of the protein's structure. Our results introduce a powerful experimental approach for comparative analysis of functional specificity in protein families even in the cases of pronounced epistasis, provide foundation for the detailed studies of evolutionary trajectories leading to novelty and complexity, and will help rational modification of existing fluorescent labels.
Affiliation(s)
- Mikhail V. Matz
- Section of Integrative Biology, University of Texas at Austin
21
Chakrabarti S, Panchenko AR. Structural and functional roles of coevolved sites in proteins. PLoS One 2010; 5:e8591. [PMID: 20066038 PMCID: PMC2797611 DOI: 10.1371/journal.pone.0008591] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 08/12/2009] [Accepted: 10/19/2009] [Indexed: 01/09/2023]
Abstract
BACKGROUND Understanding the residue covariations between multiple positions in protein families is crucial and can be helpful for designing protein engineering experiments. These simultaneous changes, or residue coevolution, allow a protein to maintain its overall structural-functional integrity while enabling it to acquire specific functional modifications. Despite significant efforts in the field, there is still controversy over the preferred locations of coevolved residues in different regions of protein molecules, the strength of the coevolutionary signal, and the role of coevolution in functional diversification. METHODOLOGY In this paper we study the scale and nature of residue coevolution in maintaining the overall functionality and structural integrity of proteins. We carried out a large-scale study to investigate the structural and functional aspects of coevolved residues. We found that the networks representing the coevolutionary residue connections within our dataset are in general of 'small-world' type, as they have clustering coefficient values higher than those of random networks and mean shortest path lengths similar to or lower than those of random and regular networks. We also found that, altogether, 11% of functionally important sites coevolve with at least one other site. Active sites coevolve with other sites more frequently (15%) than protein (11%) and ligand (9%) binding sites do. Metal binding and active sites are also found to coevolve more frequently with other metal binding and active sites, respectively. Analysis of the coupling between coevolutionary processes and the spatial distribution of coevolved sites reveals that a high fraction of coevolved sites are located close to each other. Moreover, approximately 80% of charge compensatory substitutions within coevolved sites are found in very close spatial proximity (≤ 5 Å), pointing to the possible preservation of salt bridges in evolution.
CONCLUSION Our findings show that a noticeable fraction of functionally important sites undergo coevolution and also point towards compensatory substitutions as a probable coevolutionary mechanism within spatially proximal coevolved functional sites.
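The 'small-world' comparison above rests on two standard graph statistics: the mean local clustering coefficient and the mean shortest path length, each compared against a random graph with the same numbers of nodes and edges. A minimal pure-Python sketch follows; the adjacency-set representation and the G(n, m) random control are illustrative choices, not necessarily those used in the study.

```python
import itertools
import random
from collections import deque


def from_edges(edges):
    """Build an undirected adjacency-set graph from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj):
    """Mean local clustering coefficient (nodes with degree < 2 contribute 0)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in itertools.combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_shortest_path(adj):
    """Mean breadth-first-search distance over all reachable ordered node pairs."""
    dist_sum, pairs = 0, 0
    for src in adj:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist_sum += sum(seen.values())
        pairs += len(seen) - 1
    return dist_sum / pairs if pairs else 0.0


def random_graph(nodes, n_edges, seed=0):
    """Erdos-Renyi G(n, m) control with the same node and edge counts."""
    rng = random.Random(seed)
    adj = {n: set() for n in nodes}
    for u, v in rng.sample(list(itertools.combinations(nodes, 2)), n_edges):
        adj[u].add(v)
        adj[v].add(u)
    return adj
```

A network is usually described as small-world when its clustering coefficient is clearly higher than that of the random control while its mean shortest path length stays similar or lower.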
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
22
Georgi B, Schultz J, Schliep A. Partially-supervised protein subclass discovery with simultaneous annotation of functional residues. BMC Struct Biol 2009; 9:68. [PMID: 19857261 PMCID: PMC2777906 DOI: 10.1186/1472-6807-9-68] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/03/2009] [Accepted: 10/26/2009] [Indexed: 03/20/2023]
Abstract
BACKGROUND The study of functional subfamilies of protein domain families and the identification of the residues which determine substrate specificity is an important question in the analysis of protein domains. One way to address this question is the use of clustering methods for protein sequence data and approaches to predict functional residues based on such clusterings. The locations of putative functional residues in known protein structures provide insights into how different substrate specificities are reflected on the protein structure level. RESULTS We have developed an extension of the context-specific independence mixture model clustering framework which allows for the integration of experimental data. As these are usually known only for a few proteins, our algorithm implements a partially-supervised learning approach. We discover domain subfamilies and predict functional residues for four protein domain families: phosphatases, pyridoxal dependent decarboxylases, WW and SH3 domains to demonstrate the usefulness of our approach. CONCLUSION The partially-supervised clustering revealed biologically meaningful subfamilies even for highly heterogeneous domains and the predicted functional residues provide insights into the basis of the different substrate specificities.
Affiliation(s)
- Benjamin Georgi
- Max Planck Institute for Molecular Genetics, Dept. of Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin, Germany.
23
Tang K, Pugalenthi G, Suganthan PN, Lanczycki CJ, Chakrabarti S. Prediction of functionally important sites from protein sequences using sparse kernel least squares classifiers. Biochem Biophys Res Commun 2009; 384:155-9. [PMID: 19394310 DOI: 10.1016/j.bbrc.2009.04.096] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/13/2009] [Accepted: 04/20/2009] [Indexed: 11/25/2022]
Abstract
Identification of functionally important sites (FIS) in proteins is a critical problem and can have profound importance where protein structural information is limited. Machine learning techniques have been very useful in successful classification of many important biological problems. In this paper, we adopt the sparse kernel least squares classifiers (SKLSC) approach for classification and/or prediction of FIS using protein sequence derived features. The SKLSC algorithm was applied to 5435 FIS that have been extracted from 312 reliable alignments for a wide range of protein families. We obtained 68.28% sensitivity and 68.66% specificity for training dataset and 65.34% sensitivity and 66.88% specificity for testing dataset. Further, large scale benchmarking study using alignments of 101 protein families containing 1899 FIS showed that our method achieved an average approximately 70% sensitivity in predicting different types of FIS, such as active sites, metal, ligand or protein binding sites. Our findings also indicate that active sites and metal binding sites are comparably easier to predict compared to the ligand and protein binding sites. Despite moderate success, our results suggest the usefulness and potential of SKLSC approach in prediction of FIS using only protein sequence derived information.
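For readers less familiar with the reported metrics, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A small Python helper makes the arithmetic explicit; the toy labels below are made up and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = functional site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Toy example: 3 true sites (2 recovered) and 4 non-sites (3 correctly rejected).
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 1, 0])
print(f"sensitivity={sens:.2%} specificity={spec:.2%}")  # sensitivity=66.67% specificity=75.00%
```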
Affiliation(s)
- Ke Tang
- NICAL, Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China
24
Marsh L. Spatial autocorrelation of amino acid replacement rates in the vasopressin receptor family. J Mol Evol 2008; 68:28-39. [PMID: 19052795 DOI: 10.1007/s00239-008-9183-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2008] [Revised: 10/29/2008] [Accepted: 11/11/2008] [Indexed: 11/24/2022]
Abstract
Evolutionary rates of sites can be independent of one another or correlated in some fashion. Significant spatial autocorrelation was observed for site amino acid replacement rates in vasopressin receptor family proteins (VPRs). Spatial autocorrelation of rates is the propensity of residues to lie near other residues of similar rate in the folded protein structure. Optimal correlation occurred at a distance suggesting that residues in contact had correlated rates. As another way to study the same phenomenon, VPR was partitioned into >40 × 10 Å³ contiguous spatial clusters for amino acid replacement rate estimation. Partitioning was done without preconception of functional regions of the protein and with a random partition control. Cluster rates exhibited an overdispersed distribution, suggesting that rates were not randomly distributed in the spatial partitions. In tests, cluster partitioning improved maximum likelihood and Bayesian likelihood models for VPR evolution. Spatial clusters with outlier rates, or lineage-specific clusters differing in rate, proved to contain VPR features likely to be under selection. Thus the spatial autocorrelation observed is probably not just a statistical finding, but likely has an evolutionary basis in protein function.
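Spatial autocorrelation of per-site rates can be quantified with Moran's I, using neighbour weights that link residues whose coordinates lie within a distance cutoff. The abstract does not state which statistic or cutoff was used, so the binary contact weights, the 6 Å default, and the use of one coordinate per residue (for example, the Cα atom) are assumptions in this sketch.

```python
import math


def morans_i(coords, rates, cutoff=6.0):
    """Moran's I for per-residue rates with binary weights w_ij = 1 when
    residues i != j lie within `cutoff` of each other (assumed weighting).

    I = (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
    """
    n = len(rates)
    mean = sum(rates) / n
    dev = [r - mean for r in rates]
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                num += dev[i] * dev[j]
                w_sum += 1.0
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        return float("nan")  # undefined without neighbours or rate variation
    return (n / w_sum) * num / denom
```

Positive values indicate that neighbouring residues share similar rates (the pattern reported for VPRs), values near zero indicate no spatial structure, and negative values indicate that neighbours tend to differ.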
Affiliation(s)
- Lorraine Marsh
- Department of Biology, Long Island University, Brooklyn, NY, 11201, USA.
25
Bauer RA, Günther S, Jansen D, Heeger C, Thaben PF, Preissner R. SuperSite: dictionary of metabolite and drug binding sites in proteins. Nucleic Acids Res 2008; 37:D195-200. [PMID: 18842629 PMCID: PMC2686477 DOI: 10.1093/nar/gkn618] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Abstract
The increasing structural information about target-bound compounds provides a rich basis for studying the binding mechanisms of metabolites and drugs. SuperSite is a database that combines this structural information with various tools for the analysis of molecular recognition. The main data set comprises 8000 metabolites, including 1300 drugs, bound to about 290 000 different receptor binding sites. The analysis tools include features such as the highlighting of evolutionarily conserved receptor residues, the marking of putative binding pockets and the superposition of different binding sites of the same ligand. User-defined compounds can be edited or uploaded and will be superimposed on the most similar co-crystallized ligand. The user can examine all results online with the molecule viewer Jmol. An implemented search algorithm allows uploaded proteins to be screened for potential drug binding sites that are similar to known binding pockets. The large data set of target-bound compounds, combined with the provided analysis tools, allows users to inspect the characteristics of molecular recognition, especially for drug-target interactions. SuperSite is publicly available at: http://bioinformatics.charite.de/supersite.
Affiliation(s)
- Raphael André Bauer
- Institute of Molecular Biology and Bioinformatics, Structural Bioinformatics Group, Charité- Medical University Berlin, Arnimallee 22, 14195 Berlin, Germany
26
Lee BC, Park K, Kim D. Analysis of the residue-residue coevolution network and the functionally important residues in proteins. Proteins 2008; 72:863-72. [PMID: 18275083 DOI: 10.1002/prot.21972] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 01/05/2023]
Abstract
It is a common belief that some residues of a protein are more important than others. In some cases, point mutations of certain residues have a butterfly effect on protein structure and function, but in other cases they do not. In addition, the residues important for protein function tend to be not only conserved but also coevolved with other interacting residues in the protein. Motivated by these observations, the authors propose that there is a network composed of the residues, the residue-residue coevolution network (RRCN), where nodes are residues and links are set when the coevolutionary interaction strengths between residues are sufficiently large. The authors build the RRCN for 44 diverse protein families. The interaction strengths are calculated using the McBASC algorithm. After constructing the RRCN, the authors identify residues that have a high degree of connectivity (hub nodes) and residues that play a central role in the flow of information through the network (C(I) nodes). The authors show that these residues are likely to be functionally important. Moreover, the C(I) nodes appear to be more relevant to function than the hub nodes. Unlike other similar methods, the method described in this study is based solely on sequences. Therefore, it can be applied to the functional annotation of a wider range of proteins.
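Constructing an RRCN and picking hub nodes reduces to thresholding pairwise coupling strengths and ranking residues by degree. In the sketch below the coupling scores are simply given as input (the study computes them with McBASC, which is not reproduced here); the dictionary format, the threshold value and the `top` parameter are hypothetical, and the information-flow (C(I)) ranking is not covered.

```python
def build_rrcn(scores, threshold):
    """Build an undirected residue network from a dict mapping (i, j) residue
    pairs to coupling strengths, linking pairs with strength >= threshold."""
    adj = {}
    for (i, j), strength in scores.items():
        adj.setdefault(i, set())
        adj.setdefault(j, set())
        if strength >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hub_nodes(adj, top=3):
    """Residues with the highest degree (ties broken by residue number)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:top]


# Hypothetical coupling scores between residue positions.
scores = {(10, 20): 0.9, (10, 30): 0.8, (10, 40): 0.7, (20, 30): 0.2, (30, 40): 0.95}
network = build_rrcn(scores, threshold=0.5)
print(hub_nodes(network, top=2))  # [10, 30]
```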
Affiliation(s)
- Byung-Chul Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon 305-701, Korea
27
Liu ZP, Wu LY, Wang Y, Zhang XS, Chen L. Bridging protein local structures and protein functions. Amino Acids 2008; 35:627-50. [PMID: 18421562 PMCID: PMC7088341 DOI: 10.1007/s00726-008-0088-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/21/2008] [Accepted: 03/10/2008] [Indexed: 12/11/2022]
Abstract
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. In particular, we emphasize the newly developed structure-based methods, which are able to identify local structural motifs and reveal their relationship with protein functions. These methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.
Affiliation(s)
- Zhi-Ping Liu
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences, 100080, Beijing, China
28
Lanczycki CJ, Chakrabarti S. A tool for the prediction of functionally important sites in proteins using a library of functional templates. Bioinformation 2008; 2:279-83. [PMID: 18478080 PMCID: PMC2374371 DOI: 10.6026/97320630002279] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/05/2008] [Accepted: 02/11/2008] [Indexed: 11/23/2022]
Abstract
Understanding and characterizing the biochemical and evolutionary information within the wealth of protein sequence and structural data, particularly at functionally important sites, is very important. A comprehensive analysis of physico-chemical properties and evolutionary conservation patterns at the molecular and biological function level is expected to yield important clues for identifying similar sites in as-yet uncharacterized proteins. We present a library of protein functional templates (PFTs) designed to represent the compositional and evolutionary conservation patterns of functional sites at the molecular and biological function level. Subsequently we developed LIMACS (LInear MAtching of Conservation Scores), a software tool that uses the template library for the prediction of functionally important sites in a multiple sequence alignment, transferring the molecular function annotation from the most-similar functional site in the template library to a predicted site. AVAILABILITY The PFT library, the LIMACS program and source code are available for PC, Mac and Linux operating systems from ftp://ftp.ncbi.nih.gov/pub/lanczyck/limacs.
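As a rough illustration of matching conservation patterns against a template library and transferring the annotation of the most similar template, the sketch below slides each template's conservation-score profile along a query profile without gaps and keeps the window with the highest Pearson correlation. This scoring scheme, the `templates` dictionary format and the example scores are hypothetical; the abstract does not describe the actual LIMACS matching procedure or template format.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score lists (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_template(query, templates):
    """Ungapped scan of every template profile along the query profile.

    templates: dict name -> (annotation, conservation-score profile).
    Returns (correlation, name, annotation, start offset) of the best window.
    """
    best = None
    for name, (annotation, profile) in templates.items():
        width = len(profile)
        for start in range(len(query) - width + 1):
            r = pearson(query[start:start + width], profile)
            if best is None or r > best[0]:
                best = (r, name, annotation, start)
    return best
```

A real tool would also need a significance threshold before transferring any annotation; the sketch always returns the top-scoring window.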
Affiliation(s)
- Christopher J Lanczycki
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
29
Torkamani A, Schork NJ. Distribution analysis of nonsynonymous polymorphisms within the human kinase gene family. Genomics 2007; 90:49-58. [PMID: 17498919 DOI: 10.1016/j.ygeno.2007.03.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 11/28/2006] [Revised: 02/15/2007] [Accepted: 03/10/2007] [Indexed: 11/22/2022]
Abstract
The human kinase gene family is composed of 518 genes that are involved in a diverse spectrum of physiological functions. They are also implicated in a number of diseases and encompass 10% of current drug targets. Contemporary high-throughput sequencing efforts have identified a rich source of naturally occurring single nucleotide polymorphisms (SNPs) in kinases, a subset of which occur in the coding region of genes (cSNPs) and result in a change in the encoded amino acid sequence (nonsynonymous coding SNPs; nscSNPs). What fraction of this naturally occurring variation underlies human disease is largely unknown (uDC), and much of it is assumed not to be disease causing (DC). We pursued a comprehensive computational analysis of the distribution of 1463 nscSNPs and 999 DC nscSNPs within the kinase gene family and found that DCs are overrepresented in the kinase catalytic domain and in receptor structures. In addition, the frequencies with which specific amino acid changes occur differ between the DCs and the uDCs, implying different biological characteristics for the two sets of human polymorphisms. Our results provide insights into the sequence and structural phenomena associated with naturally occurring kinase nscSNPs that contribute to human diseases.
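Overrepresentation of disease-causing variants in a domain can be expressed as a fold enrichment over the count expected if variants fell uniformly along the sequence, with a one-sided binomial tail probability as a simple significance check. Both the uniform null model and the choice of test are assumptions for this sketch; the abstract does not say which test the authors used.

```python
import math


def domain_enrichment(n_in_domain, n_total, domain_fraction):
    """Fold enrichment and one-sided binomial P(X >= n_in_domain) for variants
    falling in a region that covers `domain_fraction` of all residues,
    assuming variants are placed uniformly at random under the null."""
    expected = n_total * domain_fraction
    fold = n_in_domain / expected
    p_value = sum(
        math.comb(n_total, k) * domain_fraction**k * (1 - domain_fraction) ** (n_total - k)
        for k in range(n_in_domain, n_total + 1)
    )
    return fold, p_value
```

For example, `domain_enrichment(10, 10, 0.5)` returns a fold enrichment of 2.0 and a tail probability of 1/1024: all ten variants land in a region covering half of the residues.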
Affiliation(s)
- Ali Torkamani
- Graduate Program in Biomedical Sciences, Department of Medicine, University of California at San Diego, La Jolla, CA 92093, USA