1
Lau AM, Bordin N, Kandathil SM, Sillitoe I, Waman VP, Wells J, Orengo CA, Jones DT. Exploring structural diversity across the protein universe with The Encyclopedia of Domains. Science 2024; 386:eadq4946. [PMID: 39480926 DOI: 10.1126/science.adq4946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Accepted: 08/30/2024] [Indexed: 11/02/2024]
Abstract
The AlphaFold Protein Structure Database (AFDB) contains more than 214 million predicted protein structures composed of domains, which are independently folding units found in multiple structural and functional contexts. Identifying domains can enable many functional and evolutionary analyses but has remained challenging because of the sheer scale of the data. Using deep learning methods, we have detected and classified every domain in the AFDB, producing The Encyclopedia of Domains. We detected nearly 365 million domains, over 100 million more than can be found by sequence methods, covering more than 1 million taxa. Reassuringly, 77% of the nonredundant domains are similar to known superfamilies, greatly expanding representation of their domain space. We uncovered more than 10,000 new structural interactions between superfamilies and thousands of new folds across the fold space continuum.
Affiliation(s)
- Andy M Lau
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Shaun M Kandathil
  - Department of Computer Science, University College London, London WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Jude Wells
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
  - Centre for Artificial Intelligence, University College London, London WC1V 6BH, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- David T Jones
  - Department of Computer Science, University College London, London WC1E 6BT, UK
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
2
Gupta SK, Ponte-Sucre A, Bencurova E, Dandekar T. An Ebola, Neisseria and Trypanosoma human protein interaction census reveals a conserved human protein cluster targeted by various human pathogens. Comput Struct Biotechnol J 2021; 19:5292-5308. [PMID: 34745452 PMCID: PMC8531761 DOI: 10.1016/j.csbj.2021.09.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/19/2021] [Revised: 09/14/2021] [Accepted: 09/15/2021] [Indexed: 12/28/2022]
Abstract
Filovirus ebolavirus (ZE; Zaire ebolavirus, Bundibugyo ebolavirus), Neisseria meningitidis (NM), and Trypanosoma brucei (Tb) are serious infectious pathogens, spanning viruses, bacteria and protists, and all may target the blood and central nervous system during their life cycle. NM and Tb are extracellular pathogens, while ZE is obligately intracellular, targeting immune-privileged sites. Using interactomics and comparative evolutionary analysis, we studied whether conserved human proteins are targeted by these pathogens. We examined 2797 unique pathogen-targeted human proteins. The information derived from orthology searches of experimentally validated protein-protein interactions (PPIs) yielded both unique and shared PPIs for each pathogen. Comparing and analyzing conserved and pathogen-specific infection pathways for NM, Tb and ZE, we identified human proteins predicted to be targeted in at least two of the compared host-pathogen networks. However, four proteins were common to all three host-pathogen interactomes: the elongation factor 1-alpha 1 (EEF1A1), the SWI/SNF complex subunit SMARCC2 (matrix-associated actin-dependent regulator of chromatin subfamily C), the dolichyl-diphosphooligosaccharide--protein glycosyltransferase subunit 1 (RPN1), and the tubulin beta-5 chain (TUBB). All four of these human proteins are also involved in the cytoskeleton and its regulation and are often targeted by various human pathogens. Specifically, we found that (i) 56 human pathogenic bacteria and viruses target these four proteins, (ii) the well-researched new pandemic pathogen SARS-CoV-2 targets two of these four human proteins and (iii) nine human pathogenic fungi (yet another evolutionarily distant organism group) target three of the conserved proteins through 130 high-confidence interactions.
Affiliation(s)
- Shishir K Gupta
  - Functional Genomics & Systems Biology Group, Department of Bioinformatics, Biocenter, Am Hubland, University of Würzburg, 97074 Würzburg, Germany
  - Evolutionary Genomics Group, Center for Computational and Theoretical Biology, University of Würzburg, 97078 Würzburg, Germany
- Alicia Ponte-Sucre
  - Laboratorio de Fisiología Molecular, Instituto de Medicina Experimental, Escuela Luis Razetti, Universidad Central de Venezuela, Caracas, Venezuela
  - Medical Mission Institute, Hermann-Schell-Str. 7, 97074 Würzburg, Germany
- Elena Bencurova
  - Functional Genomics & Systems Biology Group, Department of Bioinformatics, Biocenter, Am Hubland, University of Würzburg, 97074 Würzburg, Germany
- Thomas Dandekar
  - Functional Genomics & Systems Biology Group, Department of Bioinformatics, Biocenter, Am Hubland, University of Würzburg, 97074 Würzburg, Germany
  - EMBL Heidelberg, BioComputing Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
3
Chen YF, Xia Y. Structural Profiling of Bacterial Effectors Reveals Enrichment of Host-Interacting Domains and Motifs. Front Mol Biosci 2021; 8:626600. [PMID: 34012977 PMCID: PMC8126662 DOI: 10.3389/fmolb.2021.626600] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/06/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022]
Abstract
Effector proteins are bacterial virulence factors secreted directly into host cells and, through extensive interactions with host proteins, rewire host signaling pathways to the advantage of the pathogen. Despite the crucial role of globular domains as mediators of protein-protein interactions (PPIs), previous structural studies of bacterial effectors are primarily focused on individual domains, rather than domain-mediated PPIs, which limits their ability to uncover systems-level molecular recognition principles governing host-bacteria interactions. Here, we took an interaction-centric approach and systematically examined the potential of structural components within bacterial proteins to engage in or target eukaryote-specific domain-domain interactions (DDIs). Our results indicate that: 1) effectors are about six times as likely as non-effectors to contain host-like domains that mediate DDIs exclusively in eukaryotes; 2) the average domain in effectors is about seven times as likely as that in non-effectors to co-occur with DDI partners in eukaryotes rather than in bacteria; and 3) effectors are about nine times as likely as non-effectors to contain bacteria-exclusive domains that target host domains mediating DDIs exclusively in eukaryotes. Moreover, in the absence of host-like domains or among pathogen proteins without domain assignment, effectors harbor a higher variety and density of short linear motifs targeting host domains that mediate DDIs exclusively in eukaryotes. Our study lends novel quantitative insight into the structural basis of effector-induced perturbation of host-endogenous PPIs and may aid in the design of selective inhibitors of host-pathogen interactions.
Affiliation(s)
- Yu Xia
  - Department of Bioengineering, McGill University, Montreal, QC, Canada
4
Verma R, Pandit SB. Unraveling the structural landscape of intra-chain domain interfaces: Implication in the evolution of domain-domain interactions. PLoS One 2019; 14:e0220336. [PMID: 31374091 PMCID: PMC6677297 DOI: 10.1371/journal.pone.0220336] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/12/2019] [Accepted: 07/12/2019] [Indexed: 12/22/2022]
Abstract
Intra-chain domain interactions are known to play a significant role in the function and stability of multidomain proteins. These interactions are mediated through physical contact at domain-domain interfaces (DDIs). To understand the evolution of interfaces, we investigated similarities among DDIs. Although interfaces of protein-protein interactions (PPIs) have previously been studied by structurally aligning interfaces, similar analyses have not yet been performed on DDIs of either multidomain proteins or PPIs. To study the structural landscape of DDIs, we used iAlign to structurally align intra-chain domain interfaces. The interface alignment of spatially constrained domains (due to inter-domain linkers) showed that ~88% of these could identify a structurally matching interface with similar C-alpha geometry and contact pattern, even though the aligned domain pairs are not structurally related. Moreover, the mean interface similarity score (IS-score) is 0.307, which is higher than the average random IS-score (0.207), suggesting that domain interfaces are not random. The structural space of DDIs is highly connected: ~84% of all possible directed edges among interfaces have a path length of at most 8 at an IS-score threshold of 0.26. At this threshold, ~83% of interfaces form the largest strongly connected component. This suggests that the structural space of intra-chain domain interfaces is degenerate and highly connected, as has been found for PPI interfaces. Interestingly, searching for structural neighbors of inter-chain interfaces among intra-chain interfaces showed that ~86% could find a statistically significant match to an intra-chain interface, with a mean IS-score of 0.311. This implies that domain interfaces are degenerate whether formed within a protein or between proteins. The interface degeneracy is most likely due to the limited number of possible ways of packing secondary structures. In principle, interface similarities can be exploited to accurately model domain interfaces in structure prediction of multidomain proteins.
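The connectivity figures quoted in this abstract (the share of directed interface pairs joined by a short path, and the largest strongly connected component at a given IS-score threshold) can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy scores, the interface names, and the reading that an edge is kept when its IS-score is at or above the threshold are all assumptions made here for illustration.

```python
from collections import deque

# Hypothetical directed IS-scores between four toy interfaces (not real data).
toy_scores = {
    ("a", "b"): 0.30,
    ("b", "c"): 0.28,
    ("c", "a"): 0.27,
    ("c", "d"): 0.31,
    ("d", "a"): 0.20,
}

def build_graph(scores, threshold):
    """Keep a directed edge u -> v when its score meets the threshold (assumed rule)."""
    graph = {}
    for (u, v), score in scores.items():
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        if score >= threshold:
            graph[u].add(v)
    return graph

def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def fraction_within(graph, max_len):
    """Fraction of ordered pairs (u != v) joined by a path of at most max_len edges."""
    n = len(graph)
    total = n * (n - 1)
    hits = sum(
        1
        for u in graph
        for v, d in bfs_distances(graph, u).items()
        if v != u and d <= max_len
    )
    return hits / total if total else 0.0

def largest_scc_size(graph):
    """Size of the largest strongly connected component, via mutual reachability."""
    reach = {u: set(bfs_distances(graph, u)) for u in graph}
    return max(
        (len({v for v in reach[u] if u in reach[v]}) for u in graph),
        default=0,
    )

toy_graph = build_graph(toy_scores, threshold=0.26)
print(fraction_within(toy_graph, max_len=8), largest_scc_size(toy_graph))
```

The mutual-reachability approach is quadratic and only suitable for small examples; a dataset of real size would call for a linear-time algorithm such as Tarjan's.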
Affiliation(s)
- Rivi Verma
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
- Shashi Bhushan Pandit
  - Department of Biological Sciences, Indian Institute of Science Education and Research, Mohali, India
5
Esmaielbeiki R, Krawczyk K, Knapp B, Nebel JC, Deane CM. Progress and challenges in predicting protein interfaces. Brief Bioinform 2016; 17:117-31. [PMID: 25971595 PMCID: PMC4719070 DOI: 10.1093/bib/bbv027] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 01/29/2015] [Revised: 03/18/2015] [Indexed: 12/31/2022]
Abstract
The majority of biological processes are mediated via protein-protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.
6
Nchongboh CG, Wu GW, Hong N, Wang GP. Protein–protein interactions between proteins of Citrus tristeza virus isolates. Virus Genes 2014; 49:456-65. [DOI: 10.1007/s11262-014-1100-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/02/2014] [Accepted: 06/20/2014] [Indexed: 12/01/2022]
7
Coelho ED, Arrais JP, Matos S, Pereira C, Rosa N, Correia MJ, Barros M, Oliveira JL. Computational prediction of the human-microbial oral interactome. BMC Syst Biol 2014; 8:24. [PMID: 24576332 PMCID: PMC3975954 DOI: 10.1186/1752-0509-8-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/27/2013] [Accepted: 02/17/2014] [Indexed: 11/12/2022]
Abstract
BACKGROUND: The oral cavity is a complex ecosystem where human chemical compounds coexist with a particular microbiota. However, shifts in the normal composition of this microbiota may result in the onset of oral ailments, such as periodontitis and dental caries. In addition, it is known that the microbial colonization of the oral cavity is mediated by protein-protein interactions (PPIs) between the host and microorganisms. Nevertheless, these PPIs are still largely undisclosed. To elucidate these interactions, we have created a computational prediction method that allows us to obtain a first model of the Human-Microbial oral interactome. RESULTS: We collected high-quality experimental PPIs from five major human databases. The obtained PPIs were used to create our positive dataset and, indirectly, our negative dataset. The positive and negative datasets were merged and used for training and validation of a naïve Bayes classifier. For the final prediction model, we used an ensemble methodology combining five distinct PPI prediction techniques, namely: literature mining, primary protein sequences, orthologous profiles, biological process similarity, and domain interactions. Performance evaluation of our method revealed an area under the ROC curve (AUC) value greater than 0.926, supporting our primary hypothesis, as no single set of features reached an AUC greater than 0.877. After subjecting our dataset to the prediction model, the classified result was filtered for very high confidence PPIs (probability ≥ 1 − 10⁻⁷), leading to a set of 46,579 PPIs to be further explored. CONCLUSIONS: We believe this dataset holds not only important pathways involved in the onset of infectious oral diseases, but also potential drug targets and biomarkers. The dataset used for training and validation, the predictions obtained and the final network are available at http://bioinformatics.ua.pt/software/oralint.
Affiliation(s)
- Edgar D Coelho
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Joel P Arrais
  - Department of Informatics Engineering (DEI), University of Coimbra, Coimbra, Portugal
  - Centre for Informatics and Systems of the University at Coimbra (CISUC), University of Coimbra, Coimbra, Portugal
- Sérgio Matos
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
- Carlos Pereira
  - Centre for Informatics and Systems of the University at Coimbra (CISUC), University of Coimbra, Coimbra, Portugal
  - Department of Informatics Engineering and Systems, Polytechnic Institute of Coimbra, Engineering Institute of Coimbra (IPC-ISEC), Coimbra, Portugal
- Nuno Rosa
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Maria José Correia
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
- Marlene Barros
  - Department of Health Sciences, Institute of Health Sciences, The Catholic University of Portugal, Viseu, Portugal
  - Centre for Neurosciences and Cell Biology, University of Coimbra, Coimbra, Portugal
- José Luís Oliveira
  - Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Telematics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal
8
Wang H, Huang H, Ding C, Nie F. Predicting Protein–Protein Interactions from Multimodal Biological Data Sources via Nonnegative Matrix Tri-Factorization. J Comput Biol 2013; 20:344-58. [DOI: 10.1089/cmb.2012.0273] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 12/23/2022]
Affiliation(s)
- Hua Wang
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
- Heng Huang
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
- Chris Ding
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
- Feiping Nie
  - Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas
9
Bellacchio E. In silico analysis of the two tandem somatomedin B domains of ENPP1 reveals hints on the homodimerization of the protein. J Cell Physiol 2012; 227:3566-74. [DOI: 10.1002/jcp.24058] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
10
Abstract
With the advent of Systems Biology, the prediction of whether two proteins form a complex has become a problem of increased importance. A variety of experimental techniques have been applied to the problem, but three-dimensional structural information has not been widely exploited. Here we explore the range of applicability of such information by analyzing the extent to which the location of binding sites on protein surfaces is conserved among structural neighbors. We find, as expected, that interface conservation is most significant among proteins that have a clear evolutionary relationship, but that there is a significant level of conservation even among remote structural neighbors. This finding is consistent with recent evidence that information available from structural neighbors, independent of classification, should be exploited in the search for functional insights. The value of such structural information is highlighted through the development of a new protein interface prediction method, PredUs, that identifies what residues on protein surfaces are likely to participate in complexes with other proteins. The performance of PredUs, as measured through comparisons with other methods, suggests that relationships across protein structure space can be successfully exploited in the prediction of protein-protein interactions.
11
Liu M, Chen XW, Jothi R. Knowledge-guided inference of domain-domain interactions from incomplete protein-protein interaction networks. Bioinformatics 2009; 25:2492-9. [PMID: 19667081 PMCID: PMC2752622 DOI: 10.1093/bioinformatics/btp480] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
Motivation: Protein-protein interactions (PPIs), though extremely valuable towards a better understanding of protein functions and cellular processes, do not provide any direct information about the regions/domains within the proteins that mediate the interaction. Most often, it is only a fraction of a protein that directly interacts with its biological partners. Thus, understanding interaction at the domain level is a critical step towards (i) thorough understanding of PPI networks; (ii) precise identification of binding sites; (iii) acquisition of insights into the causes of deleterious mutations at interaction sites; and (iv) most importantly, development of drugs to inhibit pathological protein interactions. In addition, knowledge derived from known domain–domain interactions (DDIs) can be used to understand binding interfaces, which in turn can help discover unknown PPIs. Results: Here, we describe a novel method called K-GIDDI (knowledge-guided inference of DDIs) to narrow down the PPI sites to smaller regions/domains. K-GIDDI constructs an initial DDI network from cross-species PPI networks, and then expands the DDI network by inferring additional DDIs using a divide-and-conquer biclustering algorithm guided by Gene Ontology (GO) information, which identifies partial-complete bipartite sub-networks in the DDI network and makes them complete bipartite sub-networks by adding edges. Our results indicate that K-GIDDI can reliably predict DDIs. Most importantly, K-GIDDI's novel network expansion procedure allows prediction of DDIs that are otherwise not identifiable by methods that rely only on PPI data. Contact: xwchen@ku.edu Availability: http://www.ittc.ku.edu/~xwchen/domainNetwork/ddinet.html Supplementary information: Supplementary data are available at Bioinformatics online.
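The final expansion step described in this abstract, turning a partially complete bipartite sub-network into a complete one by adding edges, can be sketched as follows. This is an illustrative sketch only, not K-GIDDI itself: the GO-guided biclustering that selects the two domain sets is omitted, the `min_density` cut-off is a hypothetical parameter introduced here, and the domain names are placeholders.

```python
from itertools import product

def complete_bipartite(edges, left, right, min_density=0.5):
    """Return the domain pairs to add so that left x right becomes complete.

    `edges` is a set of (left_domain, right_domain) pairs already in the network.
    Nothing is added when fewer than `min_density` of the possible pairs are
    present (an illustrative rule, not taken from the paper).
    """
    possible = set(product(left, right))
    present = possible & edges
    if not possible or len(present) / len(possible) < min_density:
        return set()
    return possible - present

# Placeholder domain names; three of the four possible pairs are already known.
known = {("D1", "D3"), ("D1", "D4"), ("D2", "D3")}
inferred = complete_bipartite(known, left={"D1", "D2"}, right={"D3", "D4"})
print(inferred)
```

Returning only the newly inferred pairs, rather than mutating the network, keeps predicted DDIs separate from observed ones so they can be scored or filtered afterwards.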
Affiliation(s)
- Mei Liu
  - Bioinformatics and Computational Life-Sciences Laboratory, ITTC, Department of Electrical Engineering and Computer Science, University of Kansas, 1520 West 15th Street, Lawrence, KS 66045, USA
12
Built-in loops allow versatility in domain-domain interactions: lessons from self-interacting domains. Proc Natl Acad Sci U S A 2008; 105:13292-7. [PMID: 18757736 DOI: 10.1073/pnas.0801207105] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Indexed: 12/31/2022]
Abstract
Compilations of domain-domain interactions based on solved structures suggest there are distinct domain pairs that are used repeatedly in different protein contexts to mediate protein-protein interactions. However, not all protein pairs with the corresponding domains that can potentially mediate interaction do interact, even when they are colocalized and coexpressed. It is conceivable that there are structural and sequence features, below the domain level, that play a role in determining the potential of domains to mediate protein-protein interactions. Here, we discover such features by comparing domains that, on the one hand, mediate homodimerization of proteins and, on the other, occur in different proteins that are documented as monomers. Intriguingly, this comparison uncovered surface loops that can be considered as determinants of the interactions. There are enabling loops, which mediate the domain interactions, and disabling loops that prevent the interactions. The presence of the enabling/disabling loops is consistent with the fulfillment/prevention of the interaction and is highly preserved in evolution. This suggests that, along with the preservation of structural elements that enable interaction, evolution maintains elements intended to prevent unwanted interactions. The enabling and disabling loops discovered in this study have implications in prediction of protein-protein interactions, by pointing to the protein regions that determine the interaction. Our results extend the hierarchy of attributes that collectively establish the modularity of domain-mediated protein-protein interactions.
13
Mahdavi MA, Lin YH. Prediction of protein-protein interactions using protein signature profiling. Genomics Proteomics Bioinformatics 2008; 5:177-86. [PMID: 18267299 PMCID: PMC5963007 DOI: 10.1016/s1672-0229(08)60005-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/02/2022]
Abstract
Protein domains are conserved and functionally independent structures that play an important role in interactions among related proteins. Domain-domain interactions have recently been used to predict protein-protein interactions (PPIs). In general, the interaction probability of a pair of domains is scored using a trained scoring function. When the score satisfies a threshold, the protein pairs carrying those domains are regarded as “interacting”. In this study, the signature contents of proteins were utilized to predict PPI pairs in Saccharomyces cerevisiae, Caenorhabditis elegans, and Homo sapiens. Similarity between protein signature patterns was scored and PPI predictions were drawn based on the binary similarity scoring function. Results show that the true positive rate of prediction by the proposed approach is approximately 32% higher than that using the maximum likelihood estimation method when compared with a test set, resulting in a 22% increase in the area under the receiver operating characteristic (ROC) curve. When proteins containing one or two signatures were removed, the sensitivity of the predicted PPI pairs increased significantly. The predicted PPI pairs are on average 11 times more likely to interact than a random selection at a confidence level of 0.95, and on average 4 times better than those predicted by either phylogenetic profiling or gene expression profiling.
Affiliation(s)
- Mahmood A Mahdavi
  - Department of Chemical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
14
Brylinski M, Skolnick J. What is the relationship between the global structures of apo and holo proteins? Proteins 2008; 70:363-77. [PMID: 17680687 DOI: 10.1002/prot.21510] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Indexed: 11/07/2022]
Abstract
It is well known that ligand binding and release may induce a wide range of structural changes in a receptor protein, varying from small movements of loops or side chains in the binding pocket to large-scale domain hinge-bending and shear motions or even partial unfolding that facilitates the capture and release of a ligand. An interesting question is what, in general, are the conformational changes triggered by ligand binding? The aim of this work is to analyze the magnitude of structural changes in a protein resulting from ligand binding to assess whether the state of ligand binding needs to be included in template-based protein structure prediction algorithms. To address this issue, a nonredundant dataset of 521 paired protein structures in the ligand-free and ligand-bound form was created and used to estimate the degree of both local and global structure similarity between the apo and holo forms. In most cases, the proteins undergo relatively small conformational rearrangements of their tertiary structure upon ligand binding/release (most root-mean-square-deviations from native, RMSD, are <1 Å). However, a clear difference was observed between single- and multiple-domain proteins. For the latter, RMSD changes greater than 1 Å and sometimes larger were found for almost 1/3 of the cases; these are mainly associated with large-scale hinge-bending movements of entire domains. The changes in the mutual orientation of individual domains in multiple-domain proteins upon ligand binding were investigated using a mechanistic model based on mass-weighted principal axes as well as interface buried surface calculations. Some preferences toward the anticipated mechanism of protein domain movements are predictable based on the examination of just the ligand-free structural form. These results have applications to protein structure prediction, particularly in the context of protein domain assembly, if additional information concerning ligand binding is exploited.
Affiliation(s)
- Michal Brylinski
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
15
Schuster-Böckler B, Bateman A. Reuse of structural domain-domain interactions in protein networks. BMC Bioinformatics 2007; 8:259. [PMID: 17640363 PMCID: PMC1940023 DOI: 10.1186/1471-2105-8-259] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 02/23/2007] [Accepted: 07/18/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Protein interactions are thought to be largely mediated by interactions between structural domains. Databases such as iPfam relate interactions in protein structures to known domain families. Here, we investigate how the domain interactions from the iPfam database are distributed in protein interactions taken from the HPRD, MPact, BioGRID, DIP and IntAct databases. RESULTS: We find that known structural domain interactions can only explain a subset of 4-19% of the available protein interactions; nevertheless, this fraction is still significantly bigger than expected by chance. There is a correlation between the frequency of a domain interaction and the connectivity of the proteins it occurs in. Furthermore, a large proportion of protein interactions can be attributed to a small number of domain interactions. We conclude that many, but not all, domain interactions constitute reusable modules of molecular recognition. A substantial proportion of domain interactions are conserved between E. coli, S. cerevisiae and H. sapiens. These domains are related to essential cellular functions, suggesting that many domain interactions were already present in the last universal common ancestor. CONCLUSION: Our results support the concept of domain interactions as reusable, conserved building blocks of protein interactions, but also highlight the limitations currently imposed by the small number of available protein structures.
Affiliation(s)
- Alex Bateman
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
16
Shoemaker BA, Panchenko AR. Deciphering protein-protein interactions. Part I. Experimental techniques and databases. PLoS Comput Biol 2007; 3:e42. [PMID: 17397251 PMCID: PMC1847991 DOI: 10.1371/journal.pcbi.0030042] [Citation(s) in RCA: 235] [Impact Index Per Article: 13.8] [Indexed: 12/20/2022]
17
Itzhaki Z, Akiva E, Altuvia Y, Margalit H. Evolutionary conservation of domain-domain interactions. Genome Biol 2007; 7:R125. [PMID: 17184549 PMCID: PMC1794438 DOI: 10.1186/gb-2006-7-12-r125] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Received: 08/16/2006] [Revised: 11/06/2006] [Accepted: 12/21/2006] [Indexed: 11/16/2022]
Abstract
Mapping of domain-domain interactions onto the cellular protein-protein interaction networks of different organisms demonstrates that there is a catalogue of domain pairs that is used for mediating various interactions in the cell. Background: Recently, there has been much interest in relating domain-domain interactions (DDIs) to protein-protein interactions (PPIs) and vice versa, in an attempt to understand the molecular basis of PPIs. Results: Here we map structurally derived DDIs onto the cellular PPI networks of different organisms and demonstrate that there is a catalog of domain pairs that is used to mediate various interactions in the cell. We show that these DDIs occur frequently in protein complexes and that homotypic interactions (of a domain with itself) are abundant. A comparison of the repertoires of DDIs in the networks of Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens shows that many DDIs are evolutionarily conserved. Conclusion: Our results indicate that different organisms use the same 'building blocks' for PPIs, suggesting that the functionality of many domain pairs in mediating protein interactions is maintained in evolution.
Affiliation(s)
- Zohar Itzhaki
- Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 91120, Israel
- Eyal Akiva
- Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 91120, Israel
- Yael Altuvia
- Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 91120, Israel
- Hanah Margalit
- Department of Molecular Genetics and Biotechnology, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem 91120, Israel
|
18
|
Han JH, Batey S, Nickson AA, Teichmann SA, Clarke J. The folding and evolution of multidomain proteins. Nat Rev Mol Cell Biol 2007; 8:319-30. [PMID: 17356578 DOI: 10.1038/nrm2144] [Citation(s) in RCA: 283] [Impact Index Per Article: 16.6] [Indexed: 11/09/2022]
Abstract
Analyses of genomes show that more than 70% of eukaryotic proteins are composed of multiple domains. However, most studies of protein folding focus on individual domains and do not consider how interactions between domains might affect folding. Here, we address this by analysing the three-dimensional structures of multidomain proteins that have been characterized experimentally and observe that where the interface is small and loosely packed, or unstructured, the folding of the domains is independent. Furthermore, recent studies indicate that multidomain proteins have evolved mechanisms to minimize the problems of interdomain misfolding.
Affiliation(s)
- Jung-Hoon Han
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
|
19
|
Jefferson ER, Walsh TP, Barton GJ. Biological units and their effect upon the properties and prediction of protein-protein interactions. J Mol Biol 2006; 364:1118-29. [PMID: 17049359 DOI: 10.1016/j.jmb.2006.09.042] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 06/30/2006] [Revised: 09/12/2006] [Accepted: 09/15/2006] [Indexed: 11/30/2022]
Abstract
Structural data as collated in the Protein Data Bank (PDB) have been widely applied in the study and prediction of protein-protein interactions. However, since the basic PDB Entries contain only the contents of the asymmetric unit rather than the biological unit, some key interactions may be missed by analysing only the PDB Entry. A total of 69,054 SCOP (Structural Classification of Proteins) domains were examined systematically to identify the number of additional novel interacting domain pairs and interfaces found by considering the biological unit as stored in the PQS (Protein Quaternary Structure) database. The PQS data adds 25,965 interacting domain pairs to those seen in the PDB Entries to give a total of 61,783 redundant interacting domain pairs. Redundancy filtering at the level of the SCOP family shows PQS to increase the number of novel interacting domain-family pairs by 302 (13.3%) from 2277, but only 16/302 (1.4%) of the interacting domain pairs have the two domains in different SCOP families. This suggests the biological units add little to the elucidation of novel biological interaction networks. However, when the orientation of the domain pairs is considered, the PQS data increases the number of novel domain-domain interfaces observed by 1455 (34.5%) to give 5677 non-redundant domain-domain interfaces. In all, 162/1455 novel domain-domain interfaces are between domains from different families, an increase of 8.9% over the PDB Entries. Overall, the PQS biological units provide a rich source of novel domain-domain interfaces that are not seen in the studied PDB Entries, and so PQS domain-domain interaction data should be exploited wherever possible in the analysis and prediction of protein-protein interactions.
Affiliation(s)
- Emily R Jefferson
- University of Dundee, School of Life Sciences, Dow Street, Dundee, DD1 5EH Scotland, UK
|
20
|
Sánchez IE, Tejero J, Gómez-Moreno C, Medina M, Serrano L. Point mutations in protein globular domains: contributions from function, stability and misfolding. J Mol Biol 2006; 363:422-32. [PMID: 16978645 DOI: 10.1016/j.jmb.2006.08.020] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Received: 06/21/2006] [Revised: 07/25/2006] [Accepted: 08/08/2006] [Indexed: 11/25/2022]
Abstract
Several contrasting hypotheses have been formulated about the influence of functional and conformational properties, like stability and avoidance of misfolding, on the evolution of protein globular domains. Selection at functional sites has been suggested to be detrimental to stability or coupled to it. Avoidance of misfolding may be achieved by discarding misfolding-prone sequences or by maintaining a stable native state and thus destabilizing partially or fully unfolded states from which misfolding can take place. We have performed a hierarchical analysis of a large database of point mutations to dissect the relative contributions of function, stability and misfolding in the evolution of natural sequences. We show that at catalytic sites, selection for function overrules selection for stability but find no evidence for an anticorrelation between function and stability. Selection for stability plays a secondary role at binding sites, but is not fully coupled to selection for function. Remarkably, we did not find a selective pressure against misfolding-prone sequences in globular proteins at the level of individual positions. We suggest that such a selection would compromise native-state stability due to a correlation between the stabilities of native and misfolded states. Stabilization of the native state is the most frequent way in which natural proteins avoid misfolding.
Affiliation(s)
- I E Sánchez
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
|
21
|
Jothi R, Cherukuri PF, Tasneem A, Przytycka TM. Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol 2006; 362:861-75. [PMID: 16949097 PMCID: PMC1618801 DOI: 10.1016/j.jmb.2006.07.072] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.5] [Received: 04/25/2006] [Revised: 06/19/2006] [Accepted: 07/14/2006] [Indexed: 11/28/2022]
Abstract
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable towards molecular level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein-protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that interacting domain pair(s) for a given interaction exhibits higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain-domain interactions. Given a protein-protein interaction, the proposed method predicts the domain pair(s) that is most likely to mediate the protein interaction. We applied this method on the yeast interactome to predict domain-domain interactions, and used known domain-domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all the three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain-domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites.
Affiliation(s)
- Raja Jothi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Corresponding author
- Praveen F. Cherukuri
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Bioinformatics Program, Boston University, Boston, MA 02215, USA
- Asba Tasneem
- Booz Allen Hamilton Inc., Rockville, MD 20852, USA
- Teresa M. Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Corresponding author
|
22
|
Kim WK, Henschel A, Winter C, Schroeder M. The many faces of protein-protein interactions: A compendium of interface geometry. PLoS Comput Biol 2006; 2:e124. [PMID: 17009862 PMCID: PMC1584320 DOI: 10.1371/journal.pcbi.0020124] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.2] [Received: 04/07/2006] [Accepted: 07/31/2006] [Indexed: 11/18/2022]
Abstract
A systematic classification of protein-protein interfaces is a valuable resource for understanding the principles of molecular recognition and for modelling protein complexes. Here, we present a classification of domain interfaces according to their geometry. Our new algorithm uses a hybrid approach of both sequential and structural features. The accuracy is evaluated on a hand-curated dataset of 416 interfaces. Our hybrid procedure achieves 83% precision and 95% recall, which improves the earlier sequence-based method by 5% on both terms. We classify virtually all domain interfaces of known structure, which results in nearly 6,000 distinct types of interfaces. In 40% of the cases, the interacting domain families associate in multiple orientations, suggesting that all the possible binding orientations need to be explored for modelling multidomain proteins and protein complexes. In general, hub proteins are shown to use distinct surface regions (multiple faces) for interactions with different partners. Our classification provides a convenient framework to query genuine gene fusion, which conserves binding orientation in both fused and separate forms. The result suggests that the binding orientations are not conserved in at least one-third of the gene fusion cases detected by a conventional sequence similarity search. We show that any evolutionary analysis on interfaces can be skewed by multiple binding orientations and multiple interaction partners. The taxonomic distribution of interface types suggests that ancient interfaces common to the three major kingdoms of life are enriched by symmetric homodimers. The classification results are online at http://www.scoppi.org.
Affiliation(s)
- Wan Kyu Kim
- Bioinformatics Group, Biotechnological Centre, Technische Universität Dresden, Dresden, Germany
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Cambridge, United Kingdom
- Andreas Henschel
- Bioinformatics Group, Biotechnological Centre, Technische Universität Dresden, Dresden, Germany
- Christof Winter
- Bioinformatics Group, Biotechnological Centre, Technische Universität Dresden, Dresden, Germany
- Michael Schroeder
- Bioinformatics Group, Biotechnological Centre, Technische Universität Dresden, Dresden, Germany
- Corresponding author
|
23
|
Han JH, Kerrison N, Chothia C, Teichmann SA. Divergence of interdomain geometry in two-domain proteins. Structure 2006; 14:935-45. [PMID: 16698554 DOI: 10.1016/j.str.2006.01.016] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.8] [Received: 09/16/2005] [Revised: 12/23/2005] [Accepted: 01/18/2006] [Indexed: 10/24/2022]
Abstract
For homologous protein chains composed of two domains, we have determined the extent to which they conserve (1) their interdomain geometry and (2) the molecular structure of the domain interface. This work was carried out on 128 unique two-domain architectures. Of the 128, we find 75 conserve their interdomain geometry and the structure of their domain interface; 5 conserve their interdomain geometry but not the structure of their interface; and 48 have variable geometries and divergent interface structure. We describe how different types of interface changes or the absence of an interface is responsible for these differences in geometry. Variable interdomain geometries can be found in homologous structures with high sequence identities (70%).
Affiliation(s)
- Jung-Hoon Han
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, CB2 2QH, United Kingdom.
|
24
|
Lise S, Walker-Taylor A, Jones DT. Docking protein domains in contact space. BMC Bioinformatics 2006; 7:310. [PMID: 16790041 PMCID: PMC1559650 DOI: 10.1186/1471-2105-7-310] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/20/2006] [Accepted: 06/21/2006] [Indexed: 11/10/2022]
Abstract
Background: Many biological processes involve the physical interaction between protein domains. Understanding these functional associations requires knowledge of the molecular structure. Experimental investigations though present considerable difficulties and there is therefore a need for accurate and reliable computational methods. In this paper we present a novel method that seeks to dock protein domains using a contact map representation. Rather than providing a full three dimensional model of the complex, the method predicts contacting residues across the interface. We use a scoring function that combines structural, physicochemical and evolutionary information, where each potential residue contact is assigned a value according to the scoring function and the hypothesis is that the real configuration of contacts is the one that maximizes the score. The search is performed with a simulated annealing algorithm directly in contact space. Results: We have tested the method on interacting domain pairs that are part of the same protein (intra-molecular domains). We show that it correctly predicts some contacts and that predicted residues tend to be significantly closer to each other than other pairs of residues in the same domains. Moreover we find that predicted contacts can often discriminate the best model (or the native structure, if present) among a set of optimal solutions generated by a standard docking procedure. Conclusion: Contact docking appears feasible and able to complement other computational methods for the prediction of protein-protein interactions. With respect to more standard docking algorithms it might be more suitable to handle protein conformational changes and to predict complexes starting from protein models.
Affiliation(s)
- Stefano Lise
- Department of Biochemistry and Molecular Biology, University College London, UK
- David T Jones
- Department of Biochemistry and Molecular Biology, University College London, UK
- Department of Computer Science, University College London, UK
|
25
|
Shoemaker BA, Panchenko AR, Bryant SH. Finding biologically relevant protein domain interactions: conserved binding mode analysis. Protein Sci 2005; 15:352-61. [PMID: 16385001 PMCID: PMC1855242 DOI: 10.1110/ps.051760806] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.9] [Indexed: 10/25/2022]
Abstract
Proteins evolved through the shuffling of functional domains, and therefore, the same domain can be found in different proteins and species. Interactions between such conserved domains often involve specific, well-determined binding surfaces reflecting their important biological role in a cell. To find biologically relevant interactions we developed a method of systematically comparing and classifying protein domain interactions from the structural data. As a result, a set of conserved binding modes (CBMs) was created using the atomic detail of structure alignment data and the protein domain classification of the Conserved Domain Database. A conserved binding mode is inferred when different members of interacting domain families dock in the same way, such that their structural complexes superimpose well. Such domain interactions with recurring structural themes have greater significance to be biologically relevant, unlike spurious crystal packing interactions. Consequently, this study gives lower and upper bounds on the number of different types of interacting domain pairs in the structure database on the order of 1000-2000. We use CBMs to create domain interaction networks, which highlight functionally significant connections by avoiding many infrequent links between highly connected nodes. The CBMs also constitute a library of docking templates that may be used in molecular modeling to infer the characteristics of an unknown binding surface, just as conserved domains may be used to infer the structure of an unknown protein. The method's ability to sort through and classify large numbers of putative interacting domain pairs is demonstrated on the oligomeric interactions of globins.
Affiliation(s)
- Benjamin A Shoemaker
- Computational Biology Branch, National Center for Biotechnology Information, Building 38A, National Institutes of Health, Bethesda, MD 20894, USA.
|
26
|
Pils B, Copley RR, Schultz J. Variation in structural location and amino acid conservation of functional sites in protein domain families. BMC Bioinformatics 2005; 6:210. [PMID: 16122386 PMCID: PMC1215474 DOI: 10.1186/1471-2105-6-210] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 05/24/2005] [Accepted: 08/25/2005] [Indexed: 11/26/2022]
Abstract
Background: The functional sites of a protein present important information for determining its cellular function and are fundamental in drug design. Accordingly, accurate methods for the prediction of functional sites are of immense value. Most available methods are based on a set of homologous sequences and structural or evolutionary information, and assume that functional sites are more conserved than the average. In the analysis presented here, we have investigated the conservation of location and type of amino acids at functional sites, and compared the behaviour of functional sites between different protein domains. Results: Functional sites were extracted from experimentally determined structural complexes from the Protein Data Bank harbouring a conserved protein domain from the SMART database. In general, functional (i.e. interacting) sites whose location is more highly conserved are also more conserved in their type of amino acid. However, even highly conserved functional sites can present a wide spectrum of amino acids. The degree of conservation strongly depends on the function of the protein domain and ranges from highly conserved in location and amino acid to very variable. Differentiation by binding partner shows that ion binding sites tend to be more conserved than functional sites binding peptides or nucleotides. Conclusion: The results gained by this analysis will help improve the accuracy of functional site prediction and facilitate the characterization of unknown protein sequences.
Affiliation(s)
- Birgit Pils
- Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany
- Richard R Copley
- Wellcome Trust Centre for Human Genetics, University of Oxford, Headington, OX3 7BN Oxford, UK
- Jörg Schultz
- Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany
|
27
|
Xie L, Bourne PE. Functional coverage of the human genome by existing structures, structural genomics targets, and homology models. PLoS Comput Biol 2005; 1:e31. [PMID: 16118666 PMCID: PMC1188274 DOI: 10.1371/journal.pcbi.0010031] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.7] [Received: 05/16/2005] [Accepted: 07/18/2005] [Indexed: 11/23/2022]
Abstract
The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the “most wanted list” that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html. 
The sequencing of the human genome provides biologists with new opportunities to understand the molecular basis of physiological processes and disease states. To take full advantage of these opportunities, the three-dimensional structures of the gene products are needed to provide the appropriate level of detail. Since protein structure determination lags behind protein sequence determination, an important and ongoing question becomes: what degree of coverage of the human proteome do we have from experimental structures, and what can we infer by modeling? Or, turning the question around: what structures do we need to determine (the “most wanted list”) to further our understanding of the human condition? This paper addresses these questions through integration of existing data resources correlated using comparative functional features, namely the Gene Ontology, which describes biochemical process, molecular function, and cellular location for all types of proteins, and the Enzyme Commission classification for enzymes. Genetic disease states are linked through the Online Mendelian Inheritance in Man resource. Readers can ask their own questions of the resource at http://function.rcsb.org:8080/pdb/function_distribution/index.html. The resource should prove particularly useful to the structural genomics community as it strives to undertake large-scale structure determination with a goal of improving the understanding of protein functional space.
Affiliation(s)
- Lei Xie
- San Diego Supercomputer Center and Department of Pharmacology, University of California, San Diego, California, United States of America
- Philip E Bourne
- San Diego Supercomputer Center and Department of Pharmacology, University of California, San Diego, California, United States of America
- Corresponding author
|
28
|
Abstract
We address the question of whether or not the positions of protein-binding sites on homologous protein structures are conserved irrespective of the identities of their binding partners. First, for each domain family in the Structural Classification of Proteins (SCOP), protein-binding sites are extracted from our comprehensive database of structurally defined binary domain interactions (PIBASE). Second, the binding sites within each family are superposed using a structural alignment of its members. Finally, the degree of localization of binding sites within each family is quantified by comparing it with localization expected by chance. We found that 72% of the 1847 SCOP domain families in PIBASE have binding sites with localization values greater than expected by chance. Moreover, 554 (30%) of these families have localizations that are statistically significant (i.e., more than four standard deviations away from the mean expected by chance). In contrast, only 144 (8%) families have significantly low localization. The absence of a significant correlation of the binding site localization with the average sequence and structural conservations in a family suggests that localization can be helpful for describing the functional diversity of protein-protein interactions, complementing measures of sequence and structural conservation. Consideration of the binding site localization may also result in spatial restraints for the modeling of protein assembly structures.
Affiliation(s)
- Dmitry Korkin
- Department of Biopharmaceutical Sciences, University of California at San Francisco, San Francisco, CA 94143-2552, USA
|