51. Lees JG, Dawson NL, Sillitoe I, Orengo CA. Functional innovation from changes in protein domains and their combinations. Curr Opin Struct Biol 2016; 38:44-52. [DOI: 10.1016/j.sbi.2016.05.016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 03/22/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 10/21/2022]
52. Stogios PJ, Cox G, Spanogiannopoulos P, Pillon MC, Waglechner N, Skarina T, Koteva K, Guarné A, Savchenko A, Wright GD. Rifampin phosphotransferase is an unusual antibiotic resistance kinase. Nat Commun 2016; 7:11343. [PMID: 27103605 PMCID: PMC4844700 DOI: 10.1038/ncomms11343] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 12/17/2015] [Accepted: 03/15/2016] [Indexed: 11/11/2022]
Abstract
Rifampin (RIF) phosphotransferase (RPH) confers antibiotic resistance by conversion of RIF and ATP, to inactive phospho-RIF, AMP and Pi. Here we present the crystal structure of RPH from Listeria monocytogenes (RPH-Lm), which reveals that the enzyme is comprised of three domains: two substrate-binding domains (ATP-grasp and RIF-binding domains); and a smaller phosphate-carrying His swivel domain. Using solution small-angle X-ray scattering and mutagenesis, we reveal a mechanism where the swivel domain transits between the spatially distinct substrate-binding sites during catalysis. RPHs are previously uncharacterized dikinases that are widespread in environmental and pathogenic bacteria. These enzymes are members of a large unexplored group of bacterial enzymes with substrate affinities that have yet to be fully explored. Such an enzymatically complex mechanism of antibiotic resistance augments the spectrum of strategies used by bacteria to evade antimicrobial compounds.
Affiliation(s)
- Peter J. Stogios
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M5G 1L6
- Georgina Cox
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Peter Spanogiannopoulos
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Monica C. Pillon
  - Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Nicholas Waglechner
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Tatiana Skarina
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Kalinka Koteva
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Alba Guarné
  - Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
- Alexei Savchenko
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M5G 1L6
- Gerard D. Wright
  - M.G. DeGroote Institute for Infectious Disease Research, Department of Biochemistry and Biomedical Sciences, McMaster University, 1280 Main St W, Hamilton, Ontario, Canada L8S 4K1
53. Papaleo E, Saladino G, Lambrughi M, Lindorff-Larsen K, Gervasio FL, Nussinov R. The Role of Protein Loops and Linkers in Conformational Dynamics and Allostery. Chem Rev 2016; 116:6391-423. [DOI: 10.1021/acs.chemrev.5b00623] [Citation(s) in RCA: 239] [Impact Index Per Article: 29.9] [Indexed: 12/16/2022]
Affiliation(s)
- Elena Papaleo
  - Computational Biology Laboratory, Unit of Statistics, Bioinformatics and Registry, Danish Cancer Society Research Center, Strandboulevarden 49, 2100 Copenhagen, Denmark
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Giorgio Saladino
  - Department of Chemistry, University College London, London WC1E 6BT, United Kingdom
- Matteo Lambrughi
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza 2, 20126 Milan, Italy
- Kresten Lindorff-Larsen
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, 2200 Copenhagen, Denmark
- Ruth Nussinov
  - Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute Frederick, Frederick, Maryland 21702, United States
  - Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
54. Das S, Orengo CA. Protein function annotation using protein domain family resources. Methods 2016; 93:24-34. [DOI: 10.1016/j.ymeth.2015.09.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 07/02/2015] [Revised: 09/28/2015] [Accepted: 09/29/2015] [Indexed: 01/25/2023]
55. Das S, Dawson NL, Orengo CA. Diversity in protein domain superfamilies. Curr Opin Genet Dev 2015; 35:40-9. [PMID: 26451979 PMCID: PMC4686048 DOI: 10.1016/j.gde.2015.09.005] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 07/09/2015] [Revised: 09/07/2015] [Accepted: 09/08/2015] [Indexed: 01/25/2023]
Abstract
Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfaces which expands the repertoire of molecules they interact with and actions performed on them. To what extent can we identify a core function for these superfamilies which would allow us to develop a ‘domain grammar of function’ whereby a protein's biological role can be proposed from its constituent domains? Clearly the first step is to understand the extent to which these components vary and how changes in their molecular make-up modifies function.
Affiliation(s)
- Sayoni Das
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
- Natalie L Dawson
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
56. Assessing the Metabolic Diversity of Streptococcus from a Protein Domain Point of View. PLoS One 2015; 10:e0137908. [PMID: 26366735 PMCID: PMC4569324 DOI: 10.1371/journal.pone.0137908] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/30/2015] [Accepted: 08/22/2015] [Indexed: 01/17/2023]
Abstract
Understanding the diversity and robustness of the metabolism of bacteria is fundamental for understanding how bacteria evolve and adapt to different environments. In this study, we characterised 121 Streptococcus strains and studied metabolic diversity from a protein domain perspective. Metabolic pathways were described in terms of the promiscuity of domains participating in metabolic pathways that were inferred to be functional. Promiscuity was defined by adapting existing measures based on domain abundance and versatility. The approach proved to be successful in capturing bacterial metabolic flexibility and species diversity, indicating that it can be described in terms of reuse and sharing functional domains in different proteins involved in metabolic activity. Additionally, we showed striking differences among metabolic organisation of the pathogenic serotype 2 Streptococcus suis and other strains.
57. Shahzad K, Mittenthal JE, Caetano-Anollés G. The organization of domains in proteins obeys Menzerath-Altmann's law of language. BMC Syst Biol 2015; 9:44. [PMID: 26260760 PMCID: PMC4531524 DOI: 10.1186/s12918-015-0192-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 02/25/2015] [Accepted: 07/30/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The combination of domains in multidomain proteins enhances their function and structure but lengthens the molecules and increases their cost at cellular level. METHODS The dependence of domain length on the number of domains a protein holds was surveyed for a set of 60 proteomes representing free-living organisms from all kingdoms of life. Distributions were fitted using non-linear functions and fitted parameters interpreted with a formulation of decreasing returns. RESULTS We find that domain length decreases with increasing number of domains in proteins, following the Menzerath-Altmann (MA) law of language. Highly significant negative correlations exist for the set of proteomes examined. Mathematically, the MA law expresses as a power law relationship that unfolds when molecular persistence P is a function of domain accretion. P holds two terms, one reflecting the matter-energy cost of adding domains and extending their length, the other reflecting how domain length and number impinges on information and biophysics. The pattern of diminishing returns can therefore be explained as a frustrated interplay between the strategies of economy, flexibility and robustness, matching previously observed trade-offs in the domain makeup of proteomes. Proteomes of Archaea, Fungi and to a lesser degree Plants show the largest push towards molecular economy, each at their own economic stratum. Fungi increase domain size in single domain proteins while reinforcing the pattern of diminishing returns. In contrast, Metazoa, and to lesser degrees Protista and Bacteria, relax economy. Metazoa achieves maximum flexibility and robustness by harboring compact molecules and complex domain organization, offering a new functional vocabulary for molecular biology. CONCLUSIONS The tendency of parts to decrease their size when systems enlarge is universal for language and music, and now for parts of macromolecules, extending the MA law to natural systems.
Affiliation(s)
- Jay E Mittenthal
  - Department of Cell and Developmental Biology, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
  - Illinois Informatics Institute, Urbana, IL, 61801, USA
  - Department of Crop Sciences, Evolutionary Bioinformatics Laboratory, University of Illinois, 332 NSRC, Urbana, IL, 61801, USA
58. Das S, Lee D, Sillitoe I, Dawson NL, Lees JG, Orengo CA. Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics 2015; 31:3460-7. [PMID: 26139634 PMCID: PMC4612221 DOI: 10.1093/bioinformatics/btv398] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 05/02/2015] [Accepted: 06/24/2015] [Indexed: 11/18/2022]
Abstract
Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies which can be used to functionally annotate > 16 million domain sequences. Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. Contact:sayoni.das.12@ucl.ac.uk Supplementary information:Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- David Lee
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Natalie L Dawson
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Jonathan G Lees
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
59. Lu Y, Lu Y, Deng J, Peng H, Lu H, Lu LJ. A novel essential domain perspective for exploring gene essentiality. Bioinformatics 2015; 31:2921-9. [PMID: 26002906 DOI: 10.1093/bioinformatics/btv312] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/11/2015] [Accepted: 05/13/2015] [Indexed: 02/05/2023]
Abstract
MOTIVATION Genes with indispensable functions are identified as essential; however, the traditional gene-level studies of essentiality have several limitations. In this study, we characterized gene essentiality from a new perspective of protein domains, the independent structural or functional units of a polypeptide chain. RESULTS To identify such essential domains, we have developed an Expectation-Maximization (EM) algorithm-based Essential Domain Prediction (EDP) Model. With simulated datasets, the model provided convergent results given different initial values and offered accurate predictions even with noise. We then applied the EDP model to six microbial species and predicted 1879 domains to be essential in at least one species, ranging 10-23% in each species. The predicted essential domains were more conserved than either non-essential domains or essential genes. Comparing essential domains in prokaryotes and eukaryotes revealed an evolutionary distance consistent with that inferred from ribosomal RNA. When utilizing these essential domains to reproduce the annotation of essential genes, we received accurate results that suggest protein domains are more basic units for the essentiality of genes. Furthermore, we presented several examples to illustrate how the combination of essential and non-essential domains can lead to genes with divergent essentiality. In summary, we have described the first systematic analysis on gene essentiality on the level of domains. CONTACT huilu.bioinfo@gmail.com or Long.Lu@cchmc.org SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yao Lu
  - Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, 24/1400 Beijing (W) Road, Shanghai 200040, People's Republic of China
- Yulan Lu
  - State Key Laboratory of Genetic Engineering Institute of Biostatistics, School of Life Science, Fudan University, Shanghai 200433, People's Republic of China
- Jingyuan Deng
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Hai Peng
  - Institute for Systems Biology, Jianghan University, Wuhan, Hubei, People's Republic of China
- Hui Lu
  - Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, 24/1400 Beijing (W) Road, Shanghai 200040, People's Republic of China, Department of Bioengineering (MC 063), University of Illinois at Chicago, Chicago, IL 60607-7052, USA and Collaborative Innovation Center for Biotherapy, West China Hospital, Sichuan University, Chengdu, China
- Long Jason Lu
  - Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA, Institute for Systems Biology, Jianghan University, Wuhan, Hubei, People's Republic of China
60. Martínez Cuesta S, Rahman SA, Furnham N, Thornton JM. The Classification and Evolution of Enzyme Function. Biophys J 2015; 109:1082-6. [PMID: 25986631 DOI: 10.1016/j.bpj.2015.04.020] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Received: 03/09/2015] [Revised: 04/16/2015] [Accepted: 04/17/2015] [Indexed: 11/30/2022]
Abstract
Enzymes are the proteins responsible for the catalysis of life. Enzymes sharing a common ancestor as defined by sequence and structure similarity are grouped into families and superfamilies. The molecular function of enzymes is defined as their ability to catalyze biochemical reactions; it is manually classified by the Enzyme Commission and robust approaches to quantitatively compare catalytic reactions are just beginning to appear. Here, we present an overview of studies at the interface of the evolution and function of enzymes.
Affiliation(s)
- Sergio Martínez Cuesta
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Syed Asad Rahman
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
- Nicholas Furnham
  - Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom
61. Das S, Sillitoe I, Lee D, Lees JG, Dawson NL, Ward J, Orengo CA. CATH FunFHMMer web server: protein functional annotations using functional family assignments. Nucleic Acids Res 2015; 43:W148-53. [PMID: 25964299 PMCID: PMC4489299 DOI: 10.1093/nar/gkv488] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 02/14/2015] [Accepted: 05/02/2015] [Indexed: 12/20/2022]
Abstract
The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced presents new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence–structure–function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer.
Affiliation(s)
- Sayoni Das
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- David Lee
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Jonathan G Lees
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- Natalie L Dawson
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
- John Ward
  - Department of Biochemical Engineering, UCL, Gower Street, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
62. Hybrid and rogue kinases encoded in the genomes of model eukaryotes. PLoS One 2014; 9:e107956. [PMID: 25255313 PMCID: PMC4177888 DOI: 10.1371/journal.pone.0107956] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/30/2014] [Accepted: 08/18/2014] [Indexed: 11/19/2022]
Abstract
The highly modular nature of protein kinases generates diverse functional roles mediated by evolutionary events such as domain recombination, insertion and deletion of domains. Usually domain architecture of a kinase is related to the subfamily to which the kinase catalytic domain belongs. However outlier kinases with unusual domain architectures serve in the expansion of the functional space of the protein kinase family. For example, Src kinases are made-up of SH2 and SH3 domains in addition to the kinase catalytic domain. A kinase which lacks these two domains but retains sequence characteristics within the kinase catalytic domain is an outlier that is likely to have modes of regulation different from classical src kinases. This study defines two types of outlier kinases: hybrids and rogues depending on the nature of domain recombination. Hybrid kinases are those where the catalytic kinase domain belongs to a kinase subfamily but the domain architecture is typical of another kinase subfamily. Rogue kinases are those with kinase catalytic domain characteristic of a kinase subfamily but the domain architecture is typical of neither that subfamily nor any other kinase subfamily. This report provides a consolidated set of such hybrid and rogue kinases gleaned from six eukaryotic genomes-S.cerevisiae, D. melanogaster, C.elegans, M.musculus, T.rubripes and H.sapiens-and discusses their functions. The presence of such kinases necessitates a revisiting of the classification scheme of the protein kinase family using full length sequences apart from classical classification using solely the sequences of kinase catalytic domains. The study of these kinases provides a good insight in engineering signalling pathways for a desired output. Lastly, identification of hybrids and rogues in pathogenic protozoa such as P.falciparum sheds light on possible strategies in host-pathogen interactions.
63. Tóth-Petróczy A, Tawfik DS. The robustness and innovability of protein folds. Curr Opin Struct Biol 2014; 26:131-8. [PMID: 25038399 DOI: 10.1016/j.sbi.2014.06.007] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Received: 11/26/2013] [Revised: 06/26/2014] [Accepted: 06/26/2014] [Indexed: 11/30/2022]
Abstract
Assignment of protein folds to functions indicates that >60% of folds carry out one or two enzymatic functions, while few folds, for example, the TIM-barrel and Rossmann folds, exhibit hundreds. Are there structural features that make a fold amenable to functional innovation (innovability)? Do these features relate to robustness--the ability to readily accumulate sequence changes? We discuss several hypotheses regarding the relationship between the architecture of a protein and its evolutionary potential. We describe how, in a seemingly paradoxical manner, opposite properties, such as high stability and rigidity versus conformational plasticity and structural order versus disorder, promote robustness and/or innovability. We hypothesize that polarity--differentiation and low connectivity between a protein's scaffold and its active-site--is a key prerequisite for innovability.
Affiliation(s)
- Agnes Tóth-Petróczy
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Dan S Tawfik
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
64. Martinez Cuesta S, Furnham N, Rahman SA, Sillitoe I, Thornton JM. The evolution of enzyme function in the isomerases. Curr Opin Struct Biol 2014; 26:121-30. [PMID: 25000289 PMCID: PMC4139412 DOI: 10.1016/j.sbi.2014.06.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 02/18/2014] [Revised: 06/02/2014] [Accepted: 06/10/2014] [Indexed: 01/14/2023]
Abstract
The advent of computational approaches to measure functional similarity between enzymes adds a new dimension to existing evolutionary studies based on sequence and structure. This paper reviews research efforts aiming to understand the evolution of enzyme function in superfamilies, presenting a novel strategy to provide an overview of the evolution of enzymes belonging to an individual EC class, using the isomerases as an exemplar.
Affiliation(s)
- Sergio Martinez Cuesta
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Nicholas Furnham
  - Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, United Kingdom
- Syed Asad Rahman
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
65. Computational prediction of protein function based on weighted mapping of domains and GO terms. Biomed Res Int 2014; 2014:641469. [PMID: 24868539 PMCID: PMC4017789 DOI: 10.1155/2014/641469] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/21/2013] [Accepted: 03/12/2014] [Indexed: 11/17/2022]
Abstract
In this paper, we propose a novel method, SeekFun, to predict protein function based on weighted mapping of domains and GO terms. Firstly, a weighted mapping of domains and GO terms is constructed according to GO annotations and domain composition of the proteins. The association strength between domain and GO term is weighted by symmetrical conditional probability. Secondly, the mapping is extended along the true paths of the terms based on GO hierarchy. Finally, the terms associated with resident domains are transferred to host protein and real annotations of the host protein are determined by association strengths. Our careful comparisons demonstrate that SeekFun outperforms the concerned methods on most occasions. SeekFun provides a flexible and effective way for protein function prediction. It benefits from the well-constructed mapping of domains and GO terms, as well as the reasonable strategy for inferring annotations of protein from those of its domains.
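The domain-to-GO-term weighting described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of symmetrical conditional probability, SCP(d, t) = P(d, t)^2 / (P(d) P(t)), estimated from co-occurrence counts; the exact formula, the propagation along the GO hierarchy, and the thresholds used by SeekFun are not given in the abstract and are not reproduced here. The function name `scp_weights` and the toy protein set are hypothetical.

```python
from collections import Counter
from itertools import product


def scp_weights(proteins):
    """Weight each (domain, GO term) pair by symmetrical conditional probability.

    proteins: iterable of (domains, go_terms) pairs, each a set of strings.
    With counts over the same protein set, P(d,t)^2 / (P(d) * P(t))
    simplifies to n(d,t)^2 / (n(d) * n(t)).
    """
    n_domain = Counter()
    n_term = Counter()
    n_joint = Counter()
    for domains, terms in proteins:
        n_domain.update(domains)          # proteins containing each domain
        n_term.update(terms)              # proteins annotated with each term
        n_joint.update(product(domains, terms))  # co-occurrences
    return {
        (d, t): joint ** 2 / (n_domain[d] * n_term[t])
        for (d, t), joint in n_joint.items()
    }


# Toy data for illustration only (assumed, not taken from the paper).
toy = [
    ({"PF00069"}, {"GO:0004672"}),
    ({"PF00069", "PF00017"}, {"GO:0004672", "GO:0005515"}),
    ({"PF00017"}, {"GO:0005515"}),
]
weights = scp_weights(toy)
```

In this toy set, a domain and term that always occur together receive weight 1.0, while pairs that co-occur in only one of the two proteins carrying each receive 0.25.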
66. Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully apprehended. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor to correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implication in protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
  - Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
67. Peng W, Wang J, Cai J, Chen L, Li M, Wu FX. Improving protein function prediction using domain and protein complexes in PPI networks. BMC Syst Biol 2014; 8:35. [PMID: 24655481 PMCID: PMC3994332 DOI: 10.1186/1752-0509-8-35] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 09/06/2012] [Accepted: 03/14/2014] [Indexed: 01/25/2023]
Abstract
Background Characterization of unknown proteins through computational approaches is one of the most challenging problems in silico biology, which has attracted world-wide interests and great efforts. There have been some computational methods proposed to address this problem, which are either based on homology mapping or in the context of protein interaction networks. Results In this paper, two algorithms are proposed by integrating the protein-protein interaction (PPI) network, proteins’ domain information and protein complexes. The one is domain combination similarity (DCS), which combines the domain compositions of both proteins and their neighbors. The other is domain combination similarity in context of protein complexes (DSCP), which extends the protein functional similarity definition of DCS by combining the domain compositions of both proteins and the complexes including them. The new algorithms are tested on networks of the model species of Saccharomyces cerevisiae to predict functions of unknown proteins using cross validations. Comparing with other several existing algorithms, the results have demonstrated the effectiveness of our proposed methods in protein function prediction. Furthermore, the algorithm DSCP using experimental determined complex data is robust when a large percentage of the proteins in the network is unknown, and it outperforms DCS and other several existing algorithms. Conclusions The accuracy of predicting protein function can be improved by integrating the protein-protein interaction (PPI) network, proteins’ domain information and protein complexes.
Affiliation(s)
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, PR China.
|
68
|
Shi JY, Yiu SM, Zhang YN, Chin FYL. Effective moment feature vectors for protein domain structures. PLoS One 2014; 8:e83788. [PMID: 24391828 DOI: 10.1371/journal.pone.0083788] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/02/2013] [Accepted: 11/08/2013] [Indexed: 11/19/2022]
Abstract
Imaging processing techniques have been shown to be useful in studying protein domain structures. The idea is to represent the pairwise distances of any two residues of the structure in a 2D distance matrix (DM). Features and/or submatrices are extracted from this DM to represent a domain. Existing approaches, however, may involve a large number of features (100-400) or complicated mathematical operations. Finding fewer but more effective features is always desirable. In this paper, based on some key observations on DMs, we are able to decompose a DM image into four basic binary images, each representing the structural characteristics of a fundamental secondary structure element (SSE) or a motif in the domain. Using the concept of moments in image processing, we further derive 45 structural features based on the four binary images. Together with 4 features extracted from the basic images, we represent the structure of a domain using 49 features. We show that our feature vectors can represent domain structures effectively in terms of the following. (1) We show a higher accuracy for domain classification. (2) We show a clear and consistent distribution of domains using our proposed structural vector space. (3) We are able to cluster the domains according to our moment features and demonstrate a relationship between structural variation and functional diversity.
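The distance matrix (DM) that this abstract starts from can be illustrated with a minimal sketch. It assumes each residue is represented by a single 3D point (for example its C-alpha atom), which the abstract does not specify. The function name and the toy coordinates are hypothetical. The decomposition into four binary images and the 49 moment features are not reproduced here.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: one (x, y, z) tuple per residue. Assumption: one
    representative point per residue, e.g. the C-alpha atom.
    """
    n = len(coords)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dm[i][j] = dm[j][i] = d  # the matrix is symmetric
    return dm


# Hypothetical coordinates (in angstroms) for four residues
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (3.0, 4.0, 12.0),
    (0.0, 0.0, 12.0),
]
toy_dm = distance_matrix(toy_coords)
```

Treated as a grayscale image, such a matrix is the kind of input from which image-processing features could then be extracted.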
Affiliation(s)
- Jian-Yu Shi
- School of Life Science, Northwestern Polytechnical University, Xi'an, Shaanxi Province, China; Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Siu-Ming Yiu
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Yan-Ning Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi Province, China
|
69
|
Bhaskara RM, Mehrotra P, Rakshambikai R, Gnanavel M, Martin J, Srinivasan N. The relationship between classification of multi-domain proteins using an alignment-free approach and their functions: a case study with immunoglobulins. Mol Biosyst 2014; 10:1082-93. [DOI: 10.1039/c3mb70443b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 02/02/2023]
|
70
|
Radou G, Enciso M, Krivov S, Paci E. Modulation of a protein free-energy landscape by circular permutation. J Phys Chem B 2013; 117:13743-7. [PMID: 24090448 PMCID: PMC3821731 DOI: 10.1021/jp406818t] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/26/2023]
Abstract
Circular permutations usually retain the native structure and function of a protein while inevitably perturbing its folding dynamics. By using simulations with a structure-based model and a rigorous methodology to determine free-energy surfaces from trajectories, we evaluate the effect of a circular permutation on the free-energy landscape of the protein T4 lysozyme. We observe changes which, although subtle, largely affect the cooperativity between the two subdomains. Such a change in cooperativity has been previously experimentally observed and recently also characterized using single molecule optical tweezers and the Crooks relation. The free-energy landscapes show that both the wild type and circular permutant have an on-pathway intermediate, previously experimentally characterized, in which one of the subdomains is completely formed. The landscapes, however, differ in the position of the rate-limiting step for folding, which occurs before the intermediate in the wild type and after in the circular permutant. This shift of transition state explains the observed change in the cooperativity. The underlying free-energy landscape thus provides a microscopic description of the folding dynamics and the connection between circular permutation and the loss of cooperativity experimentally observed.
Affiliation(s)
- Gaël Radou
- Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds LS2 9JT, United Kingdom
|
71
|
Going over the three dimensional protein structure similarity problem. Artif Intell Rev 2013. [DOI: 10.1007/s10462-013-9416-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
72
|
Mohanty S, Purwar M, Srinivasan N, Rekha N. Tethering preferences of domain families co-occurring in multi-domain proteins. Mol Biosyst 2013; 9:1708-25. [PMID: 23571467 DOI: 10.1039/c3mb25481j] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/25/2022]
Abstract
Genomic data of several organisms have revealed the presence of a vast repertoire of multi-domain proteins. The role played by individual domains in a multi-domain protein has a profound influence on the overall function of the protein. In the present analysis an attempt has been made to better understand the tethering preferences of domain families that occur in multi-domain proteins. The analysis has been carried out on an exhaustive dataset of 2 961 898 sequences of proteins from 930 organisms, where 741 274 proteins are comprised of at least two domain families. For every domain family, the number of other domain families with which it co-occurs within a protein in this dataset has been enumerated and is referred to as the tethering number of the domain family. It was found that, in the general dataset, the AAA ATPase family and the family of Ser/Thr kinases have the highest tethering numbers of 450 and 444 respectively. Further analysis reveals significant correlation between the number of members in a family and its tethering number. Positive correlation was also observed for the extent of a sequence and functional diversity within a family and the tethering numbers of domain families. Domain families that are present ubiquitously in diverse organisms tend to have large tethering numbers, while organism/kingdom-specific families have low tethering numbers. Thus, the analysis uncovers how domain families recombine and evolve to give rise to multi-domain proteins.
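The tethering number defined in this abstract (for each domain family, the number of other domain families it co-occurs with in at least one protein) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input layout (protein identifier mapped to a list of domain family names) and the toy data are all assumptions.

```python
from collections import defaultdict


def tethering_numbers(proteins):
    """Count, for each domain family, its distinct co-occurring families.

    proteins: dict mapping a protein ID to the list of domain families
    found in that protein (hypothetical input layout).
    """
    partners = defaultdict(set)
    for families in proteins.values():
        unique = set(families)  # repeats of a family within a protein count once
        for fam in unique:
            # every other family in the same protein is a tethering partner
            partners[fam].update(unique - {fam})
    return {fam: len(others) for fam, others in partners.items()}


# Hypothetical toy dataset, not taken from the study
toy_proteins = {
    "P1": ["AAA", "Kinase"],
    "P2": ["AAA", "SH2", "Kinase"],
    "P3": ["AAA", "PDZ"],
    "P4": ["SH3"],
}
toy_tethering = tethering_numbers(toy_proteins)
```

In this toy set, the family labelled "AAA" co-occurs with three other families, while the family that only appears in a single-domain protein has a tethering number of zero.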
Affiliation(s)
- Smita Mohanty
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
|
73
|
Yafremava LS, Wielgos M, Thomas S, Nasir A, Wang M, Mittenthal JE, Caetano-Anollés G. A general framework of persistence strategies for biological systems helps explain domains of life. Front Genet 2013; 4:16. [PMID: 23443991 PMCID: PMC3580334 DOI: 10.3389/fgene.2013.00016] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 09/17/2012] [Accepted: 01/28/2013] [Indexed: 11/13/2022]
Abstract
The nature and cause of the division of organisms in superkingdoms is not fully understood. Assuming that environment shapes physiology, here we construct a novel theoretical framework that helps identify general patterns of organism persistence. This framework is based on Jacob von Uexküll's organism-centric view of the environment and James G. Miller's view of organisms as matter-energy-information processing molecular machines. Three concepts describe an organism's environmental niche: scope, umwelt, and gap. Scope denotes the entirety of environmental events and conditions to which the organism is exposed during its lifetime. Umwelt encompasses an organism's perception of these events. The gap is the organism's blind spot, the scope that is not covered by umwelt. These concepts bring organisms of different complexity to a common ecological denominator. Ecological and physiological data suggest organisms persist using three strategies: flexibility, robustness, and economy. All organisms use umwelt information to flexibly adapt to environmental change. They implement robustness against environmental perturbations within the gap generally through redundancy and reliability of internal constituents. Both flexibility and robustness improve survival. However, they also incur metabolic matter-energy processing costs, which otherwise could have been used for growth and reproduction. Lineages evolve unique tradeoff solutions among strategies in the space of what we call "a persistence triangle." Protein domain architecture and other evidence support the preferential use of flexibility and robustness properties. Archaea and Bacteria gravitate toward the triangle's economy vertex, with Archaea biased toward robustness. Eukarya trade economy for survivability. Protista occupy a saddle manifold separating akaryotes from multicellular organisms. Plants and the more flexible Fungi share an economic stratum, and Metazoa are locked in a positive feedback loop toward flexibility.
Affiliation(s)
- Liudmila S Yafremava
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois Urbana, IL, USA
|
74
|
Moore AD, Grath S, Schüler A, Huylmans AK, Bornberg-Bauer E. Quantification and functional analysis of modular protein evolution in a dense phylogenetic tree. Biochim Biophys Acta Proteins Proteom 2013; 1834:898-907. [PMID: 23376183 DOI: 10.1016/j.bbapap.2013.01.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 10/19/2012] [Revised: 01/06/2013] [Accepted: 01/09/2013] [Indexed: 12/24/2022]
Abstract
Modularity is a hallmark of molecular evolution. Whether considering gene regulation, the components of metabolic pathways or signaling cascades, the ability to reuse autonomous modules in different molecular contexts can expedite evolutionary innovation. Similarly, protein domains are the modules of proteins, and modular domain rearrangements can create diversity with seemingly few operations in turn allowing for swift changes to an organism's functional repertoire. Here, we assess the patterns and functional effects of modular rearrangements at high resolution. Using a well resolved and diverse group of pancrustaceans, we illustrate arrangement diversity within closely related organisms, estimate arrangement turnover frequency and establish, for the first time, branch-specific rate estimates for fusion, fission, domain addition and terminal loss. Our results show that roughly 16 new arrangements arise per million years and that between 64% and 81% of these can be explained by simple, single-step modular rearrangement events. We find evidence that the frequencies of fission and terminal deletion events increase over time, and that modular rearrangements impact all levels of the cellular signaling apparatus and thus may have strong adaptive potential. Novel arrangements that cannot be explained by simple modular rearrangements contain a significant amount of repeat domains that occur in complex patterns which we term "supra-repeats". Furthermore, these arrangements are significantly longer than those with a single-step rearrangement solution, suggesting that such arrangements may result from multi-step events. In summary, our analysis provides an integrated view and initial quantification of the patterns and functional impact of modular protein evolution in a well resolved phylogenetic tree. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
Affiliation(s)
- Andrew D Moore
- Institute for Evolution and Biodiversity, Münster, Germany
|
75
|
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
|
76
|
Koide S, Huang J. Generation of high-performance binding proteins for peptide motifs by affinity clamping. Methods Enzymol 2013; 523:285-302. [PMID: 23422435 DOI: 10.1016/b978-0-12-394292-0.00013-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 12/03/2022]
Abstract
We describe concepts and methodologies for generating "Affinity Clamps," a new class of recombinant binding proteins that achieve high affinity and high specificity toward short peptide motifs of biological importance, which is a major challenge in protein engineering. The Affinity Clamping concept exploits the potential of nonhomologous recombination of protein domains in generating large changes in protein function and the inherent binding affinity and specificity of the so-called modular interaction domains toward short peptide motifs. Affinity Clamping creates a clamshell architecture that clamps onto a target peptide. The design processes involve (i) choosing a starting modular interaction domain appropriate for the target and applying structure-guided modifications; (ii) attaching a second domain, termed "enhancer domain"; and (iii) optimizing the peptide-binding site located between the domains by directed evolution. The two connected domains work synergistically to achieve high levels of affinity and specificity that are unattainable with either domain alone. Because of the simple and modular architecture, Affinity Clamps are particularly well suited as building blocks for designing more complex functionalities. Affinity Clamping represents a major advance in protein design that is broadly applicable to the recognition of peptide motifs.
Affiliation(s)
- Shohei Koide
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois, USA.
|
77
|
Furnham N, Laskowski RA, Thornton JM. Abstracting knowledge from the protein data bank. Biopolymers 2012; 99:183-8. [DOI: 10.1002/bip.22107] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 04/30/2012] [Accepted: 05/25/2012] [Indexed: 12/27/2022]
|
78
|
Meinhardt S, Manley MW, Becker NA, Hessman JA, Maher LJ, Swint-Kruse L. Novel insights from hybrid LacI/GalR proteins: family-wide functional attributes and biologically significant variation in transcription repression. Nucleic Acids Res 2012; 40:11139-54. [PMID: 22965134 PMCID: PMC3505978 DOI: 10.1093/nar/gks806] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Indexed: 01/11/2023]
Abstract
LacI/GalR transcription regulators have extensive, non-conserved interfaces between their regulatory domains and the 18 amino acids that serve as ‘linkers’ to their DNA-binding domains. These non-conserved interfaces might contribute to functional differences between paralogs. Previously, two chimeras created by domain recombination displayed novel functional properties. Here, we present a synthetic protein family, which was created by joining the LacI DNA-binding domain/linker to seven additional regulatory domains. Despite ‘mismatched’ interfaces, chimeras maintained allosteric response to their cognate effectors. Therefore, allostery in many LacI/GalR proteins does not require interfaces with precisely matched interactions. Nevertheless, the chimeric interfaces were not silent to mutagenesis, and preliminary comparisons suggest that the chimeras provide an ideal context for systematically exploring functional contributions of non-conserved positions. DNA looping experiments revealed higher order (dimer–dimer) oligomerization in several chimeras, which might be possible for the natural paralogs. Finally, the biological significance of repression differences was determined by measuring bacterial growth rates on lactose minimal media. Unexpectedly, moderate and strong repressors showed an apparent induction phase, even though inducers were not provided; therefore, an unknown mechanism might contribute to regulation of the lac operon. Nevertheless, altered growth correlated with altered repression, which indicates that observed functional modifications are significant.
Affiliation(s)
- Sarah Meinhardt
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160, USA
|
79
|
Leclère L, Rentzsch F. Repeated evolution of identical domain architecture in metazoan netrin domain-containing proteins. Genome Biol Evol 2012; 4:883-99. [PMID: 22813778 PMCID: PMC3516229 DOI: 10.1093/gbe/evs061] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Accepted: 07/11/2012] [Indexed: 12/13/2022]
Abstract
The majority of proteins in eukaryotes are composed of multiple domains, and the number and order of these domains is an important determinant of protein function. Although multidomain proteins with a particular domain architecture were initially considered to have a common evolutionary origin, recent comparative studies of protein families or whole genomes have reported that a minority of multidomain proteins could have appeared multiple times independently. Here, we test this scenario in detail for the signaling molecules netrin and secreted frizzled-related proteins (sFRPs), two groups of netrin domain-containing proteins with essential roles in animal development. Our primary phylogenetic analyses suggest that the particular domain architectures of each of these proteins were present in the eumetazoan ancestor and evolved a second time independently within the metazoan lineage from laminin and frizzled proteins, respectively. Using an array of phylogenetic methods, statistical tests, and character sorting analyses, we show that the polyphyly of netrin and sFRP is well supported and cannot be explained by classical phylogenetic reconstruction artifacts. Despite their independent origins, the two groups of netrins and of sFRPs have the same protein interaction partners (Deleted in Colorectal Cancer/neogenin and Unc5 for netrins and Wnts for sFRPs) and similar developmental functions. Thus, these cases of convergent evolution emphasize the importance of domain architecture for protein function by uncoupling shared domain architecture from shared evolutionary history. Therefore, we propose the term merology to describe the repeated evolution of proteins with similar domain architecture and discuss the potential of merologous proteins to help in understanding protein evolution.
Affiliation(s)
- Lucas Leclère
- Sars International Centre for Marine Molecular Biology, University of Bergen, Norway.
|
80
|
Lei L, Zhou SL, Ma H, Zhang LS. Expansion and diversification of the SET domain gene family following whole-genome duplications in Populus trichocarpa. BMC Evol Biol 2012; 12:51. [PMID: 22497662 PMCID: PMC3402991 DOI: 10.1186/1471-2148-12-51] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 09/12/2011] [Accepted: 04/12/2012] [Indexed: 01/03/2023]
Abstract
Background Histone lysine methylation modifies chromatin structure and regulates eukaryotic gene transcription and a variety of developmental and physiological processes. SET domain proteins are lysine methyltransferases containing the evolutionarily-conserved SET domain, which is known to be the catalytic domain. Results We identified 59 SET genes in the Populus genome. Phylogenetic analyses of 106 SET genes from Populus and Arabidopsis supported the clustering of SET genes into six distinct subfamilies and identified 19 duplicated gene pairs in Populus. The chromosome locations of these gene pairs and the distribution of synonymous substitution rates showed that the expansion of the SET gene family might be caused by large-scale duplications in Populus. Comparison of gene structures and domain architectures of each duplicate pair indicated that divergence took place at the 3'- and 5'-terminal transcribed regions and at the N- and C-termini of the predicted proteins, respectively. Expression profile analysis of Populus SET genes suggested that most Populus SET genes were expressed widely, many with the highest expression in young leaves. In particular, the expression profiles of 12 of the 19 duplicated gene pairs fell into two types of expression patterns. Conclusions The 19 duplicated SET genes could have originated from whole genome duplication events. The differences in SET gene structure, domain architecture, and expression profiles in various tissues of Populus suggest that members of the SET gene family have a variety of developmental and physiological functions. Our study provides clues about the evolution of epigenetic regulation of chromatin structure and gene expression.
Affiliation(s)
- Li Lei
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, the Chinese Academy of Sciences, Beijing 100093, China
|
81
|
Furnham N, Sillitoe I, Holliday GL, Cuff AL, Laskowski RA, Orengo CA, Thornton JM. Exploring the evolution of novel enzyme functions within structurally defined protein superfamilies. PLoS Comput Biol 2012; 8:e1002403. [PMID: 22396634 PMCID: PMC3291543 DOI: 10.1371/journal.pcbi.1002403] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Received: 09/30/2011] [Accepted: 01/09/2012] [Indexed: 11/18/2022]
Abstract
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life. Enzymes, as biological catalysts, are crucial to life. Understanding how enzymes have evolved to perform the wide variety of reactions found across all kingdoms of life is fundamental to a broad range of biological studies, especially those leading to new therapeutics. 
To unravel the evolution of novel enzyme function requires combining information on protein structure, sequence, phylogeny and chemistry (in terms of interacting small molecules and reaction mechanisms). We have developed a protocol for integrating this wide range of data, which we have applied to a relatively large number of families comprising some very diverse relatives. This has permitted us to present an initial overview of the evolution of novel enzyme functions, in which we observe that some changes in function between relatives are more common than others, with most of the functionality observed in nature confined to relatively few families. Moreover, we are able to identify the evolutionary route taken within a superfamily to change the enzyme function from one reaction to another. This information may help in predicting the function of an enzyme that has yet to be experimentally characterised as well as in designing new enzymes for industrial and medical purposes.
Affiliation(s)
- Nicholas Furnham
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.
|
82
|
Olvera C, Centeno-Leija S, Ruiz-Leyva P, López-Munguía A. Design of chimeric levansucrases with improved transglycosylation activity. Appl Environ Microbiol 2012; 78:1820-5. [PMID: 22247149 PMCID: PMC3298123 DOI: 10.1128/aem.07222-11] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 10/17/2011] [Accepted: 12/19/2011] [Indexed: 11/20/2022]
Abstract
Fructansucrases (FSs), including levansucrases and inulosucrases, are enzymes that synthesize fructose polymers from sucrose by the direct transfer of the fructosyl moiety to a growing polymer chain. These enzymes, particularly the single domain fructansucrases, also possess an important hydrolytic activity, which may account for as much as 70 to 80% of substrate conversion, depending on reaction conditions. Here, we report the construction of four chimeric levansucrases from SacB, a single domain levansucrase produced by Bacillus subtilis. Based on observations derived from the effect of domain deletion in both multidomain fructansucrases and glucansucrases, we attached different extensions to SacB. These extensions included the transitional domain and complete C-terminal domain of Leuconostoc citreum inulosucrase (IslA), Leuconostoc mesenteroides levansucrase (LevC), and a L. mesenteroides glucansucrase (DsrP). It was found that in some cases the hydrolytic activity was reduced to less than 10% of substrate conversion; however, all of the constructs were as stable as SacB. This shift in enzyme specificity was observed even when the SacB catalytic domain was extended only with the transitional region found in multidomain FSs. Specific kinetic analysis revealed that this change in specificity of the SacB chimeric constructs was derived from a 5-fold increase in the transfructosylation k(cat) and not from a reduction of the hydrolytic k(cat), which remained constant.
Affiliation(s)
- Clarita Olvera
- Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
|
83
|
Kinjo AR, Nakamura H. Composite structural motifs of binding sites for delineating biological functions of proteins. PLoS One 2012; 7:e31437. [PMID: 22347478 PMCID: PMC3275580 DOI: 10.1371/journal.pone.0031437] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/14/2011] [Accepted: 01/08/2012] [Indexed: 11/19/2022]
Abstract
Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.
Affiliation(s)
- Akira R Kinjo
- Institute for Protein Research, Osaka University, Suita, Osaka, Japan.
|
84
|
Furnham N, Sillitoe I, Holliday GL, Cuff AL, Rahman SA, Laskowski RA, Orengo CA, Thornton JM. FunTree: a resource for exploring the functional evolution of structurally defined enzyme superfamilies. Nucleic Acids Res 2012; 40:D776-82. [PMID: 22006843 PMCID: PMC3245072 DOI: 10.1093/nar/gkr852] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 08/10/2011] [Accepted: 09/24/2011] [Indexed: 11/12/2022] Open
Abstract
FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily as well as providing a means to analyse trends across many superfamilies. This is done not only within the context of an enzyme's sequence and structure but also the relationships of their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ~1800 (70%) of sequence assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments using both domain structural alignments supplemented with domain sequences and whole sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity as well as annotations from manually curated resources such as the Catalytic Site Atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thorton-srv/databases/FunTree.
Affiliation(s)
- Nicholas Furnham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
85
|
Pérez-Nueno VI, Ritchie DW. Identifying and characterizing promiscuous targets: implications for virtual screening. Expert Opin Drug Discov 2011; 7:1-17. [PMID: 22468890 DOI: 10.1517/17460441.2011.632406] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/05/2022]
Abstract
INTRODUCTION: Ligand-based shape matching approaches have become established as important and popular virtual screening (VS) techniques. However, despite their relative success, the question of how to best choose the initial query compounds and their conformations remains largely unsolved. This issue gains importance when dealing with promiscuous targets, that is, proteins that bind multiple ligand scaffold families in one or more binding sites. Conventional shape matching VS approaches assume that there is only one binding mode for a given protein target. This may be true for some targets, but it is certainly not true in all cases. Several recent studies have shown that some protein targets bind to different ligands in different ways. AREAS COVERED: The authors discuss the concept of promiscuity in the context of virtual drug screening, and present and analyze several examples of promiscuous targets. The article also reports on the impact of the query conformation on the performance of shape-based VS and the potential to improve VS performance by using consensus shape clustering techniques. EXPERT OPINION: The notion of polypharmacology is becoming highly relevant in drug discovery. Understanding and exploiting promiscuity present challenges and opportunities for drug discovery endeavors. The examples of promiscuity presented here suggest that promiscuous targets and ligands are much more common than previously assumed, and this should be taken into account in practical VS protocols. Although some progress has been made, there is a need to develop more sophisticated computational techniques and protocols that can identify and characterize promiscuous targets on a genomic scale.
|
86
|
|
87
|
Rogers RL, Hartl DL. Chimeric genes as a source of rapid evolution in Drosophila melanogaster. Mol Biol Evol 2011; 29:517-29. [PMID: 21771717 DOI: 10.1093/molbev/msr184] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 12/28/2022] Open
Abstract
Chimeric genes form through the combination of portions of existing coding sequences to create a new open reading frame. These new genes can create novel protein structures that are likely to serve as a strong source of novelty upon which selection can act. We have identified 14 chimeric genes that formed through DNA-level mutations in Drosophila melanogaster, and we investigate expression profiles, domain structures, and population genetics for each of these genes to examine their potential to effect adaptive evolution. We find that chimeric gene formation commonly produces mid-domain breaks and unites portions of wholly unrelated peptides, creating novel protein structures that are entirely distinct from other constructs in the genome. These new genes are often involved in selective sweeps. We further find a disparity between chimeric genes that have recently formed and swept to fixation versus chimeric genes that have been preserved over long periods of time, suggesting that preservation and adaptation are distinct processes. Finally, we demonstrate that chimeric gene formation can produce qualitative expression changes that are difficult to mimic through duplicate gene formation, and that extremely young chimeric genes (d(S) < 0.03) are more likely to be associated with selective sweeps than duplicate genes of the same age. Hence, chimeric genes can serve as an exceptional source of genetic novelty that can have a profound influence on adaptive evolution in D. melanogaster.
Affiliation(s)
- Rebekah L Rogers
- Department of Organismic and Evolutionary Biology, Harvard University, USA.
|
88
|
Meng EC, Babbitt PC. Topological variation in the evolution of new reactions in functionally diverse enzyme superfamilies. Curr Opin Struct Biol 2011; 21:391-7. [PMID: 21458983 PMCID: PMC3551608 DOI: 10.1016/j.sbi.2011.03.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 02/22/2011] [Revised: 03/05/2011] [Accepted: 03/09/2011] [Indexed: 10/18/2022]
Abstract
In functionally diverse enzyme superfamilies (SFs), conserved structural and active site features reflect catalytic capabilities 'hard-wired' in each SF architecture. Overlaid on this foundation, evolutionary changes in active site machinery, structural topology and other aspects of structural organization and interactions support the emergence of new reactions, mechanisms, and substrate specificity. This review connects topological with functional variation in each of the haloalkanoic acid dehalogenase (HAD) and vicinal oxygen chelate fold (VOC) SFs and a set of redox-active thioredoxin (Trx)-fold SFs to illustrate a few of the varied themes nature has used to evolve new functions from a limited set of structural scaffolds.
Affiliation(s)
- Elaine C. Meng
- Department of Pharmaceutical Chemistry, University of California, M/S 2240, 600 16th Street, San Francisco, CA 94158-2517, USA
- Patricia C. Babbitt
- Department of Pharmaceutical Chemistry, University of California, M/S 2240, 600 16th Street, San Francisco, CA 94158-2517, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, M/S 2250, 1700 4th Street, San Francisco, CA 94158-2330, USA
- California Institute for Quantitative Biosciences, University of California, San Francisco
|
89
|
Arbuckle JL, Rahman NS, Zhao S, Rodgers W, Rodgers KK. Elucidating the domain architecture and functions of non-core RAG1: the capacity of a non-core zinc-binding domain to function in nuclear import and nucleic acid binding. BMC Biochem 2011; 12:23. [PMID: 21599978 PMCID: PMC3124419 DOI: 10.1186/1471-2091-12-23] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 01/24/2011] [Accepted: 05/20/2011] [Indexed: 12/19/2022]
Abstract
Background: The repertoire of the antigen-binding receptors originates from the rearrangement of immunoglobulin and T-cell receptor genetic loci in a process known as V(D)J recombination. The initial site-specific DNA cleavage steps of this process are catalyzed by the lymphoid specific proteins RAG1 and RAG2. The majority of studies on RAG1 and RAG2 have focused on the minimal, core regions required for catalytic activity. Though not absolutely required, non-core regions of RAG1 and RAG2 have been shown to influence the efficiency and fidelity of the recombination reaction. Results: Using a partial proteolysis approach in combination with bioinformatics analyses, we identified the domain boundaries of a structural domain that is present in the 380-residue N-terminal non-core region of RAG1. We term this domain the Central Non-core Domain (CND; residues 87-217). Conclusions: We show how the CND alone, and in combination with other regions of non-core RAG1, functions in nuclear localization, zinc coordination, and interactions with nucleic acid. Together, these results demonstrate the multiple roles that the non-core region can play in the function of the full length protein.
Affiliation(s)
- Janeen L Arbuckle
- Department of Biochemistry and Molecular Biology, The University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma 73190, USA
|
90
|
Dessailly BH, Redfern OC, Cuff AL, Orengo CA. Detailed analysis of function divergence in a large and diverse domain superfamily: toward a refined protocol of function classification. Structure 2011; 18:1522-35. [PMID: 21070951 DOI: 10.1016/j.str.2010.08.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 01/19/2010] [Revised: 08/06/2010] [Accepted: 08/13/2010] [Indexed: 10/18/2022]
Abstract
Some superfamilies contain large numbers of protein domains with very different functions. The ability to refine the functional classification of domains within these superfamilies is necessary for better understanding the evolution of functions and to guide function prediction of new relatives. To achieve this, a suitable starting point is the detailed analysis of functional divisions and mechanisms of functional divergence in a single superfamily. Here, we present such a detailed analysis in the superfamily of HUP domains. A biologically meaningful functional classification of HUP domains is obtained manually. Mechanisms of function diversification are investigated in detail using this classification. We observe that structural motifs play an important role in shaping broad functional divergence, whereas residue-level changes shape diversity at a more specific level. In parallel we examine the ability of an automated protocol to capture the biologically meaningful classification, with a view to automatically extending this classification in the future.
Affiliation(s)
- Benoit H Dessailly
- Department of Structural and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
|
91
|
Seidl MF, Van den Ackerveken G, Govers F, Snel B. A domain-centric analysis of oomycete plant pathogen genomes reveals unique protein organization. Plant Physiol 2011; 155:628-644. [PMID: 21119047 PMCID: PMC3032455 DOI: 10.1104/pp.110.167841] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Received: 10/19/2010] [Accepted: 11/24/2010] [Indexed: 05/29/2023]
Abstract
Oomycetes comprise a diverse group of organisms that morphologically resemble fungi but belong to the stramenopile lineage within the supergroup of chromalveolates. Recent studies have shown that plant pathogenic oomycetes have expanded gene families that are possibly linked to their pathogenic lifestyle. We analyzed the protein domain organization of 67 eukaryotic species including four oomycete and five fungal plant pathogens. We detected 246 expanded domains in fungal and oomycete plant pathogens. The analysis of genes differentially expressed during infection revealed a significant enrichment of genes encoding expanded domains as well as signal peptides linking a substantial part of these genes to pathogenicity. Overrepresentation and clustering of domain abundance profiles revealed domains that might have important roles in host-pathogen interactions but, as yet, have not been linked to pathogenicity. The number of distinct domain combinations (bigrams) in oomycetes was significantly higher than in fungi. We identified 773 oomycete-specific bigrams, with the majority composed of domains common to eukaryotes. The analyses enabled us to link domain content to biological processes such as host-pathogen interaction, nutrient uptake, or suppression and elicitation of plant immune responses. Taken together, this study represents a comprehensive overview of the domain repertoire of fungal and oomycete plant pathogens and points to novel features like domain expansion and species-specific bigram types that could, at least partially, explain why oomycetes are such remarkable plant pathogens.
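The bigram comparison described in this abstract can be illustrated with a minimal sketch. It assumes that a bigram is an ordered pair of domains adjacent on the same protein, which the abstract does not spell out, and all domain names and toy proteomes below are hypothetical rather than taken from the study.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Bigram = Tuple[str, str]


def domain_bigrams(architecture: Sequence[str]) -> List[Bigram]:
    """Return ordered pairs of neighbouring domains in one protein."""
    return list(zip(architecture, architecture[1:]))


def distinct_bigrams(proteome: Iterable[Sequence[str]]) -> Set[Bigram]:
    """Collect the set of distinct bigrams over all proteins of a species."""
    found: Set[Bigram] = set()
    for architecture in proteome:
        found.update(domain_bigrams(architecture))
    return found


def lineage_specific_bigrams(
    target: Iterable[Sequence[str]],
    background: Iterable[Sequence[str]],
) -> Set[Bigram]:
    """Bigrams seen in the target proteome but absent from the background."""
    return distinct_bigrams(target) - distinct_bigrams(background)


# Hypothetical toy proteomes: each protein is a list of domain names
# ordered from N- to C-terminus.
toy_oomycete = [["PH", "Kinase"], ["Kinase", "SH2"], ["PH", "Kinase", "SH2"]]
toy_fungus = [["Kinase", "SH2"], ["WD40"]]

print(len(distinct_bigrams(toy_oomycete)))
print(lineage_specific_bigrams(toy_oomycete, toy_fungus))
```

Sets are used here because the abstract counts distinct combinations rather than occurrences; counting occurrences would call for a `collections.Counter` instead.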
Affiliation(s)
- Michael F Seidl
- Theoretical Biology and Bioinformatics, Department of Biology, Utrecht University, 3584 CH Utrecht, The Netherlands.
|
92
|
Guo M, Yang XL, Schimmel P. New functions of aminoacyl-tRNA synthetases beyond translation. Nat Rev Mol Cell Biol 2010; 11:668-74. [PMID: 20700144 DOI: 10.1038/nrm2956] [Citation(s) in RCA: 255] [Impact Index Per Article: 18.2] [Indexed: 12/16/2022]
Abstract
Over the course of evolution, eukaryotic aminoacyl-tRNA synthetases (aaRSs) progressively incorporated domains and motifs that have no essential connection to aminoacylation reactions. Their accretive addition to virtually all aaRSs correlates with the progressive evolution and complexity of eukaryotes. Based on recent experimental findings focused on a few of these additions and analysis of the aaRS proteome, we propose that they are markers for aaRS-associated functions beyond translation.
Affiliation(s)
- Min Guo
- Min Guo, Xiang-Lei Yang and Paul Schimmel are at The Skaggs Institute for Chemical Biology and Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
93
|
Schlessinger A, Matsson P, Shima JE, Pieper U, Yee SW, Kelly L, Apeltsin L, Stroud RM, Ferrin TE, Giacomini KM, Sali A. Comparison of human solute carriers. Protein Sci 2010; 19:412-28. [PMID: 20052679 DOI: 10.1002/pro.320] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Indexed: 11/09/2022]
Abstract
Solute carriers are eukaryotic membrane proteins that control the uptake and efflux of solutes, including essential cellular compounds, environmental toxins, and therapeutic drugs. Solute carriers can share similar structural features despite weak sequence similarities. Identification of sequence relationships among solute carriers is needed to enhance our ability to model individual carriers and to elucidate the molecular mechanisms of their substrate specificity and transport. Here, we describe a comprehensive comparison of solute carriers. We link the proteins using sensitive profile-profile alignments and two classification approaches, including similarity networks. The clusters are analyzed in view of substrate type, transport mode, organism conservation, and tissue specificity. Solute carrier families with similar substrates generally cluster together, despite exhibiting relatively weak sequence similarities. In contrast, some families cluster together with no apparent reason, revealing unexplored relationships. We demonstrate computationally and experimentally the functional overlap between representative members of these families. Finally, we identify four putative solute carriers in the human genome. The solute carriers include a biomedically important group of membrane proteins that is diverse in sequence and structure. The proposed classification of solute carriers, combined with experiment, reveals new relationships among the individual families and identifies new solute carriers. The classification scheme will inform future attempts directed at modeling the structures of the solute carriers, a prerequisite for describing the substrate specificities of the individual families.
Affiliation(s)
- Avner Schlessinger
- Department of Bioengineering and Therapeutic Sciences, California Institute for Quantitative Biosciences, University of California, San Francisco, California.
|
94
|
Abstract
In this paper we provide an overview of our current knowledge of the mapping between small molecule ligands and protein domains. We give an overview of the present data resources available on the Web, which provide information about protein-ligand interactions, as well as discussing our own PROCOGNATE database. We present an update of ligand binding in large protein superfamilies and identify those ligands most frequently utilized by nature. Finally we discuss potential uses for this type of data.
Affiliation(s)
- Matthew Bashton
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
|
95
|
Peisajovich SG, Garbarino JE, Wei P, Lim WA. Rapid diversification of cell signaling phenotypes by modular domain recombination. Science 2010; 328:368-72. [PMID: 20395511 DOI: 10.1126/science.1182376] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.7] [Indexed: 12/20/2022]
Abstract
Cell signaling proteins are often modular, containing distinct catalytic and regulatory domains. Recombination of such biological modules has been proposed to be a major source of evolutionary innovation. We systematically analyzed the phenotypic diversity of a signaling response that results from domain recombination by using 11 proteins in the yeast mating pathway to construct a library of 66 chimeric domain recombinants. Domain recombination resulted in greater diversity in pathway response dynamics than did duplication of genes, of single domains, or of two unlinked domains. Domain recombination also led to changes in mating phenotype, including recombinants with increased mating efficiency over the wild type. Thus, novel linkages between preexisting domains may have a major role in the evolution of protein networks and novel phenotypic behaviors.
Affiliation(s)
- Sergio G Peisajovich
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, 600 16th Street, San Francisco, CA 94158, USA
|
96
|
Tamuri AU, Laskowski RA. ArchSchema: a tool for interactive graphing of related Pfam domain architectures. Bioinformatics 2010; 26:1260-1. [DOI: 10.1093/bioinformatics/btq119] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022] Open
|
97
|
Almonacid DE, Yera ER, Mitchell JBO, Babbitt PC. Quantitative comparison of catalytic mechanisms and overall reactions in convergently evolved enzymes: implications for classification of enzyme function. PLoS Comput Biol 2010; 6:e1000700. [PMID: 20300652 PMCID: PMC2837397 DOI: 10.1371/journal.pcbi.1000700] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 08/20/2009] [Accepted: 02/02/2010] [Indexed: 11/19/2022] Open
Abstract
Functionally analogous enzymes are those that catalyze similar reactions on similar substrates but do not share common ancestry, providing a window on the different structural strategies nature has used to evolve required catalysts. Identification and use of this information to improve reaction classification and computational annotation of enzymes newly discovered in the genome projects would benefit from systematic determination of reaction similarities. Here, we quantified similarity in bond changes for overall reactions and catalytic mechanisms for 95 pairs of functionally analogous enzymes (non-homologous enzymes with identical first three numbers of their EC codes) from the MACiE database. Similarity of overall reactions was computed by comparing the sets of bond changes in the transformations from substrates to products. For similarity of mechanisms, sets of bond changes occurring in each mechanistic step were compared; these similarities were then used to guide global and local alignments of mechanistic steps. Using this metric, only 44% of pairs of functionally analogous enzymes in the dataset had significantly similar overall reactions. For these enzymes, convergence to the same mechanism occurred in 33% of cases, with most pairs having at least one identical mechanistic step. Using our metric, overall reaction similarity serves as an upper bound for mechanistic similarity in functional analogs. For example, the four carbon-oxygen lyases acting on phosphates (EC 4.2.3) show neither significant overall reaction similarity nor significant mechanistic similarity. By contrast, the three carboxylic-ester hydrolases (EC 3.1.1) catalyze overall reactions with identical bond changes and have converged to almost identical mechanisms. 
The large proportion of enzyme pairs that do not show significant overall reaction similarity (56%) suggests that at least for the functionally analogous enzymes studied here, more stringent criteria could be used to refine definitions of EC sub-subclasses for improved discrimination in their classification of enzyme reactions. The results also indicate that mechanistic convergence of reaction steps is widespread, suggesting that quantitative measurement of mechanistic similarity can inform approaches for functional annotation.
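The set-based comparison of bond changes described in this abstract can be sketched as follows. The sketch assumes a Tanimoto (Jaccard) coefficient over sets of bond-change labels; the abstract does not name the similarity measure, so this choice, the label format and the toy reactions are all illustrative assumptions.

```python
from typing import Iterable


def bond_change_similarity(changes_a: Iterable[str], changes_b: Iterable[str]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of bond changes.

    Each bond change is a hypothetical label such as "C-O cleaved".
    Two empty sets are treated as identical (similarity 1.0).
    """
    set_a, set_b = set(changes_a), set(changes_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical overall reactions, each reduced to its set of bond changes.
reaction_a = {"C-O cleaved", "O-H cleaved", "O-H formed"}
reaction_b = {"C-O cleaved", "O-H formed", "C-N formed"}

# Two shared changes out of four distinct changes in total.
print(bond_change_similarity(reaction_a, reaction_b))  # 0.5
```

The same function could be applied to the bond changes of individual mechanistic steps to build a step-by-step similarity matrix, which an alignment procedure would then consume; that alignment stage is not shown here.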
Affiliation(s)
- Daniel E. Almonacid
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California, United States of America
- Emmanuel R. Yera
- Biological and Medical Informatics Graduate Program, University of California San Francisco, San Francisco, California, United States of America
- John B. O. Mitchell
- Centre for Biomolecular Sciences, University of St Andrews, St Andrews, United Kingdom
- Patricia C. Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California, United States of America
|
98
|
Ko J, Ryu KS, Kim H, Shin JS, Lee JO, Cheong C, Choi BS. Structure of PP4397 reveals the molecular basis for different c-di-GMP binding modes by PilZ domain proteins. J Mol Biol 2010; 398:97-110. [PMID: 20226196 DOI: 10.1016/j.jmb.2010.03.007] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.5] [Received: 12/07/2009] [Revised: 03/03/2010] [Accepted: 03/03/2010] [Indexed: 10/19/2022]
Abstract
Cyclic diguanylate (c-di-GMP) is a global regulator that modulates pathogen virulence and biofilm formation in bacteria. Although a bioinformatic study revealed that PilZ domain proteins are the long-sought c-di-GMP binding proteins, the mechanism by which c-di-GMP regulates them is uncertain. Pseudomonas putida PP4397 is one such protein that contains YcgR-N and PilZ domains and the apo-PP4397 structure was solved earlier by the Joint Center for Structural Genomics. We determined the crystal structure of holo-PP4397 and found that two intercalated c-di-GMPs fit into the junction of its YcgR-N and PilZ domains. Moreover, c-di-GMP binding induces PP4397 to undergo a dimer-to-monomer transition. Interestingly, another PilZ domain protein, VCA0042, binds to a single molecule of c-di-GMP, and both its apo and holo forms are dimeric. Mutational studies and the additional crystal structure of holo-VCA0042 (L135R) showed that the Arg122 residue of PP4397 is crucial for the recognition of two molecules of c-di-GMP. Thus, PilZ domain proteins exhibit different c-di-GMP binding stoichiometry and quaternary structure, and these differences are expected to play a role in generating diverse forms of c-di-GMP-mediated regulation.
Affiliation(s)
- Junsang Ko
- Department of Chemistry, KAIST, Gusong-dong 373-1, Yuseong-gu, Daejeon 305-701, South Korea
|
99
|
Peng RH, Xiong AS, Xue Y, Fu XY, Gao F, Zhao W, Tian YS, Yao QH. A profile of ring-hydroxylating oxygenases that degrade aromatic pollutants. Rev Environ Contam Toxicol 2010; 206:65-94. [PMID: 20652669 DOI: 10.1007/978-1-4419-6260-7_4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/29/2023]
Abstract
Numerous aromatic compounds are pollutants to which exposure exists or is possible, and are of concern because they are mutagenic, carcinogenic, or display other toxic characteristics. Depending on the types of dioxygenation reactions of which microorganisms are capable, they utilize ring-hydroxylating oxygenases (RHOs) to initiate the degradation and detoxification of such aromatic compound pollutants. Gene families encoding for RHOs appear to be most common in bacteria. Oxygenases are important in degrading both natural and synthetic aromatic compounds and are particularly important for their role in degrading toxic pollutants; for this reason, it is useful for environmental scientists and others to understand more of their characteristics and capabilities. It is the purpose of this review to address RHOs and to describe much of their known character, starting with a review as to how RHOs are classified. A comprehensive phylogenetic analysis has revealed that all RHOs are, in some measure, related, presumably by divergent evolution from a common ancestor, and this is reflected in how they are classified. After we describe RHO classification schemes, we address the relationship between RHO structure and function. Structural differences affect substrate specificity and product formation. In the alpha subunit of the known terminal oxygenase of RHOs, there is a catalytic domain with a mononuclear iron center that serves as a substrate-binding site and a Rieske domain that retains a [2Fe-2S] cluster that acts as an entity of electron transfer for the mononuclear iron center. Oxygen activation and substrate dihydroxylation occurring at the catalytic domain are dependent on the binding of substrate at the active site and the redox state of the Rieske center. The electron transfer from NADH to the catalytic pocket of RHO and catalyzing mechanism of RHOs is depicted in our review and is based on the results of recent studies. 
Electron transfer involving the RHO system typically involves four steps: NADH-ferredoxin reductase receives two electrons from NADH; ferredoxin binds with NADH-ferredoxin reductase and accepts an electron from it; the reduced ferredoxin dissociates from NADH-ferredoxin reductase and shuttles the electron to the Rieske domain of the terminal oxygenase; the Rieske cluster donates electrons to O2 through the mononuclear iron. On the basis of crystal structure studies, it has been proposed that the broad specificity of the RHOs results from the large size and specific topology of their hydrophobic substrate-binding pocket. Several amino acids that determine the substrate specificity and enantioselectivity of RHOs have been identified through sequence comparison and site-directed mutagenesis at the active site. Exploiting the crystal structure data and the available active site information, engineered RHO enzymes have been and can be designed to improve their capacity to degrade environmental pollutants. Dioxygenases have been modified to improve the degradation capacities toward PCBs, PAHs, dioxins, and some other aromatic hydrocarbons. We hope that the results of this review and future research on enhancing RHOs will promote their expanded usage and effectiveness for successfully degrading environmental aromatic pollutants.
Affiliation(s)
- Ri-He Peng
- Shanghai Key Laboratory of Agricultural Genetics and Breeding, Agro-Biotechnology Research Institute, Shanghai Academy of Agricultural Sciences, 2901 Beidi Rd, Shanghai, People's Republic of China
|
100
|
Koike R, Kidera A, Ota M. Alteration of oligomeric state and domain architecture is essential for functional transformation between transferase and hydrolase with the same scaffold. Protein Sci 2009; 18:2060-6. [PMID: 19670211 DOI: 10.1002/pro.218] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Abstract
Transferases and hydrolases catalyze different chemical reactions and express different dynamic responses upon ligand binding. To insulate the ligand molecule from the surrounding water, transferases bury it inside the protein by closing the cleft, while hydrolases undergo a small conformational change and leave the ligand molecule exposed to the solvent. Despite these distinct ligand-binding modes, some transferases and hydrolases are homologous. To clarify how such different catalytic modes are possible with the same scaffold, we examined the solvent accessibility of ligand molecules for 15 SCOP superfamilies, each containing both transferase and hydrolase catalytic domains. In contrast to hydrolases, we found that nine superfamilies of transferases use two major strategies, oligomerization and domain fusion, to insulate the ligand molecules. The subunits and domains that were recruited by the transferases often act as a cover for the ligand molecule. The other strategies adopted by transferases to insulate the ligand molecule are the relocation of catalytic sites, the rearrangement of secondary structure elements, and the insertion of peripheral regions. These findings provide insights into how proteins have evolved and acquired distinct functions with a limited number of scaffolds.
|