1
Bordin N, Scholes H, Rauer C, Roca-Martínez J, Sillitoe I, Orengo C. Clustering protein functional families at large scale with hierarchical approaches. Protein Sci 2024; 33:e5140. [PMID: 39145441 PMCID: PMC11325189 DOI: 10.1002/pro.5140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/16/2024]
Abstract
Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database requires more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods developed to classify proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to form relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new tool for future studies in protein function and evolution.
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Harry Scholes
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, UK
  - Universidad Autonoma de Madrid, Ciudad Universitaria de Cantoblanco, Madrid, Spain
- Joel Roca-Martínez
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, UK
2
Huh E, Agosto MA, Wensel TG, Lichtarge O. Coevolutionary signals in metabotropic glutamate receptors capture residue contacts and long-range functional interactions. J Biol Chem 2023; 299:103030. [PMID: 36806686 PMCID: PMC10060750 DOI: 10.1016/j.jbc.2023.103030] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/30/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 02/18/2023]
Abstract
Upon ligand binding to a G protein-coupled receptor, extracellular signals are transmitted into a cell through sets of residue interactions that translate ligand binding into structural rearrangements. These interactions needed for functions impose evolutionary constraints so that, on occasion, mutations in one position may be compensated by other mutations at functionally coupled positions. To quantify the impact of amino acid substitutions in the context of major evolutionary divergence in the G protein-coupled receptor subfamily of metabotropic glutamate receptors (mGluRs), we combined two phylogenetic-based algorithms, Evolutionary Trace and covariation Evolutionary Trace, to infer potential structure-function couplings and roles in mGluRs. We found a subset of evolutionarily important residues at known functional sites and evidence of coupling among distinct structural clusters in mGluR. In addition, experimental mutagenesis and functional assays confirmed that some highly covariant residues are coupled, revealing their synergy. Collectively, these findings inform a critical step toward understanding the molecular and structural basis of amino acid variation patterns within mGluRs and provide insight for drug development, protein engineering, and analysis of naturally occurring variants.
Affiliation(s)
- Eunna Huh
  - Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA
- Melina A Agosto
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Retina and Optic Nerve Research Laboratory, Department of Physiology and Biophysics, Dalhousie University, Halifax, Canada
- Theodore G Wensel
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
- Olivier Lichtarge
  - Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.
3
Gu J, Xu Y, Nie Y. Role of distal sites in enzyme engineering. Biotechnol Adv 2023; 63:108094. [PMID: 36621725 DOI: 10.1016/j.biotechadv.2023.108094] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 08/02/2022] [Revised: 11/15/2022] [Accepted: 01/01/2023] [Indexed: 01/06/2023]
Abstract
The limitations associated with natural enzyme catalysis have triggered the rise of the field of protein engineering. Traditional rational design was based on the analysis of protein structural information and catalytic mechanisms to identify key active sites or ligand binding sites to reshape the substrate pocket. The role and significance of functional sites in the active center have been studied extensively. With a deeper understanding of the structure-catalysis relationship map, the entire protein molecule can be filled with residues that play a substantial role in its structure and function. However, the catalytic mechanism underlying distal mutations remains unclear. The aim of this review was to highlight the criticality of the distal site in enzyme engineering based on the following three aspects: What can distal mutations exert on function from mutability landscape? How do distal sites influence enzyme function? How to predict and design distal mutations? This review provides insights into the catalytic mechanism of enzymes from the global interaction network, knowledge from sequence-structure-dynamics-function relationships, and strategies for distal mutation-based protein engineering.
Affiliation(s)
- Jie Gu
  - Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yan Xu
  - Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi 214122, China
- Yao Nie
  - Lab of Brewing Microbiology and Applied Enzymology, School of Biotechnology and Key laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China; Suqian Industrial Technology Research Institute of Jiangnan University, Suqian 223814, China.
4
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
Affiliation(s)
- Emily N. Kennedy
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
  - Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
  - Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
5
Rauer C, Sen N, Waman VP, Abbasian M, Orengo CA. Computational approaches to predict protein functional families and functional sites. Curr Opin Struct Biol 2021; 70:108-122. [PMID: 34225010 DOI: 10.1016/j.sbi.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/29/2021] [Revised: 05/13/2021] [Accepted: 05/25/2021] [Indexed: 01/06/2023]
Abstract
Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.
Affiliation(s)
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Neeladri Sen
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Mahnaz Abbasian
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.
6
Lemberg MK, Strisovsky K. Maintenance of organellar protein homeostasis by ER-associated degradation and related mechanisms. Mol Cell 2021; 81:2507-2519. [PMID: 34107306 DOI: 10.1016/j.molcel.2021.05.004] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Received: 02/16/2021] [Revised: 04/14/2021] [Accepted: 05/05/2021] [Indexed: 12/19/2022]
Abstract
Protein homeostasis mechanisms are fundamentally important to match cellular needs and to counteract stress conditions. A fundamental challenge is to understand how defective proteins are recognized and extracted from cellular organelles to be degraded in the cytoplasm. The endoplasmic reticulum (ER)-associated degradation (ERAD) pathway is the best-understood organellar protein quality control system. Here, we review new insights into the mechanism of recognition and retrotranslocation of client proteins in ERAD. In addition to the membrane-integral ERAD E3 ubiquitin ligases, we highlight one protein family that is remarkably often involved in various aspects of membrane protein quality control and protein dislocation: the rhomboid superfamily, which includes derlins and intramembrane serine proteases. Rhomboid-like proteins have been found to control protein homeostasis in the ER, but also in other eukaryotic organelles and in bacteria, pointing toward conserved principles of membrane protein quality control across organelles and evolution.
Affiliation(s)
- Marius K Lemberg
  - Center for Molecular Biology of Heidelberg University (ZMBH), Im Neuenheimer Feld 282, 69120 Heidelberg, Germany; Center for Biochemistry, Medical Faculty, University of Cologne, Joseph-Stelzmann-Strasse 52, 50931 Cologne, Germany.
- Kvido Strisovsky
  - Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Prague, Czechia.
7
Phosphatidylglyerol Lipid Binding at the Active Site of an Intramembrane Protease. J Membr Biol 2020; 253:563-576. [PMID: 33210155 PMCID: PMC7688093 DOI: 10.1007/s00232-020-00152-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/14/2020] [Accepted: 11/04/2020] [Indexed: 10/25/2022]
Abstract
Transmembrane substrate cleavage by the small Escherichia coli rhomboid protease GlpG informs on mechanisms by which lipid interactions shape reaction coordinates of membrane-embedded enzymes. Here, I review and discuss new work on the molecular picture of protein-lipid interactions that might govern the formation of the substrate-enzyme complex in fluid lipid membranes. Negatively charged PG-type lipids are of particular interest, because they are a major component of bacterial membranes. Atomistic computer simulations indicate POPG and DOPG lipids bridge remote parts of GlpG and might pre-occupy the substrate-docking site. Inhibition of catalytic activity by PG lipids could arise from ligand-like lipid binding at the active site, which could delay or prevent substrate docking. Dynamic protein-lipid H-bond networks, water access to the active site, and fluctuations in the orientation of GlpG suggest that GlpG has lipid-coupled dynamics that could shape the energy landscape of transmembrane substrate docking.