1
Bordin N, Scholes H, Rauer C, Roca-Martínez J, Sillitoe I, Orengo C. Clustering protein functional families at large scale with hierarchical approaches. Protein Sci 2024; 33:e5140. [PMID: 39145441 PMCID: PMC11325189 DOI: 10.1002/pro.5140] [Received: 03/27/2024] [Revised: 07/22/2024] [Accepted: 07/24/2024] [Indexed: 08/16/2024]
Abstract
Proteins, fundamental to cellular activities, reveal their function and evolution through their structure and sequence. CATH functional families (FunFams) are coherent clusters of protein domain sequences in which the function is conserved across their members. The increasing volume and complexity of protein data enabled by large-scale repositories like MGnify or AlphaFold Database requires more powerful approaches that can scale to the size of these new resources. In this work, we introduce MARC and FRAN, two algorithms developed to build upon and address limitations of GeMMA/FunFHMMER, our original methods developed to classify proteins with related functions using a hierarchical approach. We also present CATH-eMMA, which uses embeddings or Foldseek distances to form relationship trees from distance matrices, reducing computational demands and handling various data types effectively. CATH-eMMA offers a highly robust and much faster tool for clustering protein functions on a large scale, providing a new tool for future studies in protein function and evolution.
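The abstract does not say how CATH-eMMA turns a distance matrix into a relationship tree, so the following is only a generic illustration of that idea: a minimal average-linkage agglomeration in plain Python. The function name, the nested-tuple tree format, and the choice of average linkage are assumptions made for this sketch, not details taken from the paper.

```python
from itertools import combinations


def average_linkage_tree(dist, labels):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters.

    dist is a symmetric matrix (list of lists) of pairwise distances and
    labels names each row. The distance between two clusters is the mean of
    all member-to-member distances (average linkage).
    """
    # Each cluster is a pair: (tree built so far, indices of its members).
    clusters = [(label, [i]) for i, label in enumerate(labels)]

    def cluster_distance(a, b):
        pairs = [(i, j) for i in a[1] for j in b[1]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them into one node.
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]
```

With a toy matrix in which A is close to B and C is close to D, the function returns the tree (("A", "B"), ("C", "D")). This exhaustive pairwise search is only practical for small inputs; large-scale use would need a more efficient method.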
Affiliation(s)
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Harry Scholes
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, UK
  - Universidad Autonoma de Madrid, Ciudad Universitaria de Cantoblanco, Madrid, Spain
- Joel Roca-Martínez
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London, UK
2
Hlouchová K. Peptides En Route from Prebiotic to Biotic Catalysis. Acc Chem Res 2024; 57:2027-2037. [PMID: 39016062 PMCID: PMC11308367 DOI: 10.1021/acs.accounts.4c00137] [Received: 02/29/2024] [Revised: 05/24/2024] [Accepted: 07/03/2024] [Indexed: 07/18/2024]
Abstract
Conspectus: In the quest to understand prebiotic catalysis, different molecular entities, mainly minerals, metal ions, organic cofactors, and ribozymes, have been implied as key players. Of these, inorganic and organic cofactors have gained attention for their ability to catalyze a wide array of reactions central to modern metabolism and frequently participate in these reactions within modern enzymes. Nevertheless, bridging the gap between prebiotic and modern metabolism remains a fundamental question in the origins of life.
In this Account, peptides are investigated as a potential bridge linking prebiotic catalysis by minerals/cofactors to enzymes that dominate modern life's chemical reactions. Before ribosomal synthesis emerged, peptides of random sequences were plausible on early Earth. This was made possible by different sources of amino acid delivery and synthesis, as well as their condensation under a variety of conditions. Early peptides and proteins probably exhibited distinct compositions, enriched in small aliphatic and acidic residues. An increase in abundance of amino acids with larger side chains and canonical basic groups was most likely dependent on the emergence of their more challenging (bio)synthesis. Pressing questions thus arise: how did this composition influence the early peptide properties, and to what extent could they contribute to early metabolism?
Recent research from our group and colleagues shows that highly acidic peptides/proteins comprising only the presumably "early" amino acids are in fact competent at secondary structure formation and even possess adaptive folding characteristics such as spontaneous refoldability and chaperone independence to achieve soluble structures. Moreover, we showed that highly acidic proteins of presumably "early" composition can still bind RNA by utilizing metal ions as cofactors to bridge carboxylate and phosphoester functional groups. And finally, ancient organic cofactors were shown to be capable of binding to sequences from amino acids considered prebiotically plausible, supporting their folding properties and providing functional groups, which would nominate them as catalytic hubs of great prebiotic relevance.
These findings underscore the biochemical plausibility of an early peptide/protein world devoid of more complex amino acids yet collaborating with other catalytic species. Drawing from the mechanistic properties of protein-cofactor catalysis, it is speculated here that the early peptide/protein-cofactor ensemble could facilitate a similar range of chemical reactions, albeit with lower catalytic rates. This hypothesis invites a systematic experimental test.
Nonetheless, this Account does not exclude other scenarios of prebiotic-to-biotic catalysis or prioritize any specific pathways of prebiotic syntheses. The objective is to examine peptide availability, composition, and functional potential among the various factors involved in the emergence of early life.
Affiliation(s)
- Klára Hlouchová
  - Department of Cell Biology, Faculty of Science, Charles University, Prague 12800, Czech Republic
  - Institute of Organic Chemistry and Biochemistry, Czech Academy of Sciences, Prague 16610, Czech Republic
3
Murata H, Toko K, Chikenji G. Protein superfolds are characterised as frustration-free topologies: A case study of pure parallel β-sheet topologies. PLoS Comput Biol 2024; 20:e1012282. [PMID: 39110764 PMCID: PMC11333010 DOI: 10.1371/journal.pcbi.1012282] [Received: 12/28/2023] [Revised: 08/19/2024] [Accepted: 06/26/2024] [Indexed: 08/21/2024]
Abstract
A protein superfold is a type of protein fold that is observed in at least three distinct, non-homologous protein families. Structural classification studies have revealed a limited number of prevalent superfolds alongside several infrequently occurring folds, and in α/β type superfolds, the C-terminal β-strand tends to favor the edge of the β-sheet, while the N-terminal β-strand is often found in the middle. The reasons behind these observations, whether they are due to evolutionary sampling bias or physical interactions, remain unclear. This article offers a physics-based explanation for these observations, specifically for pure parallel β-sheet topologies. Our investigation is grounded in several established structural rules that are based on physical interactions. We have identified "frustration-free topologies," which are topologies that can satisfy all the rules simultaneously. In contrast, topologies that cannot are termed "frustrated topologies." Our findings reveal that frustration-free topologies represent only a fraction of all theoretically possible patterns, that these topologies strongly favor positioning the C-terminal β-strand at the edge of the β-sheet and the N-terminal β-strand in the middle, and that there is significant overlap between frustration-free topologies and superfolds. We also used a lattice protein model to thoroughly investigate sequence-structure relationships. Our results show that frustration-free structures are highly designable, while frustrated structures are poorly designable. These findings suggest that superfolds are highly designable due to their lack of frustration, and the preference for positioning C-terminal β-strands at the edge of the β-sheet is a direct result of frustration-free topologies. These insights not only enhance our understanding of sequence-structure relationships but also have significant implications for de novo protein design.
Affiliation(s)
- Hiroto Murata
  - Department of Applied Physics, Nagoya University, Nagoya, Aichi, Japan
- Kazuma Toko
  - Department of Applied Physics, Nagoya University, Nagoya, Aichi, Japan
- George Chikenji
  - Department of Applied Physics, Nagoya University, Nagoya, Aichi, Japan
4
Škrbić T, Giacometti A, Hoang TX, Maritan A, Banavar JR. Amino-Acid Characteristics in Protein Native State Structures. Biomolecules 2024; 14:805. [PMID: 39062519 PMCID: PMC11274641 DOI: 10.3390/biom14070805] [Received: 05/30/2024] [Revised: 07/02/2024] [Accepted: 07/05/2024] [Indexed: 07/28/2024]
Abstract
The molecular machines of life, proteins, are made up of twenty kinds of amino acids, each with distinctive side chains. We present a geometrical analysis of the protrusion statistics of side chains in more than 4000 high-resolution protein structures. We employ a coarse-grained representation of the protein backbone viewed as a linear chain of Cα atoms and consider just the heavy atoms of the side chains. We study the large variety of behaviors of the amino acids based on both rudimentary structural chemistry as well as geometry. Our geometrical analysis uses a backbone Frenet coordinate system for the common study of all amino acids. Our analysis underscores the richness of the repertoire of amino acids that is available to nature to design protein sequences that fit within the putative native state folds.
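The abstract mentions a backbone Frenet coordinate system built on the Cα chain but does not spell out the construction. As a hedged illustration, the sketch below uses one common discrete construction: the tangent runs from the previous to the next Cα, the normal points toward the inside of the local bend, and the binormal is their cross product. The function names, the orthogonalisation step, and the example coordinates are assumptions for this sketch, not the authors' procedure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        raise ValueError("degenerate geometry: zero-length vector")
    return tuple(x / norm for x in v)


def frenet_frame(prev_ca, ca, next_ca):
    """Return (tangent, normal, binormal) at the middle of three C-alpha points."""
    tangent = _unit(_sub(next_ca, prev_ca))
    # Sum of the two bond vectors leaving `ca`; it points into the bend.
    bend = _add(_sub(prev_ca, ca), _sub(next_ca, ca))
    # Remove any tangent component so the frame stays orthonormal even
    # when the two bond lengths differ slightly.
    along = _dot(bend, tangent)
    normal = _unit(tuple(b - along * t for b, t in zip(bend, tangent)))
    binormal = _cross(tangent, normal)
    return tangent, normal, binormal


def to_local(point, origin, frame):
    """Express `point` in the local frame centred on `origin`."""
    d = _sub(point, origin)
    return tuple(_dot(d, axis) for axis in frame)
```

Calling `to_local(atom, ca, frenet_frame(prev_ca, ca, next_ca))` for a side-chain heavy atom gives its coordinates along the tangent, normal, and binormal, which is one way to compare protrusion directions across different amino acids in a common frame. Three collinear points have no defined normal, so the function raises `ValueError` in that case.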
Affiliation(s)
- Tatjana Škrbić
  - Department of Molecular Sciences and Nanosystems, Ca’ Foscari University of Venice, Campus Scientifico, Via Torino 155, 30170 Venice Mestre, Italy
  - Department of Physics and Institute for Fundamental Science, University of Oregon, Eugene, OR 97403, USA
- Achille Giacometti
  - Department of Molecular Sciences and Nanosystems, Ca’ Foscari University of Venice, Campus Scientifico, Via Torino 155, 30170 Venice Mestre, Italy
  - European Centre for Living Technology (ECLT), Ca’ Bottacin, Dorsoduro 3911, Calle Crosera, 30123 Venice, Italy
- Trinh X. Hoang
  - Institute of Physics, Vietnam Academy of Science and Technology, 10 DaoTan, Ba Dinh, Hanoi 11108, Vietnam
- Amos Maritan
  - Department of Physics and Astronomy, University of Padua, Via Marzolo 8, 35131 Padua, Italy
- Jayanth R. Banavar
  - Department of Physics and Institute for Fundamental Science, University of Oregon, Eugene, OR 97403, USA
5
de Crécy-Lagard V, Dias R, Friedberg I, Yuan Y, Swairjo MA. Limitations of Current Machine-Learning Models in Predicting Enzymatic Functions for Uncharacterized Proteins. bioRxiv 2024:2024.07.01.601547. [PMID: 39005379 PMCID: PMC11244979 DOI: 10.1101/2024.07.01.601547] [Indexed: 07/16/2024]
Abstract
Thirty to seventy percent of proteins in any given genome have no assigned function and have been labeled as the protein "unknome". This large knowledge gap prevents the biological community from fully leveraging the plethora of genomic data that is now available. Machine-learning approaches are showing some promise in propagating functional knowledge from experimentally characterized proteins to the correct set of isofunctional orthologs. However, they largely fail to predict enzymatic functions unseen in the training set, as shown by dissecting the predictions made for 450 enzymes of unknown function from the model bacterium Escherichia coli using the DeepECTransformer platform. Lessons from these failures can help the community develop machine-learning methods that assist domain experts in making testable functional predictions for more members of the uncharacterized proteome.
6
Park A, Lee C, Lee JY. Genomic Evolution and Recombination Dynamics of Human Adenovirus D Species: Insights from Comprehensive Bioinformatic Analysis. J Microbiol 2024; 62:393-407. [PMID: 38451451 DOI: 10.1007/s12275-024-00112-5] [Received: 12/10/2023] [Revised: 01/10/2024] [Accepted: 01/14/2024] [Indexed: 03/08/2024]
Abstract
Human adenoviruses (HAdVs) can infect various epithelial mucosal cells, ultimately causing different symptoms in infected organ systems. With more than 110 types classified into seven species (A-G), HAdV-D species possess the highest number of viruses and are the fastest proliferating. The emergence of new adenovirus types and increased diversity are driven by homologous recombination (HR) between viral genes, primarily in structural elements such as the penton base, hexon and fiber proteins, and the E1 and E3 regions. A comprehensive analysis of the HAdV genome provides valuable insights into the evolution of human adenoviruses and identifies genes that display high variation across the entire genome to determine recombination patterns. Hypervariable regions within genetic sequences correlate with functional characteristics, thus allowing for adaptation to new environments and hosts. Proteotyping of newly emerging and already established adenoviruses allows for prediction of the characteristics of novel viruses. HAdV-D species evolved in a direction that increased diversity through gene recombination. Bioinformatics analysis across the genome, particularly in highly variable regions, allows for the verification or re-evaluation of recombination patterns in both newly introduced and pre-existing viruses, ultimately aiding in tracing various biological traits such as virus tropism and pathogenesis. Our research does not only assist in predicting the emergence of new adenoviruses but also offers critical guidance in regard to identifying potential regulatory factors of homologous recombination hotspots.
Affiliation(s)
- Anyeseu Park
  - The Laboratory of Viromics and Evolution, Korea Zoonosis Research Institute, Jeonbuk National University, Iksan, 54531, Republic of Korea
- Chanhee Lee
  - The Laboratory of Viromics and Evolution, Korea Zoonosis Research Institute, Jeonbuk National University, Iksan, 54531, Republic of Korea
- Jeong Yoon Lee
  - The Laboratory of Viromics and Evolution, Korea Zoonosis Research Institute, Jeonbuk National University, Iksan, 54531, Republic of Korea
7
Schaeffer RD, Zhang J, Medvedev KE, Kinch LN, Cong Q, Grishin NV. ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2. PLoS Comput Biol 2024; 20:e1011586. [PMID: 38416793 PMCID: PMC10927120 DOI: 10.1371/journal.pcbi.1011586] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 02/20/2024] [Indexed: 03/01/2024]
Abstract
Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.
Affiliation(s)
- R. Dustin Schaeffer
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Jing Zhang
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Kirill E. Medvedev
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Lisa N. Kinch
  - Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Qian Cong
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
  - Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
8
Javed A, Habib S, Ayub A. Evolution of protein domain repertoires of CALHM6. PeerJ 2024; 12:e16063. [PMID: 38188152 PMCID: PMC10768655 DOI: 10.7717/peerj.16063] [Received: 05/19/2023] [Accepted: 08/18/2023] [Indexed: 01/09/2024]
Abstract
Calcium (Ca2+) homeostasis is essential in conducting various cellular processes including nerve transmission, muscular movement, and immune response. Changes in Ca2+ concentration in the cytoplasm are significant in bringing about various immune responses such as pathogen clearance and apoptosis. Various key players are involved in calcium homeostasis such as calcium binders, pumps, and channels. Sequence-based evolutionary information has recently been exploited to predict the biophysical behaviors of proteins, giving critical clues about their functionality. Ion channels are reportedly the first channels developed during evolution. Calcium homeostasis modulator protein 6 (CALHM6) is one such channel. Comprised of a single domain called Ca_hom_mod, CALHM6 is a stable protein interacting with various other proteins in calcium regulation. No previous attempt has been made to trace the exact evolutionary events in the domain of CALHM6, leaving plenty of room for exploring its evolution across a wide range of organisms. The current study aims to answer the questions by employing a computational-based strategy that used profile Hidden Markov Models (HMMs) to scan for the CALHM6 domain, integrated the data with a time-calibrated phylogenetic tree using BEAST and Mesquite, and visualized through iTOL. Around 4,000 domains were identified, and 14,000 domain gain, loss, and duplication events were observed at the end which also included various protein domains other than CALHM6. The data were analyzed concerning CALHM6 evolution as well as the domain gain, loss, and duplication of its interacting partners: Calpain, Vinculin, protein S100-A7, Thioredoxin, Peroxiredoxin, and Calmodulin-like protein 5. Duplication events of CALHM6 near higher eukaryotes showed its increasing complexity in structure and function. This in-silico phylogenetic approach applied to trace the evolution of CALHM6 was an effective approach to get a better understanding of the protein CALHM6.
Affiliation(s)
- Aneela Javed
  - Molecular Immunology Laboratory, Department of Healthcare Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Sabahat Habib
  - Molecular Immunology Laboratory, Department of Healthcare Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Aaima Ayub
  - Molecular Immunology Laboratory, Department of Healthcare Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology (NUST), Islamabad, Pakistan
9
Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023; 91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]
Abstract
Protein-coding DNA sequences can be translated into completely different amino acid sequences if the nucleotide triplets used are shifted by a non-triplet amount on the same DNA strand or by translating codons from the opposite strand. Such "alternative reading frames" of protein-coding genes are a major contributor to the evolution of novel protein products. Recent studies demonstrating this include examples across the three domains of cellular life and in viruses. These sequences increase the number of trials potentially available for the evolutionary invention of new genes and also have unusual properties which may facilitate gene origin. There is evidence that the structure of the standard genetic code contributes to the features and gene-likeness of some alternative frame sequences. These findings have important implications across diverse areas of molecular biology, including for genome annotation, structural biology, and evolutionary genomics.
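To make the idea of alternative reading frames concrete, the sketch below translates one DNA string in all six frames: three offsets on the given strand and three on its reverse complement. It uses the standard genetic code in the usual TCAG ordering. The function names and the example sequence are illustrative choices, not taken from the article.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at `offset`; unknown codons give 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame names (+1..+3, -1..-3) to their translations."""
    opposite = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(opposite, offset)
    return frames
```

For the toy sequence `ATGGCCAAATGA`, frame +1 reads `MAK*`, frame +2 reads `WPN`, and frame -1 reads `SFGH`, showing how the same nucleotides encode unrelated amino acid sequences once the frame shifts or the strand changes.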
10
Morris MA, Mills CE, Paloni JM, Miller EA, Sikes HD, Olsen BD. High-Throughput Screening of Streptavidin-Binding Proteins in Self-Assembled Solid Films for Directed Evolution of Materials. Nano Lett 2023; 23:7303-7310. [PMID: 37566825 DOI: 10.1021/acs.nanolett.3c01229] [Indexed: 08/13/2023]
Abstract
Evolution has shaped the development of proteins with an incredible diversity of properties. Incorporating proteins into materials is desirable for applications including biosensing; however, high-throughput selection techniques for screening protein libraries in materials contexts are lacking. In this work, a high-throughput platform to assess the binding affinity for ordered sensing proteins was established. A library of fusion proteins, consisting of an elastin-like polypeptide block, one of 22 variants of rcSso7d, and a coiled-coil order-directing sequence, was generated. All selected variants had high binding in films, likely due to the similarity of the assay to magnetic bead sorting used for initial selection, while solution binding was more variable. From these results, both the assembly of the fusion proteins in their operating state and the functionality of the binding protein are key factors in the biosensing performance. Thus, the integration of directed evolution with assembled systems is necessary to the design of better materials.
Affiliation(s)
- Melody A Morris
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Carolyn E Mills
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Justin M Paloni
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Eric A Miller
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Hadley D Sikes
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Bradley D Olsen
  - Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
11
Banavar JR, Giacometti A, Hoang TX, Maritan A, Škrbić T. A geometrical framework for thinking about proteins. Proteins 2023. [PMID: 37565735 DOI: 10.1002/prot.26567] [Received: 06/24/2023] [Revised: 07/16/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023]
Abstract
We present a model, based on symmetry and geometry, for proteins. Using elementary ideas from mathematics and physics, we derive the geometries of discrete helices and sheets. We postulate a compatible solvent-mediated emergent pairwise attraction that assembles these building blocks, while respecting their individual symmetries. Instead of seeking to mimic the complexity of proteins, we look for a simple abstraction of reality that yet captures the essence of proteins. We employ analytic calculations and detailed Monte Carlo simulations to explore some consequences of our theory. The predictions of our approach are in accord with experimental data. Our framework provides a rationalization for understanding the common characteristics of proteins. Our results show that the free energy landscape of a globular protein is pre-sculpted at the backbone level, sequences and functionalities evolve in the fixed backdrop of the folds determined by geometry and symmetry, and that protein structures are unique in being simultaneously characterized by stability, diversity, and sensitivity.
Affiliation(s)
- Jayanth R Banavar
  - Department of Physics and Institute for Fundamental Science, University of Oregon, Eugene, Oregon, USA
- Achille Giacometti
  - Ca' Foscari University of Venice, Department of Molecular Sciences and Nanosystems, Venice, Italy
  - European Centre for Living Technology (ECLT), Venice, Italy
- Trinh X Hoang
  - Vietnam Academy of Science and Technology, Institute of Physics, Hanoi, Vietnam
- Amos Maritan
  - University of Padua, Department of Physics and Astronomy, Padua, Italy
- Tatjana Škrbić
  - Department of Physics and Institute for Fundamental Science, University of Oregon, Eugene, Oregon, USA
  - Ca' Foscari University of Venice, Department of Molecular Sciences and Nanosystems, Venice, Italy
12
Sologova SS, Zavadskiy SP, Mokhosoev IM, Moldogazieva NT. Short Linear Motifs Orchestrate Functioning of Human Proteins during Embryonic Development, Redox Regulation, and Cancer. Metabolites 2022; 12:464. [PMID: 35629968 PMCID: PMC9144484 DOI: 10.3390/metabo12050464] [Received: 04/22/2022] [Revised: 05/18/2022] [Accepted: 05/19/2022] [Indexed: 11/16/2022]
Abstract
Short linear motifs (SLiMs) are evolutionarily conserved functional modules of proteins that represent amino acid stretches composed of 3 to 10 residues. The biological activities of two short peptide segments of human alpha-fetoprotein (AFP), a major embryo-specific and cancer-related protein, have been confirmed experimentally. These are a heptapeptide segment, LDSYQCT, in domain I, designated as AFP14–20, and a nonapeptide segment, EMTPVNPGV, in domain III, designated as GIP-9. In our work, we searched the UniProtKB database for human proteins that contain SLiMs with sequence similarity to both segments of human AFP and undertook gene ontology (GO)-based functional categorization of retrieved proteins. Gene set enrichment analysis included GO terms for biological process, molecular function, metabolic pathway, KEGG pathway, and protein–protein interaction (PPI) categories. We identified the SLiMs of interest in a variety of non-homologous proteins involved in multiple cellular processes underlying embryonic development, cancer progression, and, unexpectedly, the regulation of redox homeostasis. These included transcription factors, cell adhesion proteins, ubiquitin-activating and conjugating enzymes, cell signaling proteins, and oxidoreductase enzymes. They function by regulating cell proliferation and differentiation, cell cycle, DNA replication/repair/recombination, metabolism, immune/inflammatory response, and apoptosis. In addition to the retrieved genes, new interacting genes were identified. Our data support the hypothesis that conserved SLiMs are incorporated into non-homologous proteins to serve as functional blocks for their orchestrated functioning.
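The abstract does not describe how sequence similarity to the two AFP segments was scored. As a simple stand-in, the sketch below slides a window of the motif's length along a protein sequence and reports windows whose fraction of identical positions meets a threshold. The function name, the identity-based score, and the example sequences are assumptions for illustration; only the motif LDSYQCT comes from the abstract.

```python
def motif_hits(sequence, motif, min_identity=1.0):
    """Return (start, window, identity) for windows similar to `motif`.

    `start` is a 0-based position in `sequence`; identity is the fraction
    of positions where the window and the motif carry the same residue.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        matches = sum(a == b for a, b in zip(window, motif))
        identity = matches / width
        if identity >= min_identity:
            hits.append((start, window, identity))
    return hits
```

With the made-up sequence `MKLDSYKCTAA` and a threshold of 0.8, the only hit is the window `LDSYKCT` at position 2, which matches LDSYQCT at six of seven positions. A real search would more likely use a substitution matrix or a dedicated motif-scanning tool than raw identity.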
Affiliation(s)
- Susanna S. Sologova
  - Nelyubin Institute of Pharmacy, Sechenov First Moscow State Medical University (Sechenov University), 119991 Moscow, Russia
- Sergey P. Zavadskiy
  - Nelyubin Institute of Pharmacy, Sechenov First Moscow State Medical University (Sechenov University), 119991 Moscow, Russia
- Innokenty M. Mokhosoev
  - Department of Biochemistry and Molecular Biology, Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Nurbubu T. Moldogazieva
  - Nelyubin Institute of Pharmacy, Sechenov First Moscow State Medical University (Sechenov University), 119991 Moscow, Russia
13
Tsybovsky Y, Sereda V, Golczak M, Krupenko NI, Krupenko SA. Structure of putative tumor suppressor ALDH1L1. Commun Biol 2022; 5:3. [PMID: 35013550 PMCID: PMC8748788 DOI: 10.1038/s42003-021-02963-9] [Received: 08/06/2021] [Accepted: 12/10/2021] [Indexed: 11/08/2022]
Abstract
Putative tumor suppressor ALDH1L1, the product of natural fusion of three unrelated genes, regulates folate metabolism by catalyzing NADP+-dependent conversion of 10-formyltetrahydrofolate to tetrahydrofolate and CO2. Cryo-EM structures of tetrameric rat ALDH1L1 revealed the architecture and functional domain interactions of this complex enzyme. Highly mobile N-terminal domains, which remove formyl from 10-formyltetrahydrofolate, undergo multiple transient inter-domain interactions. The C-terminal aldehyde dehydrogenase domains, which convert formyl to CO2, form unusually large interfaces with the intermediate domains, homologs of acyl/peptidyl carrier proteins (A/PCPs), which transfer the formyl group between the catalytic domains. The 4'-phosphopantetheine arm of the intermediate domain is fully extended and reaches deep into the catalytic pocket of the C-terminal domain. Remarkably, the tetrameric state of ALDH1L1 is indispensable for catalysis because the intermediate domain transfers formyl between the catalytic domains of different protomers. These findings emphasize the versatility of A/PCPs in complex, highly dynamic enzymatic systems.
Affiliation(s)
- Yaroslav Tsybovsky
- Cancer Research Technology Program, Leidos Biomedical Research Inc., Frederick National Laboratory for Cancer Research, 8560 Progress Drive, Frederick, MD, 21701, USA
| | - Valentin Sereda
- Nutrition Research Institute, University of North Carolina at Chapel Hill, 500 Laureate Way, Kannapolis, NC, 28081, USA
| | - Marcin Golczak
- Department of Pharmacology, School of Medicine, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH, 44106, USA
| | - Natalia I Krupenko
- Nutrition Research Institute, University of North Carolina at Chapel Hill, 500 Laureate Way, Kannapolis, NC, 28081, USA
- Department of Nutrition, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC, 27599, USA
| | - Sergey A Krupenko
- Nutrition Research Institute, University of North Carolina at Chapel Hill, 500 Laureate Way, Kannapolis, NC, 28081, USA
- Department of Nutrition, University of North Carolina at Chapel Hill, 135 Dauer Drive, Chapel Hill, NC, 27599, USA
| |
|
14
|
Waman VP, Orengo C, Kleywegt GJ, Lesk AM. Three-dimensional Structure Databases of Biological Macromolecules. Methods Mol Biol 2022; 2449:43-91. [PMID: 35507259 DOI: 10.1007/978-1-0716-2095-3_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Databases of three-dimensional structures of proteins (and their associated molecules) provide: (a) Curated repositories of coordinates of experimentally determined structures, including extensive metadata; for instance, information about provenance, details about data collection and interpretation, and validation of results. (b) Information-retrieval tools to allow searching to identify entries of interest and provide access to them. (c) Links among databases, especially to databases of amino-acid and genetic sequences, and of protein function; and links to software for analysis of amino-acid sequence and protein structure, and for structure prediction. (d) Collections of predicted three-dimensional structures of proteins. These will become more and more important after the breakthrough in structure prediction achieved by AlphaFold2. The single global archive of experimentally determined biomacromolecular structures is the Protein Data Bank (PDB). It is managed by wwPDB, a consortium of five partner institutions: the Protein Data Bank in Europe (PDBe), the Research Collaboratory for Structural Bioinformatics (RCSB), the Protein Data Bank Japan (PDBj), the BioMagResBank (BMRB), and the Electron Microscopy Data Bank (EMDB). In addition to jointly managing the PDB repository, the individual wwPDB partners offer many tools for analysis of protein and nucleic acid structures and their complexes, including providing computer-graphic representations. Their collective and individual websites serve as hubs of the community of structural biologists, offering newsletters, reports from Task Forces, training courses, and "helpdesks," as well as links to external software. Many specialized projects are based on the information contained in the PDB. Especially important are SCOP, CATH, and ECOD, which present classifications of protein domains.
Affiliation(s)
- Vaishali P Waman
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Christine Orengo
- Institute of Structural and Molecular Biology, University College London, London, UK
| | - Gerard J Kleywegt
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Arthur M Lesk
- Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA, USA
| |
|