1
Listov D, Goverde CA, Correia BE, Fleishman SJ. Opportunities and challenges in design and optimization of protein function. Nat Rev Mol Cell Biol 2024; 25:639-653. [PMID: 38565617 PMCID: PMC7616297 DOI: 10.1038/s41580-024-00718-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/27/2024] [Indexed: 04/04/2024]
Abstract
The field of protein design has made remarkable progress over the past decade. Historically, the low reliability of purely structure-based design methods limited their application, but recent strategies that combine structure-based and sequence-based calculations, as well as machine learning tools, have dramatically improved protein engineering and design. In this Review, we discuss how these methods have enabled the design of increasingly complex structures and therapeutically relevant activities. Additionally, protein optimization methods have improved the stability and activity of complex eukaryotic proteins. Thanks to their increased reliability, computational design methods have been applied to improve therapeutics and enzymes for green chemistry and have generated vaccine antigens, antivirals and drug-delivery nano-vehicles. Moreover, the high success of design methods reflects an increased understanding of basic rules that govern the relationships among protein sequence, structure and function. However, de novo design is still limited mostly to α-helix bundles, restricting its potential to generate sophisticated enzymes and diverse protein and small-molecule binders. Designing complex protein structures is a challenging but necessary next step if we are to realize our objective of generating new-to-nature activities.
Affiliation(s)
- Dina Listov
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Casper A Goverde
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Bruno E Correia
- Institute of Bioengineering, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
- Sarel Jacob Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel.
2
Xu Y, Hu X, Wang C, Liu Y, Chen Q, Liu H. De novo design of cavity-containing proteins with a backbone-centered neural network energy function. Structure 2024; 32:424-432.e4. [PMID: 38325370 DOI: 10.1016/j.str.2024.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 10/04/2023] [Accepted: 01/11/2024] [Indexed: 02/09/2024]
Abstract
The design of small-molecule-binding proteins requires protein backbones that contain cavities. Previous design efforts were based on naturally occurring cavity-containing backbone architectures. Here, we designed diverse cavity-containing backbones without predefined architectures by introducing tailored restraints into the backbone sampling driven by SCUBA (Side Chain-Unknown Backbone Arrangement), a neural network statistical energy function. For 521 out of 5816 designs, the root-mean-square deviations (RMSDs) of the Cα atoms for the AlphaFold2-predicted structures and our designed structures are within 2.0 Å. We experimentally tested 10 designed proteins and determined the crystal structures of two of them. One closely agrees with the designed model, while the other forms a domain-swapped dimer, where the partial structures are in agreement with the designed structures. Our results indicate that data-driven methods such as SCUBA hold great potential for designing de novo proteins with tailored small-molecule-binding function.
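The 2.0 Å Cα RMSD criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes the two coordinate sets are already superposed and listed in the same residue order; the function names `ca_rmsd` and `passes_cutoff` are hypothetical and are not taken from the paper or from SCUBA.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def passes_cutoff(designed, predicted, cutoff=2.0):
    """True when the predicted structure is within `cutoff` angstroms of the design."""
    return ca_rmsd(designed, predicted) <= cutoff

# Pass rate implied by the numbers quoted in the abstract (521 of 5816 designs).
pass_rate = 521 / 5816
print(f"{pass_rate:.1%} of designs fall within the cutoff")
```

In practice an optimal superposition step (for example the Kabsch algorithm) would normally precede the RMSD calculation; it is omitted here to keep the sketch dependency-free.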
Affiliation(s)
- Yang Xu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Centre for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, China; MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Xiuhong Hu
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Centre for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, China; MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Chenchen Wang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Yongrui Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Quan Chen
- Department of Rheumatology and Immunology, The First Affiliated Hospital of USTC, Centre for Advanced Interdisciplinary Science and Biomedicine of IHM, Hefei National Center for Interdisciplinary Sciences at the Microscale, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230001, China; MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China; Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China.
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, Hefei National Laboratory for Physical Sciences at the Microscale, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China; Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China; School of Data Science, University of Science and Technology of China, Hefei, Anhui 230027, China.
3
Niitsu A, Sugita Y. Towards de novo design of transmembrane α-helical assemblies using structural modelling and molecular dynamics simulation. Phys Chem Chem Phys 2023; 25:3595-3606. [PMID: 36647771 DOI: 10.1039/d2cp03972a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
Computational de novo protein design involves iterative processes consisting of amino acid sequence design, structural modelling and scoring, and design validation by synthesis and experimental characterisation. Recent advances in protein structure prediction and modelling methods have enabled the highly efficient and accurate design of water-soluble proteins. However, the design of membrane proteins remains a major challenge. To advance membrane protein design, considering the higher complexity of membrane protein folding, stability, and dynamic interactions between water, ions, lipids, and proteins is an important task. For introducing explicit solvents and membranes to these design methods, all-atom molecular dynamics (MD) simulations of designed proteins provide useful information that cannot be obtained experimentally. In this review, we first describe two major approaches to designing transmembrane α-helical assemblies, consensus and de novo design. We further illustrate recent MD studies of membrane protein folding related to protein design, as well as advanced treatments in molecular models and conformational sampling techniques in the simulations. Finally, we discuss the possibility to introduce MD simulations after the existing static modelling and screening of design decoys as an additional step for refinement of the design, which considers membrane protein folding dynamics and interactions with explicit membranes.
Affiliation(s)
- Ai Niitsu
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
- Yuji Sugita
- Theoretical Molecular Science Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan; Computational Biophysics Research Team, RIKEN Center for Computational Science, 7-1-26 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan; Laboratory for Biomolecular Function Simulation, RIKEN Center for Biosystems Dynamics Research, 6-7-1 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
4
Abstract
De novo protein design enables the exploration of novel sequences and structures absent from the natural protein universe. De novo design also stands as a stringent test for our understanding of the underlying physical principles of protein folding and may lead to the development of proteins with unmatched functional characteristics. The first fundamental challenge of de novo design is to devise "designable" structural templates leading to sequences that will adopt the predicted fold. Here, we built on the TopoBuilder (TB) de novo design method, to automatically assemble structural templates with native-like features starting from string descriptors that capture the overall topology of proteins. Our framework eliminates the dependency of hand-crafted and fold-specific rules through an iterative, data-driven approach that extracts geometrical parameters from structural tertiary motifs. We evaluated the TopoBuilder framework by designing sequences for a set of five protein folds and experimental characterization revealed that several sequences were folded and stable in solution. The TopoBuilder de novo design framework will be broadly useful to guide the generation of artificial proteins with customized geometries, enabling the exploration of the protein universe.
5
Leipart V, Ludvigsen J, Kent M, Sandve S, To T, Árnyasi M, Kreibich CD, Dahle B, Amdam GV. Identification of 121 variants of honey bee Vitellogenin protein sequences with structural differences at functional sites. Protein Sci 2022; 31:e4369. [PMID: 35762708 PMCID: PMC9207902 DOI: 10.1002/pro.4369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 05/21/2022] [Indexed: 12/04/2022]
Abstract
Proteins are under selection to maintain central functions and to accommodate needs that arise in ever-changing environments. The positive selection and neutral drift that preserve functions result in a diversity of protein variants. The amount of diversity differs between proteins: multifunctional or disease-related proteins tend to have fewer variants than proteins involved in some aspects of immunity. Our work focuses on the extensively studied protein Vitellogenin (Vg), which in honey bees (Apis mellifera) is multifunctional and highly expressed and plays roles in immunity. Yet, almost nothing is known about the natural variation in the coding sequences of this protein or how amino acid-altering variants might impact structure-function relationships. Here, we map out allelic variation in honey bee Vg using biological samples from 15 countries. The successful barcoded amplicon Nanopore sequencing of 543 bees revealed 121 protein variants, indicating a high level of diversity in Vg. We find that the distribution of non-synonymous single nucleotide polymorphisms (nsSNPs) differs between protein regions with different functions; domains involved in DNA and protein-protein interactions contain fewer nsSNPs than the protein's lipid binding cavities. We outline how the central functions of the protein can be maintained in different variants and how the variation pattern may inform about selection from pathogens and nutrition.
Affiliation(s)
- Vilde Leipart
- Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, Ås, Norway
- Jane Ludvigsen
- Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, Ås, Norway
- Fürst Medisinsk Laboratorium, Oslo, Norway
- Matthew Kent
- Department of Animal and Aquacultural Sciences, Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Simen Sandve
- Department of Animal and Aquacultural Sciences, Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Thu‐Hien To
- Department of Animal and Aquacultural Sciences, Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Mariann Árnyasi
- Department of Animal and Aquacultural Sciences, Centre for Integrative Genetics (CIGENE), Norwegian University of Life Sciences, Ås, Norway
- Claus D. Kreibich
- Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, Ås, Norway
- Bjørn Dahle
- Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, Ås, Norway
- Norwegian Beekeepers Association, Kløfta, Norway
- Gro V. Amdam
- Faculty of Environmental Sciences and Natural Resource Management, Norwegian University of Life Sciences, Ås, Norway
- School of Life Sciences, Arizona State University, Tempe, Arizona, USA
6
Yadahalli S, Jayanthi LP, Gosavi S. A Method for Assessing the Robustness of Protein Structures by Randomizing Packing Interactions. Front Mol Biosci 2022; 9:849272. [PMID: 35832734 PMCID: PMC9271847 DOI: 10.3389/fmolb.2022.849272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Accepted: 04/27/2022] [Indexed: 12/02/2022] Open
Abstract
Many single-domain proteins are not only stable and water-soluble, but they also populate few to no intermediates during folding. This reduces interactions between partially folded proteins, misfolding, and aggregation, and makes the proteins tractable in biotechnological applications. Natural proteins fold thus, not necessarily only because their structures are well-suited for folding, but because their sequences optimize packing and fit their structures well. In contrast, folding experiments on the de novo designed Top7 suggest that it populates several intermediates. Additionally, in de novo protein design, where sequences are designed for natural and new non-natural structures, tens of sequences still need to be tested before success is achieved. Both these issues may be caused by the specific scaffolds used in design, i.e., some protein scaffolds may be more tolerant to packing perturbations and varied sequences. Here, we report a computational method for assessing the response of protein structures to packing perturbations. We then benchmark this method using designed proteins and find that it can identify scaffolds whose folding gets disrupted upon perturbing packing, leading to the population of intermediates. The method can also isolate regions of both natural and designed scaffolds that are sensitive to such perturbations and identify contacts which when present can rescue folding. Overall, this method can be used to identify protein scaffolds that are more amenable to whole protein design as well as to identify protein regions which are sensitive to perturbations and where further mutations should be avoided during protein engineering.
Affiliation(s)
- Shachi Gosavi
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India
7
Ding W, Nakai K, Gong H. Protein design via deep learning. Brief Bioinform 2022; 23:bbac102. [PMID: 35348602 PMCID: PMC9116377 DOI: 10.1093/bib/bbac102] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/16/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 12/11/2022] Open
Abstract
Proteins with desired functions and properties are important in fields like nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is believed as a key point for handling real social challenges. Recent introduction of deep learning into design methods exhibits a transformative influence and is expected to represent a promising and exciting future direction. In this review, we retrospect the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through noticeable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. The future perspectives on design goals, challenges and opportunities are also comprehensively discussed.
Affiliation(s)
- Wenze Ding
- School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China
- School of Future Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Kenta Nakai
- Institute of Medical Science, the University of Tokyo, Tokyo 1088639, Japan
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
8
Raven SA, Payne B, Bruce M, Filipovska A, Rackham O. In silico evolution of nucleic acid-binding proteins from a nonfunctional scaffold. Nat Chem Biol 2022; 18:403-411. [PMID: 35210620 DOI: 10.1038/s41589-022-00967-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Accepted: 01/04/2022] [Indexed: 11/09/2022]
Abstract
Directed evolution emulates the process of natural selection to produce proteins with improved or altered functions. These approaches have proven to be very powerful but are technically challenging and particularly time and resource intensive. To bypass these limitations, we constructed a system to perform the entire process of directed evolution in silico. We employed iterative computational cycles of mutation and evaluation to predict mutations that confer high-affinity binding activities for DNA and RNA to an initial de novo designed protein with no inherent function. Beneficial mutations revealed modes of nucleic acid recognition not previously observed in natural proteins, highlighting the ability of computational directed evolution to access new molecular functions. Furthermore, the process by which new functions were obtained closely resembles natural evolution and can provide insights into the contributions of mutation rate, population size and selective pressure on functionalization of macromolecules in nature.
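The iterative cycle of mutation and evaluation described in this abstract can be sketched as a toy loop. This is only an illustration under stated assumptions: `toy_fitness` is a hypothetical stand-in that counts matches to an arbitrary target string, not the authors' binding-affinity evaluation, and the parameter names (`rate`, `population`, `generations`) are chosen to mirror the mutation rate, population size and number of selection rounds mentioned in the text.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_fitness(seq, target):
    """Hypothetical score: number of positions matching an arbitrary target."""
    return sum(a == b for a, b in zip(seq, target))

def mutate(seq, rate, rng):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )

def evolve(start, target, generations=50, population=100, rate=0.05, seed=0):
    """Run mutate-evaluate-select cycles, keeping the best variant each round."""
    rng = random.Random(seed)
    best = start
    for _ in range(generations):
        variants = [mutate(best, rate, rng) for _ in range(population)]
        variants.append(best)  # keep the parent so the score never decreases
        best = max(variants, key=lambda s: toy_fitness(s, target))
    return best

start = "A" * 20
target = "MKTAYIAKQRQISFVKSHFS"  # arbitrary illustrative sequence
result = evolve(start, target)
print(toy_fitness(start, target), "->", toy_fitness(result, target))
```

Keeping the parent in each round is a simple elitist selection scheme; other selection pressures (for example sampling in proportion to score) would slot into the same loop.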
Affiliation(s)
- Samuel A Raven
- Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia; University of Western Australia Centre for Medical Research, Nedlands, Western Australia, Australia
- Blake Payne
- Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia; University of Western Australia Centre for Medical Research, Nedlands, Western Australia, Australia
- Mitchell Bruce
- Curtin Medical School, Curtin University, Bentley, Western Australia, Australia
- Aleksandra Filipovska
- Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia; University of Western Australia Centre for Medical Research, Nedlands, Western Australia, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Western Australia, Australia; Telethon Kids Institute, Northern Entrance, Perth Children's Hospital, Nedlands, Western Australia, Australia
- Oliver Rackham
- Harry Perkins Institute of Medical Research, Nedlands, Western Australia, Australia; Curtin Medical School, Curtin University, Bentley, Western Australia, Australia; Telethon Kids Institute, Northern Entrance, Perth Children's Hospital, Nedlands, Western Australia, Australia; Curtin Health Innovation Research Institute, Curtin University, Bentley, Western Australia, Australia.
9
Boral A, Khamaru M, Mitra D. Designing synthetic transcription factors: A structural perspective. Adv Protein Chem Struct Biol 2022; 130:245-287. [PMID: 35534109 DOI: 10.1016/bs.apcsb.2021.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In this chapter, we discuss different design strategies of synthetic proteins, especially synthetic transcription factors. Design and engineering of synthetic transcription factors is particularly relevant for the need-based manipulation of gene expression. With recent advances in structural biology techniques and with the emergence of other precision biochemical/physical tools, accurate knowledge on structure-function relations is increasingly becoming available. Besides discussing the underlying principles of design, we go through individual cases, especially those involving four major groups of transcription factors: basic leucine zippers, zinc fingers, helix-turn-helix and homeodomains. We further discuss how synthetic biology can come together with structural biology to alter the genetic blueprint of life.
Affiliation(s)
- Aparna Boral
- Department of Life Sciences, Presidency University, Kolkata, West Bengal, India
- Madhurima Khamaru
- Department of Life Sciences, Presidency University, Kolkata, West Bengal, India
- Devrani Mitra
- Department of Life Sciences, Presidency University, Kolkata, West Bengal, India.
10
Pan X, Kortemme T. Recent advances in de novo protein design: Principles, methods, and applications. J Biol Chem 2021; 296:100558. [PMID: 33744284 PMCID: PMC8065224 DOI: 10.1016/j.jbc.2021.100558] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Received: 01/13/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 02/06/2023] Open
Abstract
The computational de novo protein design is increasingly applied to address a number of key challenges in biomedicine and biological engineering. Successes in expanding applications are driven by advances in design principles and methods over several decades. Here, we review recent innovations in major aspects of the de novo protein design and include how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in de novo generation of designable backbone structures, optimization of sequences, design scoring functions, and the design of the function. The advances not only highlight design goals reachable now but also point to the challenges and opportunities for the future of the field.
Affiliation(s)
- Xingjie Pan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA.
- Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA; Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California, USA.
11
Kundert K, Kortemme T. Computational design of structured loops for new protein functions. Biol Chem 2019; 400:275-288. [PMID: 30676995 PMCID: PMC6530579 DOI: 10.1515/hsz-2018-0348] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/16/2018] [Accepted: 12/18/2018] [Indexed: 12/20/2022]
Abstract
The ability to engineer the precise geometries, fine-tuned energetics and subtle dynamics that are characteristic of functional proteins is a major unsolved challenge in the field of computational protein design. In natural proteins, functional sites exhibiting these properties often feature structured loops. However, unlike the elements of secondary structures that comprise idealized protein folds, structured loops have been difficult to design computationally. Addressing this shortcoming in a general way is a necessary first step towards the routine design of protein function. In this perspective, we will describe the progress that has been made on this problem and discuss how recent advances in the field of loop structure prediction can be harnessed and applied to the inverse problem of computational loop design.
Affiliation(s)
- Kale Kundert
- Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Tanja Kortemme
- Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158, USA
12
Cardelli C, Tubiana L, Bianco V, Nerattini F, Dellago C, Coluzza I. Heteropolymer Design and Folding of Arbitrary Topologies Reveals an Unexpected Role of Alphabet Size on the Knot Population. Macromolecules 2018. [DOI: 10.1021/acs.macromol.8b01359] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
Affiliation(s)
- Chiara Cardelli
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria
- Luca Tubiana
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria
- Valentino Bianco
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria
- Francesca Nerattini
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria
- Christoph Dellago
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria
- Ivan Coluzza
- CIC biomaGUNE, Paseo Miramon 182, 20014 San Sebastian, Spain
- IKERBASQUE, Basque Foundation for Science, Maria Diaz de Haro 3, 48013 Bilbao, Spain
13
The role of directional interactions in the designability of generalized heteropolymers. Sci Rep 2017; 7:4986. [PMID: 28694466 PMCID: PMC5504045 DOI: 10.1038/s41598-017-04720-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 02/17/2017] [Accepted: 05/18/2017] [Indexed: 12/31/2022] Open
Abstract
Heteropolymers are important examples of self-assembling systems. However, in the design of artificial heteropolymers the control over the single chain self-assembling properties does not reach that of the natural bio-polymers, and in particular proteins. Here, we introduce a sufficiency criterion to identify polymers that can be designed to adopt a predetermined structure and show that it is fulfilled by polymers made of monomers interacting through directional (anisotropic) interactions. The criterion is based on the appearance of a particular peak in the radial distribution function, that we show being a universal feature of all designable heteropolymers, as it is present also in natural proteins. Our criterion can be used to engineer new self-assembling modular polymers that will open new avenues for applications in materials science.
14
Abstract
Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, more rare geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.
15
Brisendine JM, Koder RL. Fast, cheap and out of control--Insights into thermodynamic and informatic constraints on natural protein sequences from de novo protein design. Biochim Biophys Acta 2016; 1857:485-492. [PMID: 26498191 PMCID: PMC4856154 DOI: 10.1016/j.bbabio.2015.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/28/2015] [Accepted: 10/06/2015] [Indexed: 12/15/2022]
Abstract
The accumulated results of thirty years of rational and computational de novo protein design have taught us important lessons about the stability, information content, and evolution of natural proteins. First, de novo protein design has complicated the assertion that biological function is equivalent to biological structure, demonstrating the capacity to abstract active sites from natural contexts and paste them into non-native topologies without loss of function. The structure-function relationship has thus been revealed to be either a generality or strictly true only in a local sense. Second, the simplification to "maquette" topologies carried out by rational protein design also has demonstrated that even sophisticated functions such as conformational switching, cooperative ligand binding, and light-activated electron transfer can be achieved with low-information design approaches. This is because for simple topologies the functional footprint in sequence space is enormous and easily exceeds the number of structures which could have possibly existed in the history of life on Earth. Finally, the pervasiveness of extraordinary stability in designed proteins challenges accepted models for the "marginal stability" of natural proteins, suggesting that there must be a selection pressure against highly stable proteins. This can be explained using recent theories which relate non-equilibrium thermodynamics and self-replication. This article is part of a Special Issue entitled Biodesign for Bioenergetics: The design and engineering of electron transfer cofactors, proteins and protein networks, edited by Ronald L. Koder and J.L. Ross Anderson.
Affiliation(s)
- Joseph M Brisendine
- Department of Physics, The City College of New York, New York, NY 10031, United States; The Graduate Program in Biochemistry, The Graduate Center of CUNY, New York, NY 10016, United States
- Ronald L Koder
- Department of Physics, The City College of New York, New York, NY 10031, United States; Graduate Programs of Physics, Chemistry and Biochemistry, The Graduate Center of CUNY, New York, NY 10016, United States.
16
Leelananda SP, Jernigan RL, Kloczkowski A. Predicting Designability of Small Proteins from Graph Features of Contact Maps. J Comput Biol 2016; 23:400-11. [PMID: 27159634 PMCID: PMC4876523 DOI: 10.1089/cmb.2015.0209] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/21/2023]
Abstract
Highly designable structures can be distinguished based on certain geometric graphical features of the interactions, confirming the fact that the topology of a protein structure and its residue-residue interaction network are important determinants of its designability. The most designable structures and least designable structures obtained for sets of proteins having the same number of residues are compared. It is shown that the most designable structures predicted by the graph features of the contact diagrams are more densely packed, whereas the poorly designable structures are more open structures or structures that are loosely packed. Interestingly enough, it can also be seen that the highly designable structures identified are also common structural motifs found in nature.
Affiliation(s)
- Robert L. Jernigan
- Iowa State University, Ames, Iowa
- Baker Center for Bioinformatics and Biological Statistics, Ames, Iowa
- Andrzej Kloczkowski
- Nationwide Children's Hospital, Columbus, Ohio
- The Ohio State University, Columbus, Ohio
17
Chino M, Maglio O, Nastri F, Pavone V, DeGrado WF, Lombardi A. Artificial Diiron Enzymes with a De Novo Designed Four-Helix Bundle Structure. Eur J Inorg Chem 2015; 2015:3371-3390. [PMID: 27630532 PMCID: PMC5019575 DOI: 10.1002/ejic.201500470] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 04/29/2015] [Indexed: 12/26/2022]
Abstract
A single polypeptide chain may provide an astronomical number of conformers. Nature selected only a trivial number of them through evolution, composing an alphabet of scaffolds that can afford the complete set of chemical reactions needed to support life. These structural templates are so stable that they allow several mutations without disruption of the global folding, even having the ability to bind several exogenous cofactors. With this perspective, metal cofactors play a crucial role in the regulation and catalysis of several processes. Nature is able to modulate the chemistry of metals, adopting only a few ligands and slightly different geometries. Several scaffolds and metal-binding motifs are the focus of intense interest in the literature. This review discusses the widespread four-helix bundle fold, adopted as a scaffold for metal binding sites in the context of de novo protein design to obtain basic biochemical components for biosensing or catalysis. In particular, we describe the rational refinement of structure/function in diiron-oxo protein models from the due ferri (DF) family. The DF proteins were developed by us through an iterative process of design and rigorous characterization, which has allowed a shift from structural to functional models. The examples reported herein demonstrate the importance of the synergic application of de novo design methods as well as spectroscopic and structural characterization to optimize the catalytic performance of artificial enzymes.
Affiliation(s)
- Marco Chino
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- Ornella Maglio
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- IBB, CNR, Via Mezzocannone 16, 80134 Naples, Italy
- Flavia Nastri
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- Vincenzo Pavone
- Department of Structural and Functional Biology, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- William F. DeGrado
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA
- Angela Lombardi
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
18
Abstract
The observation of a limited secondary-structural alphabet in native proteins, with significant sequence preferences, has profoundly influenced the fields of protein design and structure prediction (Simons, Kooperberg, Huang, & Baker, 1997; Verschueren et al., 2011). In the era of structural genomics, as the size of the structural dataset continues to grow rapidly, it is becoming possible to extend this analysis to tertiary structural motifs and their sequences. For a hypothetical tertiary motif, the rate of its utilization in natural proteins may be used to assess its designability, that is, the ease with which the motif can be realized with natural amino acids. This requires a structural similarity search methodology, which rather than looking for global topological agreement (more appropriate for categorization of full proteins or domains), identifies detailed geometric matches. In this chapter, we introduce such a method, called MaDCaT, and demonstrate its use by assessing the designability landscapes of two tertiary structural motifs. We also show that such analysis can establish structure/sequence links by providing the sequence constraints necessary to encode designable motifs. As a logical extension of their secondary-structure counterparts, tertiary structural preferences will likely prove extremely useful in de novo protein design and structure prediction.
Affiliation(s)
- Jian Zhang
- Department of Computer Science, Dartmouth College, Fax: 603-646-1672, 6211 Sudikoff Lab, Room 210, Hanover, NH 03755-3510, USA
- Gevorg Grigoryan
- Adjunct Professor of Biology, Dartmouth College, Phone: 603-646-3173, Fax: 603-646-1672, 6211 Sudikoff Lab, Room 113, Hanover, NH 03755-3510, USA
19
Burke S, Elber R. Super folds, networks, and barriers. Proteins 2012; 80:463-70. [PMID: 22095563 PMCID: PMC3290721 DOI: 10.1002/prot.23212] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 05/10/2011] [Revised: 08/31/2011] [Accepted: 09/22/2011] [Indexed: 11/06/2022]
Abstract
Exhaustive enumeration of sequences and folds is conducted for a simple lattice model of conformations, sequences, and energies. Examination of all foldable sequences and their nearest connected neighbors (sequences that differ by no more than a point mutation) illustrates the following: (i) There exists an unusually large number of sequences that fold into a few structures (super-folds). The same observation was made experimentally and computationally using stochastic sampling and exhaustive enumeration of related models. (ii) There exist only a few large networks of connected sequences that are not restricted to one fold. These networks cover a significant fraction of fold space (super-networks). (iii) There exist barriers in sequence space that prevent foldable sequences of the same structure from "connecting" through a series of single point mutations (super-barriers), even in the presence of the sequence connection between folds. While there is ample experimental evidence for the existence of super-folds, evidence for a super-network is just starting to emerge. The prediction of a sequence barrier is an intriguing characteristic of sequence space, suggesting that the overall sequence space may be disconnected. The implications and limitations of these observations for evolution of protein structures are discussed.
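The notion of sequence networks connected by single point mutations can be illustrated with a small sketch. This is not the authors' code: the four HP sequences and their fold labels in `fold_of` are hypothetical toy data, and the helper names are invented for illustration. The sketch groups sequences into connected components under the "one point mutation apart" relation, then shows a network that spans two folds and a fold that is split across disconnected networks.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def mutation_networks(seqs):
    """Group sequences into networks linked by single point mutations
    (connected components, found with a small union-find)."""
    parent = {s: s for s in seqs}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(seqs, 2):
        if hamming(a, b) == 1:
            parent[find(a)] = find(b)

    groups = {}
    for s in seqs:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())

# Hypothetical toy data: foldable sequence -> fold label (not from the paper).
fold_of = {"HHPP": "A", "HHPH": "A", "HPPH": "B", "PPHH": "A"}

networks = mutation_networks(list(fold_of))
for net in networks:
    folds = sorted({fold_of[s] for s in net})
    print(sorted(net), "covers folds", folds)
```

In this toy set, one network contains sequences of both folds A and B, while the fold-A sequence `PPHH` is isolated from the other fold-A sequences, a miniature analogue of a barrier within a single fold.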
Affiliation(s)
- Sean Burke
- Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin TX 78712
- Ron Elber
- Institute for Computational Engineering and Sciences, University of Texas at Austin, Austin TX 78712
- Department of Chemistry and Biochemistry, University of Texas at Austin, Austin TX 78712
20
Leelananda SP, Towfic F, Jernigan RL, Kloczkowski A. Exploration of the relationship between topology and designability of conformations. J Chem Phys 2011; 134:235101. [PMID: 21702580 DOI: 10.1063/1.3596947] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable and lots of sequences can assume that fold. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. Unique conformation giving minimum energy is identified for each sequence and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that the highly designable structures can be distinguished from other non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms the fact that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
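The designability calculation described above (thread every HP sequence onto every conformation, keep sequences with a unique minimum-energy conformation, and count sequences per conformation) can be sketched for a very short chain. The following is a minimal illustration, not the study's implementation. It makes two simplifying assumptions: it enumerates all self-avoiding walks on the 2D square lattice rather than only compact conformations, and it uses an energy of -1 per non-bonded H-H contact. The function names are invented for this sketch.

```python
from itertools import product

def walks(n):
    """Self-avoiding walks of n residues (n >= 2) on the square lattice,
    one representative per rotation/reflection class."""
    out = []

    def grow(path, turned):
        if len(path) == n:
            out.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            if not turned and dy == -1:
                continue  # drop mirror images: the first vertical step goes up
            nxt = (x + dx, y + dy)
            if nxt in path:
                continue  # self-avoidance
            path.append(nxt)
            grow(path, turned or dy != 0)
            path.pop()

    grow([(0, 0), (1, 0)], False)  # first step fixed to remove rotations
    return out

def contacts(path):
    """Pairs (i, j) of lattice neighbours that are not adjacent in the chain."""
    pos = {p: i for i, p in enumerate(path)}
    pairs = set()
    for (x, y), i in pos.items():
        for q in ((x + 1, y), (x, y + 1)):
            j = pos.get(q)
            if j is not None and abs(i - j) > 1:
                pairs.add((min(i, j), max(i, j)))
    return frozenset(pairs)

def designability(n):
    """For each conformation, count HP sequences whose unique lowest-energy
    conformation it is (energy = -1 per non-bonded H-H contact; assumed)."""
    maps = [contacts(w) for w in walks(n)]
    counts = [0] * len(maps)
    for seq in product("HP", repeat=n):
        energies = [-sum(seq[i] == "H" and seq[j] == "H" for i, j in m)
                    for m in maps]
        best = min(energies)
        winners = [k for k, e in enumerate(energies) if e == best]
        if len(winners) == 1:
            counts[winners[0]] += 1
    return counts

print(designability(4))
```

For four residues, only the square-shaped conformation has a non-bonded contact, so it collects every sequence with H at both ends and all other conformations have designability zero. Longer chains show a broader spread, at a cost that grows exponentially with chain length.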
Affiliation(s)
- Sumudu P Leelananda
- L. H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa 50010, USA
21
Saunders R, Mann M, Deane CM. Signatures of co-translational folding. Biotechnol J 2011; 6:742-51. [DOI: 10.1002/biot.201000330] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 12/29/2010] [Revised: 03/01/2011] [Accepted: 03/03/2011] [Indexed: 12/11/2022]
22
Goldstein M, Fredj E, Gerber RB. A new hybrid algorithm for finding the lowest minima of potential surfaces: Approach and application to peptides. J Comput Chem 2011; 32:1785-800. [DOI: 10.1002/jcc.21755] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 07/15/2010] [Revised: 11/13/2010] [Accepted: 12/18/2010] [Indexed: 11/11/2022]
23
Grigoryan G, DeGrado WF. Probing designability via a generalized model of helical bundle geometry. J Mol Biol 2010; 405:1079-100. [PMID: 20932976 DOI: 10.1016/j.jmb.2010.08.058] [Citation(s) in RCA: 171] [Impact Index Per Article: 12.2] [Received: 04/02/2010] [Revised: 08/26/2010] [Accepted: 08/31/2010] [Indexed: 10/19/2022]
Abstract
Because the space of folded protein structures is highly degenerate, with recurring secondary and tertiary motifs, methods for representing protein structure in terms of collective physically relevant coordinates are of great interest. By collapsing structural diversity to a handful of parameters, such methods can be used to delineate the space of designable structures (i.e., conformations that can be stabilized with a large number of sequences), a crucial task for de novo protein design. We first demonstrate this on natural α-helical coiled coils using the Crick parameterization. We show that over 95% of known coiled-coil structures are within 1 Å Cα root-mean-square deviation of a Crick-ideal backbone. Derived parameters show that the natural geometric space of coiled coils is highly restricted and can be represented by "allowed" conformations amidst a potential continuum of conformers. Allowed structures have (1) restricted axial offsets between helices, which differ starkly between parallel and anti-parallel structures; (2) preferred superhelical radii, which depend linearly on the oligomerization state; (3) pronounced radius-dependent a- and d-position amino acid propensities; and (4) discrete angles of rotation of helices about their axes, which are surprisingly independent of oligomerization state or orientation. In all, we estimate the space of designable coiled-coil structures to be reduced at least 160-fold relative to the space of geometrically feasible structures. To extend the benefits of structural parameterization to other systems, we developed a general mathematical framework for parameterizing arbitrary helical structures, which reduces to the Crick parameterization as a special case. The method is successfully validated on a set of non-coiled-coil helical bundles, frequent in channels and transporter proteins, which show significant helix bending but not supercoiling.
Programs for coiled-coil parameter fitting and structure generation are provided via a web interface at http://www.gevorggrigoryan.com/cccp/, and code for generalized helical parameterization is available upon request.
Affiliation(s)
- Gevorg Grigoryan
- Department of Biochemistry, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
24
Trifonov EN, Frenkel ZM. Evolution of protein modularity. Curr Opin Struct Biol 2009; 19:335-40. [PMID: 19386484 DOI: 10.1016/j.sbi.2009.03.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 02/05/2009] [Accepted: 03/16/2009] [Indexed: 10/20/2022]
Abstract
Proteins in their evolution appear to follow several discrete stages, which is reflected in their modular organization. The sequences of the protein modules are highly variable while their functions and structures are rather conserved. The relatedness of the variable sequences is well represented by the networks in natural protein sequence space that also suggests evolutionary connections.
Affiliation(s)
- Edward N Trifonov
- Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel.
25
Franzosa E, Xia Y. Structural Perspectives on Protein Evolution. Annual Reports in Computational Chemistry 2008. [DOI: 10.1016/s1574-1400(08)00001-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
26
Armstrong KA, Tidor B. Computationally mapping sequence space to understand evolutionary protein engineering. Biotechnol Prog 2007; 24:62-73. [PMID: 18020358 DOI: 10.1021/bp070134h] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
Abstract
Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.
Affiliation(s)
- Kathryn A Armstrong
- Computer Science and Artificial Intelligence Laboratory, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, USA
27
Peto M, Kloczkowski A, Jernigan RL. Shape-dependent designability studies of lattice proteins. Journal of Physics: Condensed Matter 2007; 19:285220-285230. [PMID: 18079979 PMCID: PMC2134837 DOI: 10.1088/0953-8984/19/28/285220] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/25/2023]
Abstract
One important problem in computational structural biology is protein designability, that is, why protein sequences are not random strings of amino acids but instead show regular patterns that encode protein structures. Many previous studies that have attempted to solve the problem have relied upon reduced models of proteins. In particular, the 2D square and the 3D cubic lattices together with reduced amino acid alphabet models have been examined extensively and have led to interesting results that shed some light on evolutionary relationships among proteins. Here we perform designability studies on the 2D square lattice and explore the effects of variable overall shapes on protein designability using a binary hydrophobic-polar (HP) amino acid alphabet. Because we rely on a simple energy function that counts the total number of H-H interactions between non-sequential residues, we restrict our studies to protein shapes that have the same number of residues and also a constant number of non-bonded contacts. We have found that there is a marked difference in the designability between various protein shapes, with some of them accounting for a significantly larger share of the total foldable sequences.
Affiliation(s)
- Myron Peto
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011-3020
28
Dias CL, Grant M. Designable structures are easy to unfold. Physical Review E 2006; 74:042902. [PMID: 17155116 DOI: 10.1103/physreve.74.042902] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/18/2005] [Indexed: 05/12/2023]
Abstract
We study the structural stability of models of proteins for which the selected folds are unusually stable to mutation, that is, designable. A two-dimensional hydrophobic-polar lattice model was used to determine designable folds and these folds were investigated through Langevin dynamics. We find that the phase diagram of these proteins depends on their designability. In particular, highly designable folds are found to be weaker, i.e., easier to unfold, than low designable ones. We expect this to be related to protein flexibility.
Affiliation(s)
- Cristiano L Dias
- Physics Department, Rutherford Building, McGill University, 3600 rue University, Montréal, Québec H3A 2T8, Canada
29
Aynechi T, Kuntz ID. An information theoretic approach to macromolecular modeling: I. Sequence alignments. Biophys J 2005; 89:2998-3007. [PMID: 16254389 PMCID: PMC1366797 DOI: 10.1529/biophysj.104.054072] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 10/12/2004] [Accepted: 08/15/2005] [Indexed: 11/18/2022]
Abstract
We are interested in applying the principles of information theory to structural biology calculations. In this article, we explore the information content of an important computational procedure: sequence alignment. Using a reference state developed from exhaustive sequences, we measure alignment statistics and evaluate gap penalties based on first-principle considerations and gap distributions. We show that there are different gap penalties for different alphabet sizes and that the gap penalties can depend on the length of the sequences being aligned. In a companion article, we examine the information content of molecular force fields.
Affiliation(s)
- Tiba Aynechi
- Graduate Group in Biophysics, and Department of Pharmaceutical Chemistry, University of California-San Francisco, San Francisco, CA 94143, USA
30
Abstract
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.
Affiliation(s)
- Yu Xia
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
31
Porto M, Roman HE, Vendruscolo M, Bastolla U. Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences. Mol Biol Evol 2004; 22:630-8. [PMID: 15537801 DOI: 10.1093/molbev/msi048] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
We derive an analytic expression for site-specific stationary distributions of amino acids from the structurally constrained neutral (SCN) model of protein evolution with conservation of folding stability. The stationary distributions that we obtain have a Boltzmann-like shape, and their effective temperature parameter, measuring the limit of divergent evolutionary changes at a given site, can be predicted from a site-specific topological property, the principal eigenvector of the contact matrix of the native conformation of the protein. These analytic results, obtained without free parameters, are compared with simulations of the SCN model and with the site-specific amino acid distributions obtained from the Protein Data Bank. These results also provide new insights into how the topology of a protein fold influences its designability, i.e., the number of sequences compatible with that fold. The dependence of the effective temperature on the principal eigenvector decreases for longer proteins, as a possible consequence of the fact that selection for thermodynamic stability becomes weaker in this case.
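The site-specific topological property mentioned above, the principal eigenvector of the contact matrix, can be computed with a few lines of power iteration. The sketch below is only illustrative: the 5-residue contact matrix is a hypothetical toy, whether chain neighbours count as contacts is an assumption made here, and the mapping from eigenvector components to effective temperatures is not reproduced because the abstract does not give it.

```python
import math

def principal_eigenvector(contact, iters=500):
    """Principal eigenvector of a symmetric 0/1 contact matrix by power
    iteration on (C + I). The identity shift leaves eigenvectors unchanged
    but prevents oscillation when the contact graph is bipartite."""
    n = len(contact)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(contact[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical toy contact matrix for a 5-residue chain (assumption:
# chain neighbours are included as contacts, plus one long-range contact 0-4).
toy_contacts = [
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
]

for site, component in enumerate(principal_eigenvector(toy_contacts)):
    print(site, round(component, 4))
```

Sites with large eigenvector components are the most densely connected in the contact network, which is the kind of per-site quantity the abstract relates to the limit of divergent change at that site.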
Affiliation(s)
- Markus Porto
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany.
32
Li ZR, Han X, Liu GR. Protein designability analysis in sequence principal component space using 2D lattice model. Computer Methods and Programs in Biomedicine 2004; 76:21-29. [PMID: 15313539 DOI: 10.1016/j.cmpb.2004.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2003] [Revised: 04/14/2004] [Accepted: 04/14/2004] [Indexed: 05/24/2023]
Abstract
The number of proteins that fold into a certain structure differs drastically. The designability of a protein structure, which is defined as the number of sequences that have that structure as their unique lowest energy state, is studied in this paper using a simplified lattice model. The two-letter (HP) code and the pair-contact energy model are employed in the formulation of the relationship between the protein sequences and the compact structures. Due to the correlations between different dimensions, principal component analysis (PCA) is carried out to remove these correlations and develop reliable approximations of probability density functions of the protein sequences and the compact structures. An estimation of designability is derived using these probability density functions. Good correlation between estimated designabilities and those obtained through enumerative calculations is successfully achieved.
Affiliation(s)
- Z R Li
- Department of Mechanical Engineering, Centre for Advanced Computations in Engineering Science, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260, Singapore.
33
Carothers JM, Oestreich SC, Davis JH, Szostak JW. Informational complexity and functional activity of RNA structures. J Am Chem Soc 2004; 126:5130-7. [PMID: 15099096 PMCID: PMC5042360 DOI: 10.1021/ja031504a] [Citation(s) in RCA: 157] [Impact Index Per Article: 7.9] [Indexed: 11/30/2022]
Abstract
Very little is known about the distribution of functional DNA, RNA, and protein molecules in sequence space. The question of how the number and complexity of distinct solutions to a particular biochemical problem varies with activity is an important aspect of this general problem. Here we present a comparison of the structures and activities of eleven distinct GTP-binding RNAs (aptamers). By experimentally measuring the amount of information required to specify each optimal binding structure, we show that defining a structure capable of 10-fold tighter binding requires approximately 10 additional bits of information. This increase in information content is equivalent to specifying the identity of five additional nucleotide positions and corresponds to an approximately 1000-fold decrease in abundance in a sample of random sequences. We observe a similar relationship between structural complexity and activity in a comparison of two catalytic RNAs (ribozyme ligases), raising the possibility of a general relationship between the complexity of RNA structures and their functional activity. Describing how information varies with activity in other heteropolymers, both biological and synthetic, may lead to an objective means of comparing their functional properties. This approach could be useful in predicting the functional utility of novel heteropolymers.
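The arithmetic linking the three figures in this abstract (about 10 bits, five nucleotide positions, and a roughly 1000-fold drop in abundance) can be checked directly. The sketch assumes four equally likely bases, so that fully specifying one position costs log2(4) = 2 bits; the function names are illustrative.

```python
import math

# Assumption: four equiprobable bases, so one fully specified
# nucleotide position carries log2(4) = 2 bits.
BITS_PER_POSITION = math.log2(4)

def positions_specified(bits):
    """Number of fully specified nucleotide positions equivalent to `bits`."""
    return bits / BITS_PER_POSITION

def fold_decrease_in_abundance(bits):
    """Each extra bit halves the fraction of random sequences that qualify."""
    return 2 ** bits

print(positions_specified(10))         # 5.0
print(fold_decrease_in_abundance(10))  # 1024
```

Ten bits therefore correspond to five specified positions and a 1024-fold rarer structure in random sequence pools, consistent with the "approximately 1000-fold" figure quoted in the abstract.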
Affiliation(s)
- James M Carothers
- Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA, 02114 USA
34
35
Yahyanejad M, Kardar M, Tang C. Structure space of model proteins: A principal component analysis. J Chem Phys 2003. [DOI: 10.1063/1.1541611] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
36
Larson SM, England JL, Desjarlais JR, Pande VS. Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Sci 2002; 11:2804-13. [PMID: 12441379 PMCID: PMC2373757 DOI: 10.1110/ps.0203902] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.7] [Received: 02/05/2002] [Revised: 08/16/2002] [Accepted: 09/04/2002] [Indexed: 10/27/2022]
Abstract
Modeling the inherent flexibility of the protein backbone as part of computational protein design is necessary to capture the behavior of real proteins and is a prerequisite for the accurate exploration of protein sequence space. We present the results of a broad exploration of sequence space, with backbone flexibility, through a novel approach: large-scale protein design to structural ensembles. A distributed computing architecture has allowed us to generate hundreds of thousands of diverse sequences for a set of 253 naturally occurring proteins, allowing exciting insights into the nature of protein sequence space. Designing to a structural ensemble produces a much greater diversity of sequences than previous studies have reported, and homology searches using profiles derived from the designed sequences against the Protein Data Bank show that the relevance and quality of the sequences is not diminished. The designed sequences have greater overall diversity than corresponding natural sequence alignments, and no direct correlations are seen between the diversity of natural sequence alignments and the diversity of the corresponding designed sequences. For structures in the same fold, the sequence entropies of the designed sequences cluster together tightly. This tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest that the diversity of designed sequences is primarily determined by a structure's overall fold, and that the designability principle postulated from studies of simple models holds in real proteins. This has important implications for experimental protein design and engineering, as well as providing insight into protein evolution.
Affiliation(s)
- Stefan M Larson
- Chemistry Department and Biophysics Program, Stanford University, California 94305, USA
|
37
|
Matsuura T, Ernst A, Plückthun A. Construction and characterization of protein libraries composed of secondary structure modules. Protein Sci 2002; 11:2631-43. [PMID: 12381846 PMCID: PMC2373733 DOI: 10.1110/ps.0215102] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 10/27/2022]
Abstract
Only a minute fraction of all possible protein sequences can exist in the genomes of all life forms. To explore whether physicochemical constraints or a lack of need causes the paucity of different protein folds, we set out to construct protein libraries without any restriction of topology. We generated different libraries (all alpha-helix, all beta-strand, and alpha-helix plus beta-strand) with an average length of 100 amino acid residues, composed of designed secondary structure modules (alpha-helix, beta-strand, and beta-turn) in various proportions, based primarily on the patterning of polar and nonpolar residues. We wished to explore that part of sequence space that is rich in secondary structure. The analysis of randomly chosen clones from each of the libraries showed that, despite the low sequence homology to known protein sequences, a substantial proportion of the library members containing alpha-helix modules were indeed helical, possess a defined oligomerization state, and showed cooperative chemical unfolding behavior. On the other hand, proteins composed of mainly beta-strand modules tended to form amyloid-like fibrils and were among the least soluble proteins ever reported. We found that a large fraction of members in non-beta-strand-containing protein libraries that are distant from natural proteins in sequence space possess unexpectedly favorable properties. These results reinforce the efficacy of applying binary patterning to design proteins with native-like properties despite lack of restriction in topology. Because of the intrinsic tendency of beta-strand modules to aggregate, their presence requires precise topologic arrangement to prevent fibril formation.
Affiliation(s)
- Tomoaki Matsuura
- Biochemisches Institut, Universität Zürich, Winterthurerstr 190, CH 8057, Switzerland
|
38
|
Cejtin H, Edler J, Gottlieb A, Helling R, Li H, Philbin J, Wingreen N, Tang C. Fast tree search for enumeration of a lattice model of protein folding. J Chem Phys 2002. [DOI: 10.1063/1.1423324] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]
|