1
Gizzio J, Thakur A, Haldane A, Post CB, Levy RM. Evolutionary sequence and structural basis for the distinct conformational landscapes of Tyr and Ser/Thr kinases. Nat Commun 2024; 15:6545. [PMID: 39095350 PMCID: PMC11297160 DOI: 10.1038/s41467-024-50812-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 07/22/2024] [Indexed: 08/04/2024]
Abstract
Protein kinases are molecular machines with rich sequence variation that distinguishes the two main evolutionary branches - tyrosine kinases (TKs) from serine/threonine kinases (STKs). Using a sequence co-variation Potts statistical energy model, we previously concluded that TK catalytic domains are more likely than STKs to adopt an inactive conformation with the activation loop in an autoinhibitory "folded" conformation, due to intrinsic sequence effects. Here we investigate the structural basis for this phenomenon by integrating the sequence-based model with structure-based molecular dynamics (MD) to determine the effects of mutations on the free energy difference between active and inactive conformations, using a thermodynamic cycle involving many (n = 108) protein-mutation free energy perturbation (FEP) simulations in the active and inactive conformations. The sequence and structure-based results are consistent and support the hypothesis that the inactive conformation "DFG-out Activation Loop Folded" is a functional regulatory state that has been stabilized in TKs relative to STKs over the course of their evolution via the accumulation of residue substitutions in the activation loop and catalytic loop that facilitate distinct substrate binding modes in trans and additional modes of regulation in cis for TKs.
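The thermodynamic cycle described in this abstract can be illustrated with a short Python sketch. It assumes each protein-mutation FEP run reports the mutation free energy in the active and in the inactive conformation. Units are kcal/mol and the sign convention is mutant minus wild type. The `FepResult` container and the example numbers are hypothetical and are not values from the study. Closing the cycle gives ddG = dG_mut(inactive) - dG_mut(active). This quantity equals the change the mutation causes in the active-to-inactive free energy difference.

```python
from dataclasses import dataclass


@dataclass
class FepResult:
    """Hypothetical container for one protein-mutation FEP run (kcal/mol).

    Sign convention assumed here: G(mutant) - G(wild type) in a fixed conformation.
    """
    mutation: str
    dg_active: float    # mutation free energy computed in the active conformation
    dg_inactive: float  # mutation free energy computed in the inactive conformation


def ddg_conformational(result: FepResult) -> float:
    """Shift in the active -> inactive free energy difference caused by the mutation.

    Free energy is a state function, so the legs of the cycle sum to zero:
        dG_conf(mut) - dG_conf(wt) = dG_mut(inactive) - dG_mut(active)
    where dG_conf = G(inactive) - G(active). A negative value means the mutation
    stabilizes the inactive state relative to the active state.
    """
    return result.dg_inactive - result.dg_active
```

For example, a hypothetical run `FepResult("X1Y", dg_active=1.5, dg_inactive=0.5)` gives -1.0 kcal/mol. That mutation would shift the equilibrium toward the inactive conformation. Comparing such values with Potts energy differences is one way the sequence-based and structure-based predictions could be checked against each other.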
Affiliation(s)
- Joan Gizzio
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, USA
- Department of Chemistry, Temple University, Philadelphia, PA, USA
- Abhishek Thakur
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, USA
- Department of Chemistry, Temple University, Philadelphia, PA, USA
- Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, USA
- Department of Physics, Temple University, Philadelphia, PA, USA
- Carol Beth Post
- Borch Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA
- Ronald M Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, USA
- Department of Chemistry, Temple University, Philadelphia, PA, USA
2
Kinshuk S, Li L, Meckes B, Chan CTY. Sequence-Based Protein Design: A Review of Using Statistical Models to Characterize Coevolutionary Traits for Developing Hybrid Proteins as Genetic Sensors. Int J Mol Sci 2024; 25:8320. [PMID: 39125888 PMCID: PMC11312098 DOI: 10.3390/ijms25158320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/23/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024]
Abstract
Statistical analyses of homologous protein sequences can identify amino acid residue positions that co-evolve to generate family members with different properties. Based on the hypothesis that the coevolution of residue positions is necessary for maintaining protein structure, coevolutionary traits revealed by statistical models provide insight into residue-residue interactions that are important for understanding protein mechanisms at the molecular level. With the rapid expansion of genome sequencing databases that facilitate statistical analyses, this sequence-based approach has been used to study a broad range of protein families. An emerging application of this approach is to design hybrid transcriptional regulators as modular genetic sensors for novel wiring between input signals and genetic elements to control outputs. Among many allosterically regulated regulator families, the members contain structurally conserved and functionally independent protein domains, including a DNA-binding module (DBM) for interacting with a specific genetic element and a ligand-binding module (LBM) for sensing an input signal. By hybridizing a DBM and an LBM from two different family members, a hybrid regulator can be created with a new combination of signal-detection and DNA-recognition properties not present in natural systems. In this review, we present recent advances in the development of hybrid regulators and their applications in cellular engineering, especially focusing on the use of statistical analyses for characterizing DBM-LBM interactions and hybrid regulator design. Based on these studies, we then discuss the current limitations and potential directions for enhancing the impact of this sequence-based design approach.
Affiliation(s)
- Sahaj Kinshuk
- Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- Lin Li
- Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- Brian Meckes
- Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- BioDiscovery Institute, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203, USA
- Clement T. Y. Chan
- Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- BioDiscovery Institute, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203, USA
3
Martin J, Lequerica Mateos M, Onuchic JN, Coluzza I, Morcos F. Machine learning in biological physics: From biomolecular prediction to design. Proc Natl Acad Sci U S A 2024; 121:e2311807121. [PMID: 38913893 PMCID: PMC11228481 DOI: 10.1073/pnas.2311807121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Machine learning has been proposed as an alternative to theoretical modeling when dealing with complex problems in biological physics. However, in this perspective, we argue that a more successful approach is a proper combination of these two methodologies. We discuss how ideas coming from the physical modeling of neuronal processing led to early formulations of computational neural networks, e.g., Hopfield networks. We then show how modern learning approaches like Potts models, Boltzmann machines, and the transformer architecture are related to each other, specifically, through a shared energy representation. We summarize recent efforts to establish these connections and provide examples of how each of these formulations, integrating physical modeling and machine learning, has been successful in tackling recent problems in biomolecular structure, dynamics, function, evolution, and design. Instances include protein structure prediction; improvement in the computational complexity and accuracy of molecular dynamics simulations; better inference of the effects of mutations in proteins, leading to improved evolutionary modeling; and, finally, the ways machine learning is revolutionizing protein engineering and design. Going beyond naturally existing protein sequences, a connection to protein design is discussed where synthetic sequences are able to fold to naturally occurring motifs driven by a model rooted in physical principles. We show that this model is "learnable" and propose its future use in the generation of unique sequences that can fold into a target structure.
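The "shared energy representation" linking Hopfield networks and Potts models can be made concrete with a small NumPy sketch. It evaluates a Potts energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j). It then shows that a Hopfield energy E = -1/2 s^T W s - b^T s is the two-state special case, assuming W is symmetric with a zero diagonal and states 0/1 are mapped to spins -1/+1. The function names and the state-to-spin mapping are illustrative choices, not code from the perspective.

```python
import numpy as np


def potts_energy(seq: np.ndarray, h: np.ndarray, J: np.ndarray) -> float:
    """Potts statistical energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq: integer states of length L, each in [0, q)
    h:   fields with shape (L, q)
    J:   couplings with shape (L, L, q, q); only the i < j entries are used
    """
    L = len(seq)
    energy = -h[np.arange(L), seq].sum()
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return float(energy)


def hopfield_energy(spins: np.ndarray, W: np.ndarray, b: np.ndarray) -> float:
    """Hopfield energy E = -1/2 s^T W s - b^T s for spins in {-1, +1}.

    Assumes W is symmetric with a zero diagonal.
    """
    return float(-0.5 * spins @ W @ spins - b @ spins)


def hopfield_as_potts(W: np.ndarray, b: np.ndarray):
    """Rewrite Hopfield weights as q = 2 Potts fields and couplings.

    Potts state 0 is mapped to spin -1 and state 1 to spin +1.
    """
    spin_value = np.array([-1.0, 1.0])
    h = b[:, None] * spin_value[None, :]  # h[i, a] = b_i * spin(a)
    J = (W[:, :, None, None]
         * spin_value[None, None, :, None]
         * spin_value[None, None, None, :])  # J[i, j, a, c] = W_ij * spin(a) * spin(c)
    return h, J
```

For any binary sequence `seq`, `potts_energy(seq, *hopfield_as_potts(W, b))` matches `hopfield_energy(2 * seq - 1, W, b)`. In the Boltzmann-machine view, both energies define the same probability model, P(s) proportional to exp(-E(s)).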
Affiliation(s)
- Jonathan Martin
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Marcos Lequerica Mateos
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa 48940, Spain
- José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Department of Physics and Astronomy, Rice University, Houston, TX 77005
- Department of Chemistry, Rice University, Houston, TX 77005
- Department of BioSciences, Rice University, Houston, TX 77005
- Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa 48940, Spain
- Basque Foundation for Science, Ikerbasque, Bilbao 48940, Spain
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Department of Bioengineering, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
4
Fram B, Su Y, Truebridge I, Riesselman AJ, Ingraham JB, Passera A, Napier E, Thadani NN, Lim S, Roberts K, Kaur G, Stiffler MA, Marks DS, Bahl CD, Khan AR, Sander C, Gauthier NP. Simultaneous enhancement of multiple functional properties using evolution-informed protein design. Nat Commun 2024; 15:5141. [PMID: 38902262 PMCID: PMC11190266 DOI: 10.1038/s41467-024-49119-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 05/24/2024] [Indexed: 06/22/2024]
Abstract
A major challenge in protein design is to augment existing functional proteins with multiple property enhancements. Altering several properties likely necessitates numerous primary sequence changes, and novel methods are needed to accurately predict combinations of mutations that maintain or enhance function. Models of sequence co-variation (e.g., EVcouplings), which leverage extensive information about various protein properties and activities from homologous protein sequences, have proven effective for many applications including structure determination and mutation effect prediction. We apply EVcouplings to computationally design variants of the model protein TEM-1 β-lactamase. Nearly all the 14 experimentally characterized designs were functional, including one with 84 mutations from the nearest natural homolog. The designs also had large increases in thermostability, increased activity on multiple substrates, and nearly identical structure to the wild type enzyme. This study highlights the efficacy of evolutionary models in guiding large sequence alterations to generate functional diversity for protein design applications.
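A design's distance from natural sequences can be measured as a minimum Hamming distance over aligned sequences. The 84 mutations from the nearest natural homolog reported above are an example of this kind of count. The Python sketch below assumes all sequences are already aligned to the same length. It counts every mismatched column, including gap columns, as one difference. The function names and toy sequences are hypothetical and not part of EVcouplings.

```python
def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest_homolog(design: str, homologs: list[str]) -> tuple[int, str]:
    """Return (mutation count, sequence) for the closest natural homolog."""
    if not homologs:
        raise ValueError("need at least one natural homolog")
    return min((hamming(design, h), h) for h in homologs)
```

For example, `nearest_homolog("MKVLA", ["MKILA", "MRVLG"])` returns `(1, "MKILA")`. The first homolog differs at one position and the second at two.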
Affiliation(s)
- Benjamin Fram
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Yang Su
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Ian Truebridge
- Institute for Protein Innovation, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- AI Proteins, Boston, MA, USA
- Adam J Riesselman
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- John B Ingraham
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Alessandro Passera
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, 1030, Vienna, Austria
- Eve Napier
- School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
- Nicole N Thadani
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Apriori Bio, Cambridge, MA, USA
- Samuel Lim
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Kristen Roberts
- Selux Diagnostics Inc., 56 Roland Street, Charlestown, MA, USA
- Gurleen Kaur
- Selux Diagnostics Inc., 56 Roland Street, Charlestown, MA, USA
- Michael A Stiffler
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Dyno Therapeutics, 343 Arsenal Street, Watertown, MA, USA
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Christopher D Bahl
- Institute for Protein Innovation, Boston, MA, USA
- Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- AI Proteins, Boston, MA, USA
- Amir R Khan
- School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
- Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA
- Chris Sander
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicholas P Gauthier
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
5
He J, Wu W, Wang X. DIProT: A deep learning based interactive toolkit for efficient and effective Protein design. Synth Syst Biotechnol 2024; 9:217-222. [PMID: 38385151 PMCID: PMC10876589 DOI: 10.1016/j.synbio.2024.01.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 01/02/2024] [Accepted: 01/30/2024] [Indexed: 02/23/2024]
Abstract
The protein inverse folding problem, designing amino acid sequences that fold into desired protein structures, is a critical challenge in biological sciences. Despite numerous data-driven and knowledge-driven methods, there remains a need for a user-friendly toolkit that effectively integrates these approaches for in-silico protein design. In this paper, we present DIProT, an interactive protein design toolkit. DIProT leverages a non-autoregressive deep generative model to solve the inverse folding problem, combined with a protein structure prediction model. This integration allows users to incorporate prior knowledge into the design process, evaluate designs in silico, and form a virtual design loop with human feedback. Our inverse folding model demonstrates competitive performance in terms of effectiveness and efficiency on TS50 and CATH4.2 datasets, with promising sequence recovery and inference time. Case studies further illustrate how DIProT can facilitate user-guided protein design.
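Sequence recovery is commonly defined as the fraction of positions where a designed sequence matches the native sequence of the target structure. The abstract cites it as a metric on the TS50 and CATH4.2 benchmarks. Below is a minimal Python sketch under that definition. It assumes designed and native sequences have equal length, so no alignment step is needed. Summarizing a benchmark by the median is one common convention but is an assumption here, and the function names are illustrative rather than part of DIProT.

```python
from statistics import median


def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native residue."""
    if len(designed) != len(native) or not native:
        raise ValueError("designed and native sequences must be non-empty and equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def median_recovery(pairs: list[tuple[str, str]]) -> float:
    """Median recovery over a benchmark of (designed, native) pairs."""
    return median(sequence_recovery(d, n) for d, n in pairs)
```

For instance, `sequence_recovery("ACDE", "ACDF")` is 0.75 because three of the four positions match.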
Affiliation(s)
- Xiaowo Wang
- Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Bioinformatics Division, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing, China
6
Sánchez IE, Galpern EA, Ferreiro DU. Solvent constraints for biopolymer folding and evolution in extraterrestrial environments. Proc Natl Acad Sci U S A 2024; 121:e2318905121. [PMID: 38739787 PMCID: PMC11127021 DOI: 10.1073/pnas.2318905121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Accepted: 04/16/2024] [Indexed: 05/16/2024]
Abstract
We propose that spontaneous folding and molecular evolution of biopolymers are two universal aspects that must concur for life to happen. These aspects are fundamentally related to the chemical composition of biopolymers and crucially depend on the solvent in which they are embedded. We show that molecular information theory and energy landscape theory allow us to explore the limits that solvents impose on biopolymer existence. We consider 54 solvents, including water, alcohols, hydrocarbons, halogenated solvents, aromatic solvents, and low molecular weight substances made up of elements abundant in the universe, which may potentially take part in alternative biochemistries. We find that along with water, there are many solvents for which the liquid regime is compatible with biopolymer folding and evolution. We present a ranking of the solvents in terms of biopolymer compatibility. Many of these solvents have been found in molecular clouds or may be expected to occur in extrasolar planets.
Affiliation(s)
- Ignacio E. Sánchez
- Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, Buenos Aires CP1428, Argentina
- Ezequiel A. Galpern
- Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, Buenos Aires CP1428, Argentina
- Diego U. Ferreiro
- Laboratorio de Fisiología de Proteínas, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, Buenos Aires CP1428, Argentina
7
Jaafari H, Bueno C, Schafer NP, Martin J, Morcos F, Wolynes PG. The physical and evolutionary energy landscapes of devolved protein sequences corresponding to pseudogenes. Proc Natl Acad Sci U S A 2024; 121:e2322428121. [PMID: 38739795 PMCID: PMC11127006 DOI: 10.1073/pnas.2322428121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/26/2024] [Indexed: 05/16/2024]
Abstract
Protein evolution is guided by structural, functional, and dynamical constraints ensuring organismal viability. Pseudogenes are genomic sequences identified in many eukaryotes that lack translational activity due to sequence degradation and thus over time have undergone "devolution." Previously pseudogenized genes sometimes regain their protein-coding function, suggesting they may still encode robust folding energy landscapes despite multiple mutations. We study both the physical folding landscapes of protein sequences corresponding to human pseudogenes using the Associative Memory, Water Mediated, Structure and Energy Model, and the evolutionary energy landscapes obtained using direct coupling analysis (DCA) on their parent protein families. We found that generally mutations that have occurred in pseudogene sequences have disrupted their native global network of stabilizing residue interactions, making it harder for them to fold if they were translated. In some cases, however, energetic frustration has apparently decreased when the functional constraints were removed. We analyzed this unexpected situation for Cyclophilin A, Profilin-1, and Small Ubiquitin-like Modifier 2 Protein. Our analysis reveals that when such mutations in the pseudogene ultimately stabilize folding, at the same time, they likely alter the pseudogenes' former biological activity, as estimated by DCA. We localize most of these stabilizing mutations generally to normally frustrated regions required for binding to other partners.
Affiliation(s)
- Hana Jaafari
- Center for Theoretical Biophysics, Rice University, Houston, TX 77005
- Applied Physics Graduate Program, Smalley-Curl Institute, Rice University, Houston, TX 77005
- Department of Chemistry, Rice University, Houston, TX 77005
- Carlos Bueno
- Center for Theoretical Biophysics, Rice University, Houston, TX 77005
- Jonathan Martin
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
- Peter G. Wolynes
- Center for Theoretical Biophysics, Rice University, Houston, TX 77005
- Department of Chemistry, Rice University, Houston, TX 77005
- Department of Physics and Astronomy, Rice University, Houston, TX 77005
- Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77005
8
Schwerdtfeger P, Wales DJ. 100 Years of the Lennard-Jones Potential. J Chem Theory Comput 2024; 20:3379-3405. [PMID: 38669689 DOI: 10.1021/acs.jctc.4c00135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024]
Abstract
It is now 100 years since Lennard-Jones published his first paper introducing the now famous potential that bears his name. It is therefore timely to reflect on the many achievements, as well as the limitations, of this potential in the theory of atomic and molecular interactions, where applications range from descriptions of intermolecular forces to molecules, clusters, and condensed matter.
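For reference, the 12-6 form of the Lennard-Jones potential is V(r) = 4ε[(σ/r)^12 - (σ/r)^6]. Here ε is the well depth and σ is the separation at which V crosses zero. The minimum lies at r = 2^(1/6)σ, where V = -ε. Below is a minimal Python sketch of the potential and the corresponding radial force F = -dV/dr. It uses reduced units (ε = σ = 1) by default, and the function names are illustrative.

```python
def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """12-6 Lennard-Jones potential V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Radial force F = -dV/dr = (24*eps/r) * (2*(sigma/r)**12 - (sigma/r)**6).

    Positive values are repulsive (pushing the pair apart), negative are attractive.
    """
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon / r * (2.0 * sr6 * sr6 - sr6)


def lj_minimum(sigma: float = 1.0) -> float:
    """Separation of the potential minimum, r_min = 2**(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma
```

At `lj_minimum()` the potential is -ε (within floating-point rounding) and the force vanishes. At shorter separations the steep r^-12 term makes the force repulsive, and at longer ones the r^-6 dispersion term makes it weakly attractive.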
Affiliation(s)
- Peter Schwerdtfeger
- Centre for Theoretical Chemistry and Physics, The New Zealand Institute for Advanced Study, Massey University Auckland, Private Bag 102904, Auckland 0745, New Zealand
- David J Wales
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K
9
Gizzio J, Thakur A, Haldane A, Levy RM. Evolutionary sequence and structural basis for the distinct conformational landscapes of Tyr and Ser/Thr kinases. Research Square 2024:rs.3.rs-4048991. [PMID: 38746330 PMCID: PMC11092858 DOI: 10.21203/rs.3.rs-4048991/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Protein kinases are molecular machines with rich sequence variation that distinguishes the two main evolutionary branches - tyrosine kinases (TKs) from serine/threonine kinases (STKs). Using a sequence co-variation Potts statistical energy model, we previously concluded that TK catalytic domains are more likely than STKs to adopt an inactive conformation with the activation loop in an autoinhibitory "folded" conformation, due to intrinsic sequence effects. Here we investigated the structural basis for this phenomenon by integrating the sequence-based model with structure-based molecular dynamics (MD) to determine the effects of mutations on the free energy difference between active and inactive conformations, using a novel thermodynamic cycle involving many (n=108) protein-mutation free energy perturbation (FEP) simulations in the active and inactive conformations. The sequence and structure-based results are consistent and support the hypothesis that the inactive conformation "DFG-out Activation Loop Folded" is a functional regulatory state that has been stabilized in TKs relative to STKs over the course of their evolution via the accumulation of residue substitutions in the activation loop and catalytic loop that facilitate distinct substrate binding modes in trans and additional modes of regulation in cis for TKs.
Affiliation(s)
- Joan Gizzio
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
- Abhishek Thakur
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
- Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
- Department of Physics, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M. Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
10
Gizzio J, Thakur A, Haldane A, Post CB, Levy RM. Evolutionary sequence and structural basis for the distinct conformational landscapes of Tyr and Ser/Thr kinases. bioRxiv: The Preprint Server for Biology 2024:2024.03.08.584161. [PMID: 38559238 PMCID: PMC10979876 DOI: 10.1101/2024.03.08.584161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Protein kinases are molecular machines with rich sequence variation that distinguishes the two main evolutionary branches - tyrosine kinases (TKs) from serine/threonine kinases (STKs). Using a sequence co-variation Potts statistical energy model, we previously concluded that TK catalytic domains are more likely than STKs to adopt an inactive conformation with the activation loop in an autoinhibitory "folded" conformation, due to intrinsic sequence effects. Here we investigated the structural basis for this phenomenon by integrating the sequence-based model with structure-based molecular dynamics (MD) to determine the effects of mutations on the free energy difference between active and inactive conformations, using a novel thermodynamic cycle involving many (n=108) protein-mutation free energy perturbation (FEP) simulations in the active and inactive conformations. The sequence and structure-based results are consistent and support the hypothesis that the inactive conformation "DFG-out Activation Loop Folded" is a functional regulatory state that has been stabilized in TKs relative to STKs over the course of their evolution via the accumulation of residue substitutions in the activation loop and catalytic loop that facilitate distinct substrate binding modes in trans and additional modes of regulation in cis for TKs.
Affiliation(s)
- Joan Gizzio
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
- Abhishek Thakur
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
- Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
- Department of Physics, Temple University, Philadelphia, Pennsylvania 19122
- Carol Beth Post
- Borch Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, Indiana 47907
- Ronald M. Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
11
Michael R, Kæstel-Hansen J, Mørch Groth P, Bartels S, Salomon J, Tian P, Hatzakis NS, Boomsma W. A systematic analysis of regression models for protein engineering. PLoS Comput Biol 2024; 20:e1012061. [PMID: 38701099 PMCID: PMC11095727 DOI: 10.1371/journal.pcbi.1012061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Revised: 05/15/2024] [Accepted: 04/10/2024] [Indexed: 05/05/2024]
Abstract
Optimizing proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: how much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric and on how one defines generalization. We highlight the fundamental issues of sample bias in typical regression scenarios and how this can lead to misleading conclusions about regressor performance. Finally, we make the case for the importance of calibrated uncertainty in this domain.
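One simple way to check whether predictive uncertainties are calibrated is to measure how often observed values fall inside the model's central prediction intervals. The Python sketch below assumes Gaussian predictive distributions, with a mean and a standard deviation per protein variant. It is a generic diagnostic, not the evaluation protocol used in the paper. For a well-calibrated regressor, the empirical coverage at each level should be close to the nominal level.

```python
from statistics import NormalDist


def interval_coverage(y_true, mean, std, level=0.95):
    """Fraction of observations inside the central `level` Gaussian prediction interval."""
    if not (len(y_true) == len(mean) == len(std)) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    inside = sum(abs(y - m) <= z * s for y, m, s in zip(y_true, mean, std))
    return inside / len(y_true)


def coverage_curve(y_true, mean, std, levels=(0.5, 0.8, 0.9, 0.95)):
    """Pairs of (nominal, empirical) coverage; large gaps indicate miscalibration."""
    return [(lvl, interval_coverage(y_true, mean, std, lvl)) for lvl in levels]
```

Empirical coverage well below the nominal level signals overconfident uncertainties, that is, standard deviations that are too small. Coverage well above it signals underconfident ones.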
Affiliation(s)
- Richard Michael
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Peter Mørch Groth
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Enzyme Research, Novozymes A/S, Kongens Lyngby, Denmark
- Simon Bartels
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Pengfei Tian
- Enzyme Research, Novozymes A/S, Kongens Lyngby, Denmark
- Nikos S. Hatzakis
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
12
Pereira de Araújo AF. Sequence-dependent and -independent information in a combined random energy model for protein folding and coding. Proteins 2024; 92:679-687. [PMID: 38158239 DOI: 10.1002/prot.26658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 12/11/2023] [Accepted: 12/15/2023] [Indexed: 01/03/2024]
Abstract
Random energy models (REMs) provide a simple description of the energy landscapes that guide protein folding and evolution. The requirement of a large energy gap between the native structure and unfolded conformations, considered necessary for cooperative, protein-like, folding behavior, indicates that proteins differ markedly from random heteropolymers. It has been suggested, therefore, that natural selection might have acted to choose nonrandom amino acid sequences satisfying this particular condition, implying that a large fraction of possible, unselected random sequences, would not fold to any structure. From an informational perspective, however, this scenario could indicate that protein structures, regarded as messages to be transmitted through a communication channel, would not be efficiently encoded in amino acid sequences, regarded as the communication channel for this transmission, since a large fraction of possible channel states would not be used. Here, we use a combined REM for conformations and sequences, with previously estimated parameters for natural proteins, to explore an alternative possibility in which the appropriate shape of the landscape results mainly from the deviation from randomness of possible native structures instead of sequences. We observe that this situation emerges naturally if the distribution of conformational energies happens to arise from two independent contributions corresponding to sequence-dependent and -independent terms. This construction is consistent with the hypothesis of a protein burial folding code, with native structures being determined by a modest amount of sequence-dependent atomic burial information with sequence-independent constraints imposed by unspecific hydrogen bond formation. 
More generally, an appropriate combination of sequence-dependent and -independent information accommodates the possibility of an efficient structural encoding with the main physical requirement for folding, providing possible insight not only on the folding process but also on several aspects of sequence evolution such as neutral networks, conformational coverage, and de novo gene emergence.
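The energy-gap condition above can be made concrete with the textbook random energy model (Derrida's REM), in which the lowest of M independent Gaussian energies with standard deviation σ sits near −σ√(2 ln M). The minimal Python sketch below assumes that uncorrelated, single-Gaussian picture; it is not the combined sequence-structure REM of the paper, whose fitted parameters are not reproduced here, and the function names are hypothetical.

```python
import math
import random


def rem_expected_minimum(sigma: float, n_states: float) -> float:
    """Leading-order estimate of the lowest energy among n_states independent
    Gaussian energies with mean 0 and standard deviation sigma (textbook REM)."""
    return -sigma * math.sqrt(2.0 * math.log(n_states))


def sampled_minimum(sigma: float, n_states: int, seed: int = 0) -> float:
    """Finite-size check: draw n_states Gaussian energies and return the lowest."""
    rng = random.Random(seed)
    return min(rng.gauss(0.0, sigma) for _ in range(n_states))


def energy_gap(native_energy: float, sigma: float, n_states: float) -> float:
    """Gap between the estimated random-ensemble minimum and the native energy.
    A large positive gap means the native state lies well below all random
    conformations, the condition associated with cooperative folding."""
    return rem_expected_minimum(sigma, n_states) - native_energy
```

For example, with σ = 1 and M = e² random conformations the estimated ground state is about −2, so a native energy of −10 gives a gap of roughly 8 energy units. The asymptotic formula is only leading order, which is why sampled_minimum is included as a finite-M check.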
Affiliation(s)
- Antônio F Pereira de Araújo
- Laboratório de Biofísica Teórica, Departamento de Biologia Celular, Universidade de Brasília, Brasília, Brazil
13
Biswas A, Choudhuri I, Arnold E, Lyumkis D, Haldane A, Levy RM. Kinetic coevolutionary models predict the temporal emergence of HIV-1 resistance mutations under drug selection pressure. Proc Natl Acad Sci U S A 2024; 121:e2316662121. [PMID: 38557187 PMCID: PMC11009627 DOI: 10.1073/pnas.2316662121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 02/23/2024] [Indexed: 04/04/2024]
Abstract
Drug resistance in HIV type 1 (HIV-1) is a pervasive problem that affects the lives of millions of people worldwide. Although records of drug-resistant mutations (DRMs) have been extensively tabulated within public repositories, our understanding of the evolutionary kinetics of DRMs and how they evolve together remains limited. Epistasis, the interaction between a DRM and other residues in HIV-1 protein sequences, is key to the temporal evolution of drug resistance. We use a Potts sequence-covariation statistical-energy model of HIV-1 protein fitness under drug selection pressure, which captures epistatic interactions between all positions, combined with kinetic Monte-Carlo simulations of sequence evolutionary trajectories, to explore the acquisition of DRMs as they arise in an ensemble of drug-naive patient protein sequences. We follow the time course of 52 DRMs in the enzymes protease, RT, and integrase, the primary targets of antiretroviral therapy. The rates at which DRMs emerge are highly correlated with their observed acquisition rates reported in the literature when drug pressure is applied. This result highlights the central role of epistasis in determining the kinetics governing DRM emergence. Whereas rapidly acquired DRMs begin to accumulate as soon as drug pressure is applied, slowly acquired DRMs are contingent on accessory mutations that appear only after prolonged drug pressure. We provide a foundation for using computational methods to determine the temporal evolution of drug resistance using Potts statistical potentials, which can be used to gain mechanistic insights into drug resistance pathways in HIV-1 and other infectious agents.
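To illustrate the general idea of simulating sequence evolution on a Potts landscape, the sketch below evaluates the statistical energy E(S) = Σ_i h_i(s_i) + Σ_{i<j} J_ij(s_i, s_j), assuming the convention P(S) ∝ exp(−E) so that lower energy is fitter, and evolves a sequence by random point mutations accepted with the Metropolis rule. This is a stand-in only: the paper uses a kinetic Monte Carlo scheme whose rates and fitted parameters are not given here, and the parameter layout (h[i][a], and J[(i, j)][a][b] with i < j) is hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical parameter layout: fields h[i][a]; couplings J[(i, j)][a][b] with i < j.
Fields = List[List[float]]
Couplings = Dict[Tuple[int, int], List[List[float]]]


def potts_energy(seq: Sequence[int], h: Fields, J: Couplings) -> float:
    """E(S) = sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j), with P(S) proportional to exp(-E)."""
    energy = sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), Jij in J.items():
        energy += Jij[seq[i]][seq[j]]
    return energy


def metropolis_evolve(seq: Sequence[int], h: Fields, J: Couplings, q: int,
                      n_steps: int, rng: random.Random) -> Tuple[List[int], float]:
    """Propose random point mutations and accept each with probability min(1, exp(-dE))."""
    seq = list(seq)
    energy = potts_energy(seq, h, J)
    for _ in range(n_steps):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        new = rng.randrange(q - 1)
        if new >= old:
            new += 1  # skip the current residue so every proposal is a real mutation
        seq[pos] = new
        trial = potts_energy(seq, h, J)
        delta = trial - energy
        if delta <= 0 or rng.random() < math.exp(-delta):
            energy = trial  # accept the mutation
        else:
            seq[pos] = old  # reject: restore the previous residue
    return seq, energy
```

Recomputing the full energy at every step keeps the sketch short but costs O(L²) per proposal; a production code would update only the terms involving the mutated site.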
Affiliation(s)
- Avik Biswas
- Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
- Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA 92037
- Department of Physics, University of California San Diego, La Jolla, CA 92093
- Indrani Choudhuri
- Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
- Department of Chemistry, Temple University, Philadelphia, PA 19122
- Eddy Arnold
- Department of Chemistry and Chemical Biology, Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, NJ 08854
- Dmitry Lyumkis
- Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA 92037
- Graduate School of Biological Sciences, Department of Molecular Biology, University of California San Diego, La Jolla, CA 92093
- Allan Haldane
- Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
- Department of Physics, Temple University, Philadelphia, PA 19122
- Ronald M. Levy
- Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
- Department of Chemistry, Temple University, Philadelphia, PA 19122
14
Thakur A, Gizzio J, Levy RM. Potts Hamiltonian Models and Molecular Dynamics Free Energy Simulations for Predicting the Impact of Mutations on Protein Kinase Stability. J Phys Chem B 2024; 128:1656-1667. [PMID: 38350894 PMCID: PMC10939730 DOI: 10.1021/acs.jpcb.3c08097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2024]
Abstract
Single-point mutations in kinase proteins can affect their stability and fitness, and computational analysis of these effects can provide insights into the relationships among protein sequence, structure, and function for this enzyme family. To assess the impact of mutations on protein stability, we used a sequence-based Potts Hamiltonian model trained on a kinase family multiple-sequence alignment (MSA) to calculate the statistical energy (fitness) effects of mutations and compared these against relative folding free energies (ΔΔGs) calculated from all-atom molecular dynamics free energy perturbation (FEP) simulations in explicit solvent. The fitness effects of mutations in the Potts model (ΔEs) showed good agreement with experimental thermostability data (Pearson r = 0.68), similar to the correlation we observed with ΔΔGs predicted from structure-based relative FEP simulations. Recognizing the possible advantages of using Potts models to rapidly estimate protein stability effects of kinase mutations seen in cancer genomics data, we used the Potts statistical energy model to estimate the stability effects of 65 conservative and nonconservative mutations across three distinct kinases (Wee1, Abl1, and Cdc7) with somatic mutations reported in the Genomic Data Commons (GDC) database. The ΔEs of these mutations calculated from the Potts model are consistent with the corresponding ΔΔGs from FEP simulations (Pearson r = 0.72). The agreement between these methods suggests that the Potts model may be used as a sequence-based tool for high-throughput screening of mutational effects as part of a computational pipeline for predicting the stability effects of mutations. We also demonstrate how the scalability of the fitness-based Potts model calculations permits analyses that are not easily accessed using FEP simulations.
To this end, we employed site-saturation mutagenesis in the Potts model in order to investigate the relative stability effects of mutations seen in different cancer evolutionary scenarios. We used this approach to analyze the effects of drug pressure in Abl kinase by contrasting the relative fitness penalties of somatic mutations seen in miscellaneous cancer types with those calculated for mutations associated with cancer drug resistance. We observed that, in contrast to somatic mutations of Abl seen in various tumors that appear to have evolved neutrally, cancer mutations that evolved under drug pressure in Abl-targeted therapies tend to preserve enzyme stability.
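The Potts ΔE of a point mutation involves only the field at the mutated site and that site's couplings to every other position, which is what makes site-saturation scans cheap compared with FEP. A minimal sketch, assuming the convention E(S) = Σ h + Σ J with P(S) ∝ exp(−E), so that a positive ΔE is a fitness penalty, and a hypothetical parameter layout; it is not the authors' code.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical parameter layout: fields h[i][a]; couplings J[(i, j)][a][b].
Fields = List[List[float]]
Couplings = Dict[Tuple[int, int], List[List[float]]]


def pair_term(J: Couplings, i: int, ai: int, j: int, aj: int) -> float:
    """Look up J_ij(a_i, a_j) regardless of the order in which the pair is stored."""
    if (i, j) in J:
        return J[(i, j)][ai][aj]
    if (j, i) in J:
        return J[(j, i)][aj][ai]
    return 0.0


def mutation_delta_e(seq: Sequence[int], h: Fields, J: Couplings, pos: int, new: int) -> float:
    """dE = E(mutant) - E(wild type) for one substitution, using only the terms that change."""
    old = seq[pos]
    delta = h[pos][new] - h[pos][old]
    for j, aj in enumerate(seq):
        if j != pos:
            delta += pair_term(J, pos, new, j, aj) - pair_term(J, pos, old, j, aj)
    return delta


def site_saturation(seq: Sequence[int], h: Fields, J: Couplings, pos: int, q: int) -> Dict[int, float]:
    """dE for every alternative residue at one position (in silico site-saturation mutagenesis)."""
    return {b: mutation_delta_e(seq, h, J, pos, b) for b in range(q) if b != seq[pos]}
```

The resulting ΔE values could then be compared with FEP ΔΔG values through a Pearson correlation, for example with statistics.correlation from the Python standard library (3.10 and later).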
Affiliation(s)
- Abhishek Thakur
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Joan Gizzio
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Ronald M Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Department of Physics, Temple University, Philadelphia, Pennsylvania 19122, United States
15
Nartey C, Koo HJ, Laurendon C, Shaik HZ, O’maille P, Noel JP, Morcos F. Coevolutionary Information Captures Catalytic Functions and Reveals Divergent Roles of Terpene Synthase Interdomain Connections. Biochemistry 2024; 63:355-366. [PMID: 38206111 PMCID: PMC10851433 DOI: 10.1021/acs.biochem.3c00578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/22/2023] [Accepted: 12/27/2023] [Indexed: 01/12/2024]
Abstract
Inferring the historical and biophysical causes of diversity within protein families is a complex puzzle. A key to unraveling this problem is characterizing the rugged topography of sequence-function adaptive landscapes. Using biochemical data from a 2^9 = 512 combinatorial library of tobacco 5-epi-aristolochene synthase (TEAS) mutants engineered to make the native major product of Egyptian henbane premnaspirodiene synthase (HPS) and a complementary 512-mutant HPS library, we address the question of how product specificity is controlled. These data sets reveal that HPS is far more robust and resistant to mutations than TEAS, where most mutants are promiscuous. We also combine experimental data with a sequence Potts Hamiltonian model and direct coupling analysis to quantify mutant fitness. Our results demonstrate that the Hamiltonian captures variation in product outputs across both libraries, clusters native family members based on their substrate specificities, and exposes the divergent catalytic roles of couplings between the catalytic and noncatalytic domains of TEAS versus HPS. Specifically, we found that the role of the interdomain connectivities in specifying product output is more important in TEAS than connectivities within the catalytic domain. Although TEAS and HPS are 75% identical, this property is not shared by HPS, where connectivities within the catalytic domain are more important for specificity. By solving the X-ray crystal structure of HPS, we assessed structural bases for their interdomain network differences. Last, we calculate the product profile Shannon entropies of the two libraries, showing that site-site connectivities also play divergent roles in catalytic accuracy.
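The product-profile Shannon entropy mentioned above can be computed as H = −Σ_k p_k log2 p_k over the fraction p_k of each product. A minimal sketch, assuming the profile is supplied as relative amounts per product; product names in any usage are hypothetical, and the paper's exact normalization is not stated in the abstract. An entropy of 0 bits means a single product, and higher values mean a more promiscuous, less accurate enzyme.

```python
import math
from typing import Mapping


def product_entropy(profile: Mapping[str, float]) -> float:
    """Shannon entropy (bits) of a product profile given as relative amounts per product."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain a positive total amount")
    entropy = 0.0
    for amount in profile.values():
        if amount > 0:  # products that are absent contribute nothing
            p = amount / total
            entropy -= p * math.log2(p)
    return entropy
```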
Affiliation(s)
- Charisse M. Nartey
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Hyun Jo Koo
- Howard Hughes Medical Institute, The Salk Institute for Biological Studies, Jack H. Skirball Center for Chemical Biology and Proteomics, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
- Caroline Laurendon
- John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich NR4 7UH, U.K.
- Hana Z. Shaik
- Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Paul O’maille
- John Innes Centre, Institute of Food Research, Food & Health Programme, Norwich Research Park, Norwich NR4 7UA, U.K.
- Joseph P. Noel
- Howard Hughes Medical Institute, The Salk Institute for Biological Studies, Jack H. Skirball Center for Chemical Biology and Proteomics, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
- Faruck Morcos
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas 75080, United States
16
Alvarez S, Nartey CM, Mercado N, de la Paz JA, Huseinbegovic T, Morcos F. In vivo functional phenotypes from a computational epistatic model of evolution. Proc Natl Acad Sci U S A 2024; 121:e2308895121. [PMID: 38285950 PMCID: PMC10861889 DOI: 10.1073/pnas.2308895121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 12/19/2023] [Indexed: 01/31/2024]
Abstract
Computational models of evolution are valuable for understanding the dynamics of sequence variation, to infer phylogenetic relationships or potential evolutionary pathways and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called sequence evolution with epistatic contributions (SEEC). Utilizing the Hamiltonian of the joint probability of sequences in the family as a fitness metric, we sampled and experimentally tested for in vivo β-lactamase activity in Escherichia coli TEM-1 variants. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their wild-type predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes, and facilitate vaccine development.
Affiliation(s)
- Sophia Alvarez
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Charisse M. Nartey
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Nicholas Mercado
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Tea Huseinbegovic
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
17
Hayes RL, Nixon CF, Marqusee S, Brooks CL. Selection pressures on evolution of ribonuclease H explored with rigorous free-energy-based design. Proc Natl Acad Sci U S A 2024; 121:e2312029121. [PMID: 38194446 PMCID: PMC10801872 DOI: 10.1073/pnas.2312029121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 11/22/2023] [Indexed: 01/11/2024]
Abstract
Understanding natural protein evolution and designing novel proteins are motivating interest in development of high-throughput methods to explore large sequence spaces. In this work, we demonstrate the application of multisite λ dynamics (MSλD), a rigorous free energy simulation method, and chemical denaturation experiments to quantify evolutionary selection pressure from sequence-stability relationships and to address questions of design. This study examines a mesophilic phylogenetic clade of ribonuclease H (RNase H), furthering its extensive characterization in earlier studies, focusing on E. coli RNase H (ecRNH) and a more stable consensus sequence (AncCcons) differing at 15 positions. The stabilities of 32,768 chimeras between these two sequences were computed using the MSλD framework. The most stable and least stable chimeras were predicted and tested along with several other sequences, revealing a designed chimera with approximately the same stability increase as AncCcons, but requiring only half the mutations. Comparing the computed stabilities with experiment for 12 sequences reveals a Pearson correlation of 0.86 and root mean squared error of 1.18 kcal/mol, an unprecedented level of accuracy well beyond less rigorous computational design methods. We then quantified selection pressure using a simple evolutionary model in which sequences are selected according to the Boltzmann factor of their stability. Selection temperatures from 110 to 168 K are estimated in three ways by comparing experimental and computational results to evolutionary models. These estimates indicate selection pressure is high, which has implications for evolutionary dynamics and for the accuracy required for design, and suggests accurate high-throughput computational methods like MSλD may enable more effective protein design.
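The evolutionary model described above weights each sequence by the Boltzmann factor of its stability at a selection temperature T_sel. A minimal sketch, assuming folding free energies ΔG_fold in kcal/mol with more negative values meaning more stable; the function and constant names are hypothetical.

```python
import math
from typing import List, Sequence

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def selection_weights(dg_fold: Sequence[float], t_sel: float) -> List[float]:
    """Normalized Boltzmann weights exp(-dG_fold / (R * T_sel)).

    dg_fold: folding free energies in kcal/mol (more negative = more stable).
    t_sel: selection temperature in kelvin.
    """
    beta = 1.0 / (R_KCAL * t_sel)
    ref = min(dg_fold)  # shift by the most stable value to avoid overflow
    weights = [math.exp(-beta * (g - ref)) for g in dg_fold]
    total = sum(weights)
    return [w / total for w in weights]
```

With these weights, a 1 kcal/mol stability difference changes the relative weight about 36-fold at T_sel = 140 K, versus about 5-fold at 298 K, which is one way to see why selection temperatures of 110 to 168 K indicate strong selection pressure.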
Affiliation(s)
- Ryan L. Hayes
- Department of Chemical and Biomolecular Engineering, University of California, Irvine, CA 92697
- Department of Chemistry, University of Michigan, Ann Arbor, MI 48109
- Charlotte F. Nixon
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720
- Susan Marqusee
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA 94720
- Department of Chemistry, University of California, Berkeley, CA 94720
- Charles L. Brooks
- Department of Chemistry, University of Michigan, Ann Arbor, MI 48109
- Biophysics Program, University of Michigan, Ann Arbor, MI 48109
18
García-Morales A, Balleza D. Exploring Flexibility and Folding Patterns Throughout Time in Voltage Sensors. J Mol Evol 2023; 91:819-836. [PMID: 37955698 DOI: 10.1007/s00239-023-10140-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 10/27/2023] [Indexed: 11/14/2023]
Abstract
The voltage-sensing domain (VSD) is a module capable of responding to changes in the membrane potential through conformational changes and facilitating electromechanical coupling to open a pore gate, activate proton permeation pathways, or promote enzymatic activity in some membrane-anchored phosphatases. To carry out these functions, this module acts cooperatively through conformational changes. The VSD is formed by four transmembrane segments (S1-S4) but the S4 segment is critical since it carries positively charged residues, mainly Arg or Lys, which require an aqueous environment for their proper function. The discovery of this module in voltage-gated ion channels (VGICs), proton channels (Hv1), and voltage sensor-containing phosphatases (VSPs) has expanded our understanding of the principle of modularity in the voltage-sensing mechanism of these proteins. Here, by sequence comparison and the evaluation of the relationship between sequence composition, intrinsic flexibility, and structural analysis in 14 selected representatives of these three major protein groups, we report five interesting differences in the folding patterns of the VSD both in prokaryotes and eukaryotes.
Our main findings indicate that this module is highly conserved throughout the evolutionary scale, however: (1) segments S1 to S3 in eukaryotes are significantly more hydrophobic than those present in prokaryotes; (2) the S4 segment has retained its hydrophilic character; (3) in eukaryotes the extramembranous linkers are significantly larger and more flexible in comparison with those present in prokaryotes; (4) the sensors present in the kHv1 proton channel and the ciVSP phosphatase, both of eukaryotic origin, exhibit relationships of flexibility and folding patterns very close to the typical ones found in prokaryotic voltage sensors; and (5) archaeal channels KvAP and MVP have flexibility profiles which are clearly contrasting in the S3-S4 region, which could explain their divergent activation mechanisms. Finally, to elucidate the obscure origins of this module, we show further evidence for a possible connection between voltage sensors and TolQ proteins.
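The hydrophobicity comparison between segments S1-S3 and S4 can be illustrated with a mean hydropathy (GRAVY) calculation. The abstract does not name the scale used, so the sketch below swaps in the standard Kyte-Doolittle scale as an assumption; the window default of 9 is an arbitrary illustrative choice, and segment sequences would come from the user's own alignment.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy scale (assumed here; not necessarily the authors' choice).
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment: str) -> float:
    """Average hydropathy (GRAVY) of a segment; higher values are more hydrophobic."""
    residues = segment.upper()
    if not residues:
        raise ValueError("empty segment")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def hydropathy_profile(sequence: str, window: int = 9) -> List[float]:
    """Sliding-window mean hydropathy, one value per complete window."""
    return [mean_hydropathy(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]
```

Under this scale, an Arg/Lys-rich S4 segment would be expected to score well below the S1-S3 segments.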
Affiliation(s)
- Abigail García-Morales
- Tecnológico Nacional de México, Instituto Tecnológico de Veracruz, Unidad de Investigación y Desarrollo en Alimentos, Calz. Miguel Angel de Quevedo 2779, Col. Formando Hogar, CP. 91897, Veracruz, Ver, Mexico
- Daniel Balleza
- Tecnológico Nacional de México, Instituto Tecnológico de Veracruz, Unidad de Investigación y Desarrollo en Alimentos, Calz. Miguel Angel de Quevedo 2779, Col. Formando Hogar, CP. 91897, Veracruz, Ver, Mexico
19
Gaudreault F, Corbeil CR, Sulea T. Enhanced antibody-antigen structure prediction from molecular docking using AlphaFold2. Sci Rep 2023; 13:15107. [PMID: 37704686 PMCID: PMC10499836 DOI: 10.1038/s41598-023-42090-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Accepted: 09/05/2023] [Indexed: 09/15/2023]
Abstract
Predicting the structure of antibody-antigen complexes has tremendous value in biomedical research but unfortunately suffers from poor performance in real-life applications. AlphaFold2 (AF2) has provided renewed hope for improvements in the field of protein-protein docking but has shown limited success against antibody-antigen complexes due to the lack of co-evolutionary constraints. In this study, we used physics-based protein docking methods for building decoy sets consisting of low-energy docking solutions that were either geometrically close to the native structure (positives) or not (negatives). The docking models were then fed into AF2 to assess their confidence with a novel composite score based on normalized pLDDT and pTMscore metrics after AF2 structural refinement. We show benefits of the AF2 composite score for rescoring docking poses both in terms of (1) classification of positives/negatives and of (2) success rates with particular emphasis on early enrichment. Docking models of at least medium quality present in the decoy set, but not necessarily highly ranked by docking methods, benefitted most from AF2 rescoring by experiencing large advances towards the top of the reranked list of models. These improvements, obtained without any calibration or novel methodologies, led to a notable level of performance in antibody-antigen unbound docking that had not been achieved previously.
Affiliation(s)
- Francis Gaudreault
- Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, QC, H4P 2R2, Canada
- Christopher R Corbeil
- Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, QC, H4P 2R2, Canada
- Traian Sulea
- Human Health Therapeutics Research Centre, National Research Council Canada, 6100 Royalmount Avenue, Montreal, QC, H4P 2R2, Canada
- Institute of Parasitology, McGill University, 21111 Lakeshore Road, Sainte-Anne-de-Bellevue, QC, H9X 3V9, Canada
20
Li Y, Peng HQ, Yang LQ. Structural determinants underlying high-temperature adaptation of thermophilic xylanase from hot-spring microorganisms. Front Microbiol 2023; 14:1210420. [PMID: 37485531 PMCID: PMC10360402 DOI: 10.3389/fmicb.2023.1210420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 06/21/2023] [Indexed: 07/25/2023]
Abstract
Thermophilic xylanases from hot-spring microorganisms have potential biological and industrial applications for renewable and sustainable social development. However, the high-temperature adaptation mechanisms of these thermophilic xylanases remain elusive at the molecular and evolutionary levels. Here, two recently reported xylanases from hot springs, named XynDRTY1 and XynM1, were subjected to molecular dynamics (MD) simulations over a series of temperature gradients and analyzed against the evolutionary background of the xylanase family. Comparative analysis of MD trajectories revealed that XynM1 exhibits smaller structural dynamics and greater thermal stability than XynDRTY1, although both share a similar fold architecture with structural differences in the βα-loops. Local regions whose conformational flexibility and regular secondary structure differed as temperature increased were closely related to the high-temperature adaptation of xylanase, implying that stabilizing these regions is a feasible strategy to improve the thermal stability of xylanases. Furthermore, coevolutionary information from the xylanase family further specified the structural basis of xylanases. Based on these results on the sequence, structure, and dynamics of thermophilic xylanases from hot springs, a series of high-temperature-related structural determinants were resolved, which promote understanding of the molecular mechanism of xylanase high-temperature adaptation and provide direct assistance in improving xylanase thermal stability.
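Temperature-dependent flexibility in MD is commonly summarized as a per-residue root-mean-square fluctuation (RMSF) about the mean position, computed after superposing all frames on a reference. The abstract does not specify the metric, so this is a minimal, dependency-free sketch under that assumption (frames already aligned, one atom such as Cα per residue); comparing the profiles from each simulated temperature would highlight the regions whose flexibility changes.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[Tuple[float, float, float]]  # one (x, y, z) per atom


def rmsf(frames: Sequence[Frame]) -> List[float]:
    """Per-atom RMSF about the mean position.

    Assumes all frames are already superposed on a common reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for a in range(n_atoms):
        # Mean position of atom a over the trajectory.
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msd = sum(
            sum((f[a][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result
```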
Affiliation(s)
- Yi Li
- College of Mathematics and Computer Science, Dali University, Dali, China
- College of Agriculture and Biological Science, Dali University, Dali, China
- Key Laboratory of Bioinformatics and Computational Biology, Department of Education of Yunnan Province, Dali University, Dali, China
- State Key Laboratory for Conservation and Utilization of Bio-Resource in Yunnan, Yunnan University, Kunming, China
- Hong-Qian Peng
- College of Mathematics and Computer Science, Dali University, Dali, China
- Li-Quan Yang
- College of Agriculture and Biological Science, Dali University, Dali, China
- Key Laboratory of Bioinformatics and Computational Biology, Department of Education of Yunnan Province, Dali University, Dali, China
21
Alvarez S, Nartey CM, Mercado N, de la Paz A, Huseinbegovic T, Morcos F. In vivo functional phenotypes from a computational epistatic model of evolution. bioRxiv 2023:2023.05.24.542176. [PMID: 37292895 PMCID: PMC10245989 DOI: 10.1101/2023.05.24.542176] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Computational models of evolution are valuable for understanding the dynamics of sequence variation, to infer phylogenetic relationships or potential evolutionary pathways and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called Sequence Evolution with Epistatic Contributions (SEEC). Utilizing the Hamiltonian of the joint probability of sequences in the family as a fitness metric, we sampled and experimentally tested for in vivo β-lactamase activity in E. coli TEM-1 variants. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their WT predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes, and facilitate vaccine development.
Affiliation(s)
- Sophia Alvarez
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Charisse M. Nartey
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Nicholas Mercado
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Alberto de la Paz
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Tea Huseinbegovic
- School of Natural Sciences and Mathematics, University of Texas at Dallas, Richardson, TX 75080, USA
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080, USA
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
22
Gizzio J, Thakur A, Haldane A, Levy RM. Evolutionary divergence in the conformational landscapes of tyrosine vs serine/threonine kinases. eLife 2022; 11:83368. [PMID: 36562610 PMCID: PMC9822262 DOI: 10.7554/elife.83368] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/09/2022] [Accepted: 12/22/2022] [Indexed: 12/24/2022]
Abstract
Inactive conformations of protein kinase catalytic domains where the DFG motif has a "DFG-out" orientation and the activation loop is folded present a druggable binding pocket that is targeted by FDA-approved 'type-II inhibitors' in the treatment of cancers. Tyrosine kinases (TKs) typically show strong binding affinity with a wide spectrum of type-II inhibitors while serine/threonine kinases (STKs) usually bind more weakly which we suggest here is due to differences in the folded to extended conformational equilibrium of the activation loop between TKs vs. STKs. To investigate this, we use sequence covariation analysis with a Potts Hamiltonian statistical energy model to guide absolute binding free-energy molecular dynamics simulations of 74 protein-ligand complexes. Using the calculated binding free energies together with experimental values, we estimated free-energy costs for the large-scale (~17-20 Å) conformational change of the activation loop by an indirect approach, circumventing the very challenging problem of simulating the conformational change directly. We also used the Potts statistical potential to thread large sequence ensembles over active and inactive kinase states. The structure-based and sequence-based analyses are consistent; together they suggest TKs evolved to have free-energy penalties for the classical 'folded activation loop' DFG-out conformation relative to the active conformation, that is, on average, 4-6 kcal/mol smaller than the corresponding values for STKs. Potts statistical energy analysis suggests a molecular basis for this observation, wherein the activation loops of TKs are more weakly 'anchored' against the catalytic loop motif in the active conformation and form more stable substrate-mimicking interactions in the inactive conformation. 
These results provide insights into the molecular basis for the divergent functional properties of TKs and STKs, and have pharmacological implications for the target selectivity of type-II inhibitors.
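The indirect estimate of the conformational free-energy cost can be illustrated with a standard two-state conformational-selection relation: if a type-II inhibitor binds only the inactive (DFG-out, folded activation loop) state lying ΔG_conf above the active state, the observed binding free energy is ΔG_obs = ΔG_bind,inactive + RT ln(1 + exp(ΔG_conf/RT)), which tends to ΔG_bind,inactive + ΔG_conf when ΔG_conf ≫ RT. This relation is offered as an assumption and is not necessarily the exact formulation used by the authors; the function names and the 300 K default are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def observed_binding_dg(dg_bind_inactive: float, dg_conf: float, temp: float = 300.0) -> float:
    """Apparent binding free energy when the ligand binds only the inactive state,
    which lies dg_conf above the active state (two-state conformational selection)."""
    kt = R_KCAL * temp
    # -kT ln p_inactive, with p_inactive = 1 / (1 + exp(dg_conf / kT))
    return dg_bind_inactive + kt * math.log1p(math.exp(dg_conf / kt))


def conformational_penalty(dg_obs: float, dg_bind_inactive: float, temp: float = 300.0) -> float:
    """Invert the relation above: recover dg_conf from the observed binding free
    energy and the computed binding free energy to the inactive state."""
    kt = R_KCAL * temp
    diff = dg_obs - dg_bind_inactive
    if diff <= 0:
        raise ValueError("observed binding must be weaker than binding to the inactive state")
    return kt * math.log(math.expm1(diff / kt))
```

When the gap between observed and state-specific binding is several RT, the inversion reduces to a simple difference, ΔG_conf ≈ ΔG_obs − ΔG_bind,inactive.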
Affiliation(s)
- Joan Gizzio
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States.,Department of Chemistry, Temple University, Philadelphia, United States
| | - Abhishek Thakur
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States.,Department of Chemistry, Temple University, Philadelphia, United States
| | - Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States.,Department of Physics, Temple University, Philadelphia, United States
| | - Ronald M Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States.,Department of Chemistry, Temple University, Philadelphia, United States
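The indirect approach described in this abstract combines computed and experimental binding free energies to recover the free-energy cost of the activation-loop conformational change. A minimal sketch, assuming a simple two-state model (not necessarily the authors' exact formulation) in which a type-II inhibitor binds only the DFG-out state, so that the observed binding free energy equals the intrinsic DFG-out binding free energy plus a population penalty: dG_obs = dG_bind_out + kT ln(1 + exp(dG_conf/kT)). The function name and the input values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def conformational_penalty(dg_obs: float, dg_bind_out: float, temp: float = 298.15) -> float:
    """Estimate dG_conf = G(DFG-out) - G(active), in kcal/mol, under a two-state model.

    Assumes the ligand binds only the DFG-out state, so that
        dg_obs = dg_bind_out + kT * ln(1 + exp(dG_conf / kT)).
    """
    kt = R_KCAL * temp
    excess = (dg_obs - dg_bind_out) / kt  # how much weaker the observed binding is, in kT units
    if excess <= 0:
        raise ValueError("observed binding must be weaker than binding to the DFG-out state")
    # Invert the relation above; expm1 keeps precision when excess is small.
    return kt * math.log(math.expm1(excess))
```

When the penalty is much larger than kT, the estimate reduces to dg_obs - dg_bind_out; for example, hypothetical inputs of -9.0 and -14.0 kcal/mol give approximately 5.0 kcal/mol.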
23
Sánchez IE, Galpern EA, Garibaldi MM, Ferreiro DU. Molecular Information Theory Meets Protein Folding. J Phys Chem B 2022; 126:8655-8668. [PMID: 36282961 DOI: 10.1021/acs.jpcb.2c04532] [Indexed: 01/11/2023]
Abstract
We propose an application of molecular information theory to analyze the folding of single domain proteins. We analyze results from various areas of protein science, such as sequence-based potentials, reduced amino acid alphabets, backbone configurational entropy, secondary structure content, residue burial layers, and mutational studies of protein stability changes. We found that the average information contained in the sequences of evolved proteins is very close to the average information needed to specify a fold ∼2.2 ± 0.3 bits/(site·operation). The effective alphabet size in evolved proteins equals the effective number of conformations of a residue in the compact unfolded state at around 5. We calculated an energy-to-information conversion efficiency upon folding of around 50%, lower than the theoretical limit of 70%, but much higher than human-built macroscopic machines. We propose a simple mapping between molecular information theory and energy landscape theory and explore the connections between sequence evolution, configurational entropy, and the energetics of protein folding.
Affiliation(s)
- Ignacio E Sánchez
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Ezequiel A Galpern
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Martín M Garibaldi
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
- Diego U Ferreiro
- Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Universidad de Buenos Aires, Buenos Aires CP1428, Argentina
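The per-site information figures in this abstract are measured against the log2(20) ≈ 4.32 bits of a uniform 20-letter alphabet. As a minimal sketch of the standard Shannon bookkeeping for a single alignment column, assuming no gaps, no sequence weighting and no pseudocounts (the paper's own estimators may differ), the information is the entropy reduction from that maximum and the effective alphabet size is 2^H. The function name is hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> tuple[float, float, float]:
    """Return (entropy_bits, information_bits, effective_alphabet) for one MSA column.

    Uses raw residue frequencies; gaps, weights and pseudocounts are not handled.
    """
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    information = math.log2(alphabet_size) - entropy  # bits gained relative to a uniform alphabet
    return entropy, information, 2 ** entropy          # 2**H is the effective number of letters
```

A fully conserved column carries log2(20) bits with an effective alphabet of 1, while a column split evenly among four residues has H = 2 bits and an effective alphabet of 4.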
24
Colberg M, Schofield J. Configurational entropy, transition rates, and optimal interactions for rapid folding in coarse-grained model proteins. J Chem Phys 2022; 157:125101. [PMID: 36182418 DOI: 10.1063/5.0098612] [Indexed: 11/14/2022]
Abstract
Under certain conditions, the dynamics of coarse-grained models of solvated proteins can be described using a Markov state model, which tracks the evolution of populations of configurations. The transition rates among states that appear in the Markov model can be determined by computing the relative entropy of states and their mean first passage times. In this paper, we present an adaptive method to evaluate the configurational entropy and the mean first passage times for linear chain models with discontinuous potentials. The approach is based on event-driven dynamical sampling in a massively parallel architecture. Using the fact that the transition rate matrix can be calculated for any choice of interaction energies at any temperature, it is demonstrated how each state's energy can be chosen such that the average time to transition between any two states is minimized. The methods are used to analyze the optimization of the folding process of two protein systems: the crambin protein and a model with frustration and misfolding. It is shown that the folding pathways for both systems comprise two regimes: first, the rapid establishment of local bonds, followed by the subsequent formation of more distant contacts. The state energies that lead to the most rapid folding encourage multiple pathways, and they either penalize folding pathways through kinetic traps by raising the energies of trapping states or establish an escape route from the trapping states by lowering free energy barriers to other states that rapidly reach the native state.
Affiliation(s)
- Margarita Colberg
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Jeremy Schofield
- Chemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
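The Markov-state description in this abstract turns transition rates into mean first passage times (MFPTs). Given a rate matrix, the MFPTs to a target state solve a small linear system: for every state i other than the target, sum_j Q[i, j] * tau[j] = -1, with tau[target] = 0 and Q the generator whose rows sum to zero. A minimal NumPy sketch, assuming the rate matrix is already known; Colberg and Schofield estimate the underlying quantities by event-driven sampling, which is not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def mean_first_passage_times(rates: np.ndarray, target: int) -> np.ndarray:
    """MFPT from every state to `target` for a continuous-time Markov chain.

    rates[i, j] is the transition rate from state i to state j (i != j);
    diagonal entries are ignored and rebuilt so that each row sums to zero.
    """
    q = np.array(rates, dtype=float)
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))  # generator matrix: rows sum to zero
    others = [i for i in range(q.shape[0]) if i != target]
    # For i != target: sum_j Q[i, j] * tau[j] = -1, with tau[target] = 0.
    sub = q[np.ix_(others, others)]
    tau = np.zeros(q.shape[0])
    tau[others] = np.linalg.solve(sub, -np.ones(len(others)))
    return tau
```

For a two-state system with forward rate 2 the MFPT is 1/2; for an irreversible chain 0 → 1 → 2 with rates 1 and 2, the MFPT from state 0 to state 2 is 1 + 1/2 = 1.5.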
25
Gerardos A, Dietler N, Bitbol AF. Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences. PLoS Comput Biol 2022; 18:e1010147. [PMID: 35576238 PMCID: PMC9135348 DOI: 10.1371/journal.pcbi.1010147] [Received: 11/22/2021] [Revised: 05/26/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022]
Abstract
Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) make it possible to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while deteriorating it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural data set, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.
Affiliation(s)
- Andonis Gerardos
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nicola Dietler
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
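Mutual information, one of the two statistics named in this abstract, measures how much the residue at one alignment column tells about the residue at another. A minimal sketch of the plain frequency-based estimator between two columns, assuming unweighted sequences and no pseudocounts; partner-inference pipelines typically add both and compute MI between columns of the two families' paired alignments. The function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_a: str, col_b: str) -> float:
    """Mutual information (bits) between two aligned MSA columns, from raw frequencies."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)   # single-column counts
    pab = Counter(zip(col_a, col_b))          # joint counts of residue pairs
    return sum(
        (c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )
```

Two perfectly covarying two-letter columns such as "AABB" and "CCDD" give 1 bit, while independent columns such as "AABB" and "CDCD" give 0.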
26
Chi H, Zhou Q, Tutol JN, Phelps SM, Lee J, Kapadia P, Morcos F, Dodani SC. Coupling a Live Cell Directed Evolution Assay with Coevolutionary Landscapes to Engineer an Improved Fluorescent Rhodopsin Chloride Sensor. ACS Synth Biol 2022; 11:1627-1638. [PMID: 35389621 PMCID: PMC9184236 DOI: 10.1021/acssynbio.2c00033] [Indexed: 11/29/2022]
Abstract
Our understanding of chloride in biology has been accelerated through the application of fluorescent protein-based sensors in living cells. These sensors can be generated and diversified to have a range of properties using laboratory-guided evolution. Recently, we established that the fluorescent proton-pumping rhodopsin wtGR from Gloeobacter violaceus can be converted into a fluorescent sensor for chloride. To unlock this non-natural function, a single point mutation at the Schiff counterion position (D121V) was introduced into wtGR fused to cyan fluorescent protein (CFP) resulting in GR1-CFP. Here, we have integrated coevolutionary analysis with directed evolution to understand how the rhodopsin sequence space can be explored and engineered to improve this starting point. We first show how evolutionary couplings are predictive of functional sites in the rhodopsin family and how a fitness metric based on a sequence can be used to quantify the known proton-pumping activities of GR-CFP variants. Then, we couple this ability to predict potential functional outcomes with a screening and selection assay in live Escherichia coli to reduce the mutational search space of five residues along the proton-pumping pathway in GR1-CFP. This iterative selection process results in GR2-CFP with four additional mutations: E132K, A84K, T125C, and V245I. Finally, bulk and single fluorescence measurements in live E. coli reveal that GR2-CFP is a reversible, ratiometric fluorescent sensor for extracellular chloride with an improved dynamic range. We anticipate that our framework will be applicable to other systems, providing a more efficient methodology to engineer fluorescent protein-based sensors with desired properties.
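Coevolution-based fitness metrics of the kind mentioned in this abstract are commonly built on a Potts Hamiltonian (statistical energy) with site fields h and pairwise couplings J. A minimal sketch, assuming the DCA sign convention E(S) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j) (some groups use the opposite sign), integer-encoded sequences, and hypothetical parameter arrays that in practice would be inferred from a multiple sequence alignment; this is not the authors' code.

```python
import numpy as np


def potts_energy(seq: list[int], h: np.ndarray, J: np.ndarray) -> float:
    """E(S) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j]; lower means more favorable.

    h has shape (L, q); J has shape (L, L, q, q) and only the i < j entries are read.
    """
    L = len(seq)
    e = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return float(e)


def mutation_delta_e(seq: list[int], pos: int, new_state: int,
                     h: np.ndarray, J: np.ndarray) -> float:
    """Energy change for a point mutation at 0-based position `pos`."""
    mutant = list(seq)
    mutant[pos] = new_state
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```

Under this convention a negative ΔE marks a mutation predicted to be more favorable. A mutation written as, for example, D121V would first need to be mapped to a (position, state) pair using the alignment's numbering and alphabet encoding.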
27
Hayes RL, Vilseck JZ, Brooks CL. Addressing Intersite Coupling Unlocks Large Combinatorial Chemical Spaces for Alchemical Free Energy Methods. J Chem Theory Comput 2022; 18:2114-2123. [PMID: 35255214 PMCID: PMC9700482 DOI: 10.1021/acs.jctc.1c00948] [Indexed: 11/30/2022]
Abstract
Alchemical free energy methods are playing a growing role in molecular design, both for computer-aided drug design of small molecules and for computational protein design. Multisite λ dynamics (MSλD) is a uniquely scalable alchemical free energy method that enables more efficient exploration of combinatorial alchemical spaces encountered in molecular design, but simulations have typically been limited to a few hundred ligands or sequences. Here, we focus on coupling between sites to enable scaling to larger alchemical spaces. We first discuss updates to the biasing potentials that facilitate MSλD sampling to include coupling terms and show that this can provide more thorough sampling of alchemical states. We then harness coupling between sites by developing a new free energy estimator based on the Potts models underlying direct coupling analysis, a method for predicting contacts from sequence coevolution, and find it yields more accurate free energies than previous estimators. The sampling requirements of the Potts model estimator scale with the square of the number of sites, a substantial improvement over the exponential scaling of the standard estimator. This opens up exploration of much larger alchemical spaces with MSλD for molecular design.
Affiliation(s)
- Ryan L Hayes
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Biophysics Program, University of Michigan, Ann Arbor, Michigan 48109, United States
- Jonah Z Vilseck
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Charles L Brooks
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Biophysics Program, University of Michigan, Ann Arbor, Michigan 48109, United States
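The scaling contrast in this abstract can be made concrete by counting. If each of N sites carries m substituents, the combinatorial space has m^N end states, whereas a pairwise (Potts-style) decomposition needs only one term per site and substituent plus one per pair of sites and pair of substituents, which grows as N^2 m^2. A minimal counting sketch with hypothetical function names; it ignores gauge redundancy and says nothing about how the MSλD estimator itself is computed.

```python
from math import comb


def alchemical_space_size(n_sites: int, n_subs: int) -> int:
    """Number of end states when each of n_sites has n_subs substituents (exponential in n_sites)."""
    return n_subs ** n_sites


def potts_parameter_count(n_sites: int, n_subs: int) -> int:
    """One field per (site, substituent) plus one coupling per site pair and substituent pair."""
    return n_sites * n_subs + comb(n_sites, 2) * n_subs ** 2
```

For 10 sites with 4 substituents each, this gives 1,048,576 end states but only 760 pairwise terms.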
28
Enhancing computational enzyme design by a maximum entropy strategy. Proc Natl Acad Sci U S A 2022; 119:2122355119. [PMID: 35135886 PMCID: PMC8851541 DOI: 10.1073/pnas.2122355119] [Accepted: 01/03/2022] [Indexed: 01/16/2023]
Abstract
Although computational enzyme design is of great importance, the advances utilizing physics-based approaches have been slow, and further progress is urgently needed. One promising direction is using machine learning, but such strategies have not been established as effective tools for predicting the catalytic power of enzymes. Here, we show that the statistical energy inferred from homologous sequences with the maximum entropy (MaxEnt) principle significantly correlates with enzyme catalysis and stability at the active site region and the more distant region, respectively. This finding decodes enzyme architecture and offers a connection between enzyme evolution and the physical chemistry of enzyme catalysis, and it deepens our understanding of the stability-activity trade-off hypothesis for enzymes. Overall, the strong correlations found here provide a powerful way of guiding enzyme design.
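Correlations like the one reported in this abstract, between statistical energy and measured catalysis or stability, are often checked with a rank correlation, which assumes only a monotonic relationship. A minimal pure-Python Spearman sketch using average ranks for ties; the function names and inputs are hypothetical, and this is not necessarily the statistic the authors used.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("constant input has no rank correlation")
    return cov / (vx * vy) ** 0.5
```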
29
Do HN, Haldane A, Levy RM, Miao Y. Unique features of different classes of G-protein-coupled receptors revealed from sequence coevolutionary and structural analysis. Proteins 2022; 90:601-614. [PMID: 34599827 PMCID: PMC8738117 DOI: 10.1002/prot.26256] [Received: 07/28/2021] [Revised: 09/21/2021] [Accepted: 09/27/2021] [Indexed: 02/03/2023]
Abstract
G-protein-coupled receptors (GPCRs) are the largest family of human membrane proteins and represent the primary targets of about one third of currently marketed drugs. Despite their critical importance, experimental structures have been determined for only a limited portion of GPCRs and functional mechanisms of GPCRs remain poorly understood. Here, we have constructed novel sequence coevolutionary models of the A and B classes of GPCRs and compared them with residue contact frequency maps generated with available experimental structures. Significant portions of structural residue contacts were successfully detected in the sequence-based covariational models. "Exception" residue contacts predicted from sequence coevolutionary models but not observed in available structures added missing links that were important for GPCR activation and allosteric modulation. Moreover, we identified distinct residue contacts involving different sets of functional motifs for GPCR activation, such as the Na+ pocket, CWxP, DRY, PIF, and NPxxY motifs in the class A and the HETx and PxxG motifs in the class B. Finally, we systematically uncovered critical residue contacts tuned by allosteric modulation in the two classes of GPCRs, including those from the activation motifs and particularly the extracellular and intracellular loops in class A GPCRs. These findings provide a promising framework for rational design of ligands to regulate GPCR activation and allosteric modulation.
Affiliation(s)
- Hung N Do
- The Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047
- Allan Haldane
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M Levy
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
- Yinglong Miao
- The Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047
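The comparison described in this abstract, between coevolution-predicted contacts and contacts seen in experimental structures, reduces to set operations once contacts are thresholded. A minimal sketch, assuming predicted pairs come with a coupling score and structural contacts are given as a set of (i, j) pairs with i < j (for example, pairs above some contact-frequency cutoff); the function name, residue numbers and scores are hypothetical, and the paper's contact definitions are not reproduced.

```python
def compare_contacts(predicted: list[tuple[tuple[int, int], float]],
                     structural: set[tuple[int, int]],
                     top_k: int) -> tuple[float, float, list[tuple[int, int]]]:
    """Compare the top_k highest-scoring predicted pairs with structural contacts.

    Returns (precision of the top_k, recall of structural contacts,
    sorted "exception" pairs that are predicted but absent from the structures).
    """
    ranked = sorted(predicted, key=lambda item: item[1], reverse=True)[:top_k]
    top = {tuple(sorted(pair)) for pair, _ in ranked}  # normalize to i < j
    hits = top & structural
    precision = len(hits) / len(top) if top else 0.0
    recall = len(hits) / len(structural) if structural else 0.0
    exceptions = sorted(top - structural)
    return precision, recall, exceptions
```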
30
Röder K, Wales DJ. The Energy Landscape Perspective: Encoding Structure and Function for Biomolecules. Front Mol Biosci 2022; 9:820792. [PMID: 35155579 PMCID: PMC8829389 DOI: 10.3389/fmolb.2022.820792] [Received: 11/23/2021] [Accepted: 01/07/2022] [Indexed: 12/02/2022]
Abstract
The energy landscape perspective is outlined with particular reference to biomolecules that perform multiple functions. We associate these multifunctional molecules with multifunnel energy landscapes, illustrated by some selected examples, where understanding the organisation of the landscape has provided new insight into function. Conformational selection and induced fit may provide alternative routes to realisation of multifunctionality, exploiting the possibility of environmental control and distinct binding modes.
31
Kazan IC, Sharma P, Rahman MI, Bobkov A, Fromme R, Ghirlanda G, Ozkan SB. Design of novel cyanovirin-N variants by modulation of binding dynamics through distal mutations. eLife 2022; 11:67474. [PMID: 36472898 PMCID: PMC9725752 DOI: 10.7554/elife.67474] [Received: 02/12/2021] [Accepted: 11/28/2022] [Indexed: 12/07/2022]
Abstract
We develop an integrated co-evolution and dynamic coupling (ICDC) approach to identify, mutate, and assess distal sites to modulate function. We validate the approach first by analyzing the existing mutational fitness data of TEM-1 β-lactamase and show that allosteric positions co-evolved and dynamically coupled with the active site significantly modulate function. We further apply the ICDC approach to identify positions and their mutations that can modulate binding affinity in a lectin, cyanovirin-N (CV-N), that selectively binds to dimannose, and predict binding energies of its variants through Adaptive BP-Dock. Computational and experimental analyses reveal that binding-enhancing mutants identified by ICDC impact the dynamics of the binding pocket, and show that rigidification of the binding residues compensates for the entropic cost of binding. This work suggests a mechanism by which distal mutations modulate function through dynamic allostery and provides a blueprint to identify candidates for mutagenesis in order to optimize protein function.
Affiliation(s)
- I Can Kazan
- Center for Biological Physics and Department of Physics, Arizona State University, Tempe, United States; School of Molecular Sciences, Arizona State University, Tempe, United States
- Prerna Sharma
- School of Molecular Sciences, Arizona State University, Tempe, United States
- Andrey Bobkov
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, United States
- Raimund Fromme
- School of Molecular Sciences, Arizona State University, Tempe, United States
- Giovanna Ghirlanda
- School of Molecular Sciences, Arizona State University, Tempe, United States
- S Banu Ozkan
- Center for Biological Physics and Department of Physics, Arizona State University, Tempe, United States
32
Miyazawa S. Boltzmann Machine Learning and Regularization Methods for Inferring Evolutionary Fields and Couplings From a Multiple Sequence Alignment. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:328-342. [PMID: 32396099 DOI: 10.1109/tcbb.2020.2993232] [Indexed: 06/11/2023]
Abstract
The inverse Potts problem, which infers a Boltzmann distribution for homologous protein sequences from their single-site and pairwise amino acid frequencies, has recently attracted a great deal of attention in studies of protein structure and evolution. We study regularization and learning methods and how to tune regularization parameters to correctly infer interactions in Boltzmann machine learning. With L2 regularization for fields, group L1 regularization for couplings is shown to be very effective for sparse couplings in comparison with L2 and L1. Two regularization parameters are tuned to yield equal values for both the sample and ensemble averages of evolutionary energy. Both averages smoothly change and converge, but their learning profiles are very different between learning methods. The Adam method is modified to make the step size proportional to the gradient for sparse couplings and to use a soft-thresholding function for group L1. It is shown by first inferring interactions from protein sequences and then from Monte Carlo samples that the fields and couplings can be well recovered, but that recovering the pairwise correlations in the resolution of a total energy is harder for the natural proteins than for the protein-like sequences. The selective temperature for folding/structural constraints in protein evolution is also estimated.
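The soft-thresholding step mentioned in this abstract corresponds to the proximal operator of the group L1 penalty λ‖J_ij‖_F applied to each q×q coupling block: the whole block is shrunk toward zero by a common factor and set exactly to zero when its norm falls below λ, which is what produces sparse couplings. A minimal NumPy sketch of that single step, assuming λ already folds in the step size; Miyazawa's modified Adam update wraps this differently and is not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def group_soft_threshold(block: np.ndarray, threshold: float) -> np.ndarray:
    """Proximal operator of threshold * ||block||_F (group L1) for one coupling block J_ij."""
    norm = np.linalg.norm(block)  # Frobenius norm for a 2-D array
    if norm <= threshold:
        return np.zeros_like(block)  # the whole pair (i, j) is switched off
    return block * (1.0 - threshold / norm)  # shrink all entries by a common factor
```

A block with Frobenius norm 5 and threshold 1 is scaled by 0.8; with threshold 6 it is removed entirely.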
33
Chu WT, Yan Z, Chu X, Zheng X, Liu Z, Xu L, Zhang K, Wang J. Physics of biomolecular recognition and conformational dynamics. Rep Prog Phys 2021; 84:126601. [PMID: 34753115 DOI: 10.1088/1361-6633/ac3800] [Received: 05/26/2021] [Accepted: 11/09/2021] [Indexed: 06/13/2023]
Abstract
Biomolecular recognition usually leads to the formation of binding complexes, often accompanied by large-scale conformational changes. This process is fundamental to biological functions at the molecular and cellular levels. Uncovering the physical mechanisms of biomolecular recognition and quantifying the key biomolecular interactions are vital to understand these functions. The recently developed energy landscape theory has been successful in quantifying recognition processes and revealing the underlying mechanisms. Recent studies have shown that in addition to affinity, specificity is also crucial for biomolecular recognition. The proposed physical concept of intrinsic specificity based on the underlying energy landscape theory provides a practical way to quantify the specificity. Optimization of affinity and specificity can be adopted as a principle to guide the evolution and design of molecular recognition. This approach can also be used in practice for drug discovery using multidimensional screening to identify lead compounds. The energy landscape topography of molecular recognition is important for revealing the underlying flexible binding or binding-folding mechanisms. In this review, we first introduce the energy landscape theory for molecular recognition and then address four critical issues related to biomolecular recognition and conformational dynamics: (1) specificity quantification of molecular recognition; (2) evolution and design in molecular recognition; (3) flexible molecular recognition; (4) chromosome structural dynamics. The results described here and the discussions of the insights gained from the energy landscape topography can provide valuable guidance for further computational and experimental investigations of biomolecular recognition and conformational dynamics.
Affiliation(s)
- Wen-Ting Chu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Zhiqiang Yan
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Xiakun Chu
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, NY 11794, United States of America
- Xiliang Zheng
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Zuojia Liu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Li Xu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Kun Zhang
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Jin Wang
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, NY 11794, United States of America
34
Bisardi M, Rodriguez-Rivas J, Zamponi F, Weigt M. Modeling sequence-space exploration and emergence of epistatic signals in protein evolution. Mol Biol Evol 2021; 39:6424001. [PMID: 34751386 PMCID: PMC8789065 DOI: 10.1093/molbev/msab321] [Indexed: 02/07/2023]
Abstract
During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins, to propose stochastic models of experimental protein evolution. These models predict quantitatively important features of experimentally evolved sequence libraries, like fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters like sequence divergence, selection strength, and library size. We showcase the potential of the approach in reanalyzing two recent experiments to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, opening thereby a way to computationally optimize experimental protocols.
Affiliation(s)
- M Bisardi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, F-75005, France; Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
- J Rodriguez-Rivas
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
- F Zamponi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, F-75005, France
- M Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
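The stochastic models in this abstract simulate rounds of random mutation and selection in sequence space. A minimal Metropolis sketch of that idea, assuming a simplified independent-site energy (fields only, with no couplings, unlike the data-driven Potts landscapes the paper builds on) and treating the inverse temperature beta as a selection strength; the function and its parameters are hypothetical.

```python
import math
import random


def evolve(seq: str, fields: list[dict[str, float]], alphabet: str,
           n_steps: int, beta: float, rng: random.Random) -> str:
    """Metropolis random walk in sequence space under a hypothetical independent-site energy.

    E(S) = -sum_i fields[i].get(s_i, 0.0); a proposed point mutation is accepted with
    probability min(1, exp(-beta * dE)), so larger beta means stronger selection.
    """
    s = list(seq)
    for _ in range(n_steps):
        i = rng.randrange(len(s))   # pick a site uniformly
        new = rng.choice(alphabet)  # propose a residue (may equal the current one)
        d_e = fields[i].get(s[i], 0.0) - fields[i].get(new, 0.0)  # E(new) - E(old)
        if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
            s[i] = new
    return "".join(s)
```

A library would be built from many independent runs started at the same wild type; larger n_steps gives more diverged sequences and larger beta gives stronger selection.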
35
Shen Y, Olson ER, Van Deelen TR. Spatially explicit modeling of community occupancy using Markov Random Field models with imperfect observation: Mesocarnivores in Apostle Islands National Lakeshore. Ecol Modell 2021. [DOI: 10.1016/j.ecolmodel.2021.109712] [Indexed: 11/30/2022]
36
Barrat-Charlaix P, Muntoni AP, Shimagaki K, Weigt M, Zamponi F. Sparse generative modeling via parameter reduction of Boltzmann machines: Application to protein-sequence families. Phys Rev E 2021; 104:024407. [PMID: 34525554 DOI: 10.1103/physreve.104.024407] [Received: 02/17/2021] [Accepted: 07/19/2021] [Indexed: 11/07/2022]
Abstract
Boltzmann machines (BMs) are widely used as generative models. For example, pairwise Potts models (PMs), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites. This coevolution reflects structural and functional constraints acting on protein sequences during evolution. The most conservative choice to describe the coevolution signal is to include all possible two-site couplings into the PM. This choice, typical of what is known as Direct Coupling Analysis, has been successful for predicting residue contacts in the three-dimensional structure, mutational effects, and generating new functional sequences. However, the resulting PM suffers from important overfitting effects: many couplings are small, noisy, and hardly interpretable; the PM is close to a critical point, meaning that it is highly sensitive to small parameter perturbations. In this work, we introduce a general parameter-reduction procedure for BMs, via a controlled iterative decimation of the less statistically significant couplings, identified by an information-based criterion that selects either weak or statistically unsupported couplings. For several protein families, our procedure allows one to remove more than 90% of the PM couplings, while preserving the predictive and generative properties of the original dense PM, and the resulting model is far away from criticality, hence more robust to noise.
Affiliation(s)
- Pierre Barrat-Charlaix
- Biozentrum, Universität Basel, Switzerland; Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Anna Paola Muntoni
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino 10129, Italy; Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy; Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative-LCQB, F-75005 Paris, France; Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Kai Shimagaki
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative-LCQB, F-75005 Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative-LCQB, F-75005 Paris, France
- Francesco Zamponi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
| |
Collapse
|
37
|
On the effect of phylogenetic correlations in coevolution-based contact prediction in proteins. PLoS Comput Biol 2021; 17:e1008957. [PMID: 34029316 PMCID: PMC8177639 DOI: 10.1371/journal.pcbi.1008957] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/15/2020] [Revised: 06/04/2021] [Accepted: 04/09/2021] [Indexed: 12/04/2022]
Abstract
Coevolution-based contact prediction, either directly from the coevolutionary couplings of global statistical sequence models or via structural supervision and deep learning, has found widespread application in protein-structure prediction from sequence. However, one of the basic assumptions of global statistical modeling is that sequences form an at least approximately independent sample of an unknown probability distribution, which is to be learned from the data. In the case of protein families, this assumption is clearly violated by the phylogenetic relations between protein sequences. It has proved notoriously difficult to take phylogenetic correlations into account in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e. structure- or function-based) coevolutionary couplings are removed. A comparison between the results of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e. those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are comparable in size to the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
Many homologous protein families contain thousands of highly diverged amino-acid sequences, which fold into nearly identical three-dimensional structures and fulfill almost identical biological tasks. Global coevolutionary models, like those inferred by Direct Coupling Analysis (DCA), assume that families can be considered as samples of some unknown statistical model, and that the parameters of these models represent evolutionary constraints acting on protein sequences. To learn these models from data, DCA and related approaches must also assume that the distinct sequences in a protein family are close to independent, whereas in reality they are linked by intricate hierarchical phylogenetic relationships. Here we propose null models for sequence alignments, which maintain the patterns of amino-acid conservation and phylogeny contained in the data but destroy the coevolutionary couplings frequently used in protein structure prediction. We find that phylogeny does induce spurious non-zero couplings. These are, however, significantly smaller than the largest couplings derived from natural sequences and therefore have little influence on the first predicted contacts. In the range of intermediate couplings, however, they may lead to statistically significant effects. Dissecting phylogenetic from functional couplings might therefore extend the range of accurately predicted structural contacts down to smaller coupling strengths than those currently used.
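The null models above keep conservation and phylogeny while removing intrinsic couplings. As a point of contrast, the simplest null model, shuffling each alignment column independently, keeps per-site conservation but destroys both couplings and phylogeny, so it is not the phylogeny-preserving resampling proposed by the authors. The minimal Python sketch below (hypothetical function names, toy alignment) shows that baseline together with the connected correlation C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b) from which coupling inference starts. Real DCA pipelines also reweight sequences and add pseudocounts, which are omitted here.

```python
import random


def column_shuffle(msa, seed=0):
    """Independent-site null model: permute each alignment column separately.

    Per-site symbol frequencies (conservation) are preserved exactly, but
    inter-site covariation is destroyed, and so is any phylogenetic structure.
    """
    rng = random.Random(seed)
    n_seqs, length = len(msa), len(msa[0])
    columns = [[seq[i] for seq in msa] for i in range(length)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(columns[i][k] for i in range(length)) for k in range(n_seqs)]


def connected_correlation(msa, i, j, a, b):
    """C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b), from raw (unweighted) counts."""
    n = len(msa)
    f_i = sum(s[i] == a for s in msa) / n
    f_j = sum(s[j] == b for s in msa) / n
    f_ij = sum(s[i] == a and s[j] == b for s in msa) / n
    return f_ij - f_i * f_j


# Toy alignment in which site 0 and site 1 covary perfectly.
msa = ["AK", "AK", "CE", "CE"]
null = column_shuffle(msa, seed=1)
print(connected_correlation(msa, 0, 1, "A", "K"), connected_correlation(null, 0, 1, "A", "K"))
```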
38
Sequeiros-Borja CE, Surpeta B, Brezovsky J. Recent advances in user-friendly computational tools to engineer protein function. Brief Bioinform 2021; 22:bbaa150. [PMID: 32743637 PMCID: PMC8138880 DOI: 10.1093/bib/bbaa150] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 04/24/2020] [Revised: 06/03/2020] [Accepted: 06/16/2020] [Indexed: 12/14/2022]
Abstract
Progress in technology and algorithms over the past decade has transformed the field of protein design and engineering. Computational approaches have become well ingrained in the processes of tailoring proteins for various biotechnological applications. Many tools and methods are developed and upgraded each year to meet the increasing demands and challenges of protein engineering. To help protein engineers and bioinformaticians navigate this wave of dedicated software, we have critically evaluated recent additions to the toolbox with regard to their application in semi-rational and rational protein engineering. These newly developed tools identify and prioritize hotspots and analyze the effects of mutations on a variety of properties, including ligand binding, protein-protein and protein-nucleic acid interactions, and electrostatic potential. We also discuss notable progress in targeting elusive protein dynamics and associated properties such as ligand-transport processes and allosteric communication. Finally, we discuss several challenges these tools face and provide our perspectives on the further development of readily applicable methods to guide protein engineering efforts.
Affiliation(s)
- Carlos Eduardo Sequeiros-Borja
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Bartłomiej Surpeta
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
- Jan Brezovsky
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
39
Zou T, Woodrum BW, Halloran N, Campitelli P, Bobkov AA, Ghirlanda G, Ozkan SB. Local Interactions That Contribute Minimal Frustration Determine Foldability. J Phys Chem B 2021; 125:2617-2626. [PMID: 33687216 DOI: 10.1021/acs.jpcb.1c00364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Earlier experiments suggest that the evolutionary information (conservation and coevolution) encoded in protein sequences is necessary and sufficient to specify the fold of a protein family. However, no computational work has yet quantified the effect of such evolutionary information on the folding process. Here we explore the role of early folding steps for sequences designed using coevolution and conservation, through a combination of computational and experimental methods. We simulated a repertoire of native and designed WW domain sequences to analyze early local contact formation and found that, in unfoldable sequences, the N-terminal β-hairpin turn does not form correctly because of strong non-native local contacts. Through a maximum likelihood approach, we identified five local contacts that play a critical role in folding, suggesting that a small subset of amino acid pairs can be used to solve the "needle in the haystack" problem of designing foldable sequences. Using the contact probabilities of these five local contacts, which form during the early stage of folding, we built a classification model that predicts the foldability of a WW sequence with 81% accuracy. This classification model was used to redesign WW domain sequences that could not fold because of frustration, making them foldable by introducing a few mutations that stabilize these critical local contacts. Experimental analysis shows that a redesigned sequence folds and binds polyproline peptides with an affinity similar to those observed for native WW domains. Overall, our analysis shows that evolutionary-designed sequences should not only satisfy folding stability but also ensure a minimally frustrated folding landscape.
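The abstract does not specify the type of classification model. Assuming, purely for illustration, a logistic-regression classifier whose five inputs are the early-folding contact probabilities, a pure-Python sketch could look like the following. The function names, learning rate, epoch count, and toy data are hypothetical and are not taken from the study.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights by batch gradient ascent on the logistic log-likelihood.

    X: list of feature vectors (here, five contact probabilities per sequence)
    y: list of labels, 1 = foldable, 0 = unfoldable
    """
    n_feat = len(X[0])
    w = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Residual between label and predicted probability.
            err = yi - sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b)
            for k in range(n_feat):
                grad_w[k] += err * xi[k]
            grad_b += err
        w = [wk + lr * gk / len(X) for wk, gk in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


def predict_foldable(x, w, b):
    """Predict 'foldable' when the modeled probability is at least 0.5."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + b) >= 0.5


# Hypothetical contact probabilities for two foldable and two unfoldable sequences.
X = [[0.9, 0.8, 0.85, 0.9, 0.8], [0.8, 0.9, 0.9, 0.75, 0.85],
     [0.2, 0.1, 0.3, 0.2, 0.1], [0.1, 0.3, 0.2, 0.1, 0.2]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
```

A linear model keeps each weight interpretable as the contribution of one contact, which fits the idea that a few local contacts carry most of the foldability signal; whether the study used this model is not stated.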
Affiliation(s)
- Taisong Zou
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
- Brian W Woodrum
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Nicholas Halloran
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Paul Campitelli
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
- Andrey A Bobkov
- Conrad Prebys Center for Chemical Genomics, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California 92037, United States
- Giovanna Ghirlanda
- School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Sefika Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
40
Gianni S, Freiberger MI, Jemth P, Ferreiro DU, Wolynes PG, Fuxreiter M. Fuzziness and Frustration in the Energy Landscape of Protein Folding, Function, and Assembly. Acc Chem Res 2021; 54:1251-1259. [PMID: 33550810 PMCID: PMC8023570 DOI: 10.1021/acs.accounts.0c00813] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 12/02/2020] [Indexed: 12/20/2022]
Abstract
Are all protein interactions fully optimized? Do suboptimal interactions compromise specificity? What is the functional impact of frustration? Why does evolution not optimize some contacts? Proteins and their complexes are best described as ensembles of states populating an energy landscape. These ensembles vary in breadth, from narrow ensembles clustered around a single average X-ray structure, to broader ensembles encompassing a few different functional "taxonomic" states, to near continua of rapidly interconverting conformations, which are called "fuzzy" or even "intrinsically disordered". Here we aim to provide a comprehensive framework for confronting the structural and dynamical continuum of protein assemblies by combining the concepts of energetic frustration and interaction fuzziness. The diversity of the protein structural ensemble arises from the frustrated conflicts between the interactions that create the energy landscape. When frustration is minimal after folding, the result is a narrow ensemble, but residual frustrated interactions result in fuzzy ensembles, and this fuzziness allows a versatile repertoire of biological interactions. Here we discuss how fuzziness and frustration play off each other as proteins fold and assemble, viewing their significance from energetic, functional, and evolutionary perspectives.
We demonstrate, in particular, that the common physical origin of both concepts is related to the ruggedness of the energy landscapes: intramolecular in the case of frustration and intermolecular in the case of fuzziness. Within this framework, we show that alternative sets of suboptimal contacts may encode specificity without achieving a single structural optimum. Thus, we demonstrate that structured complexes may not be optimized, and that energetic frustration is realized via different sets of contacts, leading to a multiplicity of specific complexes. Furthermore, we propose that these suboptimal, frustrated, or fuzzy interactions are under evolutionary selection and expand the biological repertoire by providing a multiplicity of biological activities. Accordingly, we show that non-native interactions in folding or interaction landscapes can cooperate to generate diverse functional states, which are essential to facilitate adaptation to different cellular conditions. Thus, we propose that not fully optimized structures may actually benefit the biological activities of proteins via an alternative set of suboptimal interactions. The importance of such variability has not been recognized across different areas of biology.
This account provides a modern view on folding, function, and assembly across the protein universe. The physical framework presented here is applicable to the structure and dynamics continuum of proteins and opens up new perspectives for drug design involving not fully structured, highly dynamic protein assemblies.
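Energetic frustration is often quantified locally with a frustration index that compares the energy of a native contact with the energies of decoy contacts. The sketch below computes such a Z-score, F = (mean(E_decoy) - E_native) / std(E_decoy), where lower energies mean more favorable interactions, so a positive F marks a native contact that is more stabilizing than typical decoys. The cutoffs 0.78 (minimally frustrated) and -1 (highly frustrated) are values commonly quoted for the Frustratometer; treat them, the sign convention, and the function names as assumptions to check against that tool's documentation. Generating realistic decoys is outside the scope of this sketch.

```python
import statistics


def frustration_index(native_energy, decoy_energies):
    """Z-score of a native contact energy against decoy energies.

    Lower energy = more favorable, so a positive index means the native
    contact is more stabilizing than the typical decoy.
    """
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)  # population std (1/N)
    return (mean - native_energy) / spread


def classify(index, minimal=0.78, high=-1.0):
    """Bin an index using assumed Frustratometer-style cutoffs."""
    if index >= minimal:
        return "minimally frustrated"
    if index <= high:
        return "highly frustrated"
    return "neutral"


# Hypothetical decoy energies for one contact (arbitrary units).
decoys = [-1.0, 0.0, 1.0]
for native in (-2.0, 0.0, 1.5):
    print(native, classify(frustration_index(native, decoys)))
```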
Affiliation(s)
- Stefano Gianni
- Istituto Pasteur - Fondazione Cenci Bolognetti, Dipartimento di Scienze Biochimiche “A. Rossi Fanelli” and Istituto di Biologia e Patologia Molecolari del CNR, Sapienza Università di Roma, 00185 Rome, Italy
- María Inés Freiberger
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, 1428 Buenos Aires, Argentina
- Per Jemth
- Department of Medical Biochemistry and Microbiology, Uppsala University, Husargatan 3, SE-75123 Uppsala, Sweden
- Diego U. Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires-CONICET-IQUIBICEN, 1428 Buenos Aires, Argentina
- Peter G. Wolynes
- Center for Theoretical Biological Physics, Rice University, 6500 Main Street, Houston, Texas 77251-1892, United States
- Monika Fuxreiter
- MTA-DE Laboratory of Protein Dynamics, Department of Biochemistry and Molecular Biology, University of Debrecen, Nagyerdei krt 98, H-4032 Debrecen, Hungary
- Department of Biomedical Sciences, University of Padova, Via Ugo Bassi 58/B, 35131 Padova, Italy
41
Crippa M, Andreghetti D, Capelli R, Tiana G. Evolution of frustrated and stabilising contacts in reconstructed ancient proteins. Eur Biophys J 2021; 50:699-712. [PMID: 33569610 PMCID: PMC8260555 DOI: 10.1007/s00249-021-01500-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2020] [Revised: 12/14/2020] [Accepted: 01/13/2021] [Indexed: 11/30/2022]
Abstract
Energetic properties of a protein are a major determinant of its evolutionary fitness. Using a reconstruction algorithm, dating the reconstructed proteins and calculating the interaction network between their amino acids through a coevolutionary approach, we studied how the interactions that stabilise 890 proteins, belonging to five families, evolved over billions of years. In particular, we focused on the network of the most strongly attractive contacts and on that of poorly optimised, frustrated contacts. Our results support the idea that the cluster of most attractive interactions grows over evolutionary time, but from the data we cannot conclude that protein stability or the degree of frustration always tends to decrease.
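One way to measure the size of "the cluster of most attractive interactions" is the largest connected component of a graph whose nodes are residues and whose edges are contacts with an interaction energy below a cutoff. The sketch assumes per-contact energies (negative = attractive) are already available from a coevolutionary model; the cutoff, the function name `largest_cluster`, and the example values are hypothetical, and the paper's exact cluster definition may differ. It uses a union-find structure to merge residues into clusters.

```python
def largest_cluster(contacts, threshold):
    """Size of the largest cluster of residues joined by strong contacts.

    contacts:  iterable of (i, j, energy) tuples; more negative = more attractive
    threshold: only contacts with energy < threshold are kept as edges
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers new residues lazily.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, energy in contacts:
        if energy < threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    sizes = {}
    for node in list(parent):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


# Hypothetical contact energies: residues 1-2-3 form one strong cluster, 5-6 another.
contacts = [(1, 2, -2.0), (2, 3, -1.5), (5, 6, -3.0), (3, 4, -0.1)]
print(largest_cluster(contacts, threshold=-1.0))
```

Tracking this number across reconstructed ancestors of different ages would be one way to test whether the strongly attractive cluster grows over evolutionary time.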
Affiliation(s)
- Martina Crippa
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Damiano Andreghetti
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Riccardo Capelli
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
42
Thadani NN, Zhou Q, Reyes Gamas K, Butler S, Bueno C, Schafer NP, Morcos F, Wolynes PG, Suh J. Frustration and Direct-Coupling Analyses to Predict Formation and Function of Adeno-Associated Virus. Biophys J 2020; 120:489-503. [PMID: 33359833 DOI: 10.1016/j.bpj.2020.12.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/06/2020] [Revised: 11/08/2020] [Accepted: 12/08/2020] [Indexed: 01/03/2023]
Abstract
Adeno-associated virus (AAV) is a promising gene therapy vector because of its efficient gene delivery and relatively mild immunogenicity. To improve delivery target specificity, researchers use combinatorial and rational library design strategies to generate novel AAV capsid variants. These approaches frequently propose high proportions of nonforming or noninfective capsid protein sequences that reduce the effective depth of synthesized vector DNA libraries, thereby raising the discovery cost of novel vectors. We evaluated two computational techniques for their ability to estimate the impact of residue mutations on AAV capsid protein-protein interactions and thus predict changes in vector fitness, reasoning that these approaches might inform the design of functionally enriched AAV libraries and accelerate therapeutic candidate identification. The Frustratometer computes an energy function derived from the energy landscape theory of protein folding. Direct-coupling analysis (DCA) is a statistical framework that captures residue coevolution within proteins. We applied the Frustratometer to select candidate protein residues predicted to favor assembled or disassembled capsid states, then predicted mutation effects at these sites using the Frustratometer and DCA. Capsid mutants were experimentally assessed for changes in virus formation, stability, and transduction ability. The Frustratometer-based metric showed a counterintuitive correlation with viral stability, whereas a DCA-derived metric was highly correlated with virus transduction ability in the small population of residues studied. Our results suggest that coevolutionary models may be able to elucidate complex capsid residue-residue interaction networks essential for viral function, but further study is needed to understand the relationship between protein energy simulations and viral capsid metastability.
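A common DCA-style mutational score is the statistical-energy difference ΔE = E(mutant) - E(wild type) under the inferred Potts model, where, with the sign convention below, a positive ΔE marks a mutation the model considers less favorable. The abstract does not give the exact form of its DCA-derived metric, so treat this as a generic, assumed version; the parameters and function names are hypothetical.

```python
def potts_energy(seq, h, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    h[i][a] is the field of state a at site i; J maps a pair (i, j), i < j,
    to a q x q coupling matrix.
    """
    energy = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), Jij in J.items():
        energy -= Jij[seq[i]][seq[j]]
    return energy


def mutation_delta_e(wt, site, new_state, h, J):
    """ΔE = E(mutant) - E(wild type); positive means less favorable here."""
    mutant = list(wt)
    mutant[site] = new_state
    return potts_energy(mutant, h, J) - potts_energy(wt, h, J)


# Hypothetical two-site model with q = 2 states.
h = [[0.0, 1.0], [0.5, 0.0]]
J = {(0, 1): [[0.2, 0.0], [0.0, 0.8]]}
print(mutation_delta_e([1, 1], 0, 0, h, J))
```

Recomputing the full energy is simple but wasteful; for large proteins only the field at the mutated site and the couplings involving it need to be re-evaluated.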
Affiliation(s)
- Qin Zhou
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas
- Susan Butler
- Department of Bioengineering, Rice University, Houston, Texas
- Carlos Bueno
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas
- Nicholas P Schafer
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemistry, Rice University, Houston, Texas
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, University of Texas at Dallas, Richardson, Texas
- Peter G Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemistry, Rice University, Houston, Texas; Department of Biosciences, Rice University, Houston, Texas; Department of Physics, Rice University, Houston, Texas
- Junghae Suh
- Department of Bioengineering, Rice University, Houston, Texas; Department of Biosciences, Rice University, Houston, Texas; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas; Systems, Synthetic, and Physical Biology Program, Rice University, Houston, Texas
43
Hu L, Hu P, Luo X, Yuan X, You ZH. Incorporating the Coevolving Information of Substrates in Predicting HIV-1 Protease Cleavage Sites. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:2017-2028. [PMID: 31056514 DOI: 10.1109/tcbb.2019.2914208] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
Human immunodeficiency virus 1 (HIV-1) protease (PR) plays a crucial role in the maturation of the virus. Studying the substrate specificity of HIV-1 PR is a new endeavor that aims to improve our understanding of how HIV-1 PR recognizes its various cleavage sites. To predict HIV-1 PR cleavage sites, most existing approaches rely solely on the homogeneity of substrate sequence information combined with supervised classification techniques. Although efficient, these approaches have limited ability to explain their results and probably provide few insights into the mechanisms by which HIV-1 PR cleaves its substrates in a site-specific manner. In this work, we propose EvoCleave, a coevolutionary pattern-based prediction model for HIV-1 PR cleavage sites that integrates the coevolving information obtained from substrate sequences with a linear SVM classifier. Experimental results showed that EvoCleave performed very promisingly in terms of ROC analysis and F-measure. We also prospectively assessed the biological significance of the coevolutionary patterns by applying them to three fundamental questions about HIV-1 PR cleavage sites. The analysis demonstrated that the coevolutionary patterns offer valuable insights into the substrate specificity of HIV-1 PR.
44
Tian P, Best RB. Exploring the sequence fitness landscape of a bridge between protein folds. PLoS Comput Biol 2020; 16:e1008285. [PMID: 33048928 PMCID: PMC7553338 DOI: 10.1371/journal.pcbi.1008285] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/22/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022]
Abstract
Most foldable protein sequences adopt only a single native fold. Recent protein design studies have, however, created protein sequences that fold into different structures upon a change of environment or a single point mutation, the best characterized example being the switch between the folds of the GA and GB binding domains of streptococcal protein G. To obtain further insight into the design of sequences that can switch folds, we have used a computational model for the fitness landscape of a single fold, built from the observed sequence variation of protein homologues. We have recently shown that such coevolutionary models can be used to design novel foldable sequences. By appropriately combining two of these models to describe the joint fitness landscape of GA and GB, we are able to describe the propensity of a given sequence for each of the two folds. We have successfully tested the combined model against the known series of designed GA/GB hybrids. Using Monte Carlo simulations on this landscape, we are able to identify pathways of mutations connecting the two folds. In the absence of a requirement for domain stability, the most frequent paths go via sequences in which neither domain is stably folded, reminiscent of the propensity of certain intrinsically disordered proteins to fold into different structures according to context. Even if the folded state is required to be stable, we find that there is still a wide range of sequences that are close to the transition region and therefore likely fold switches, consistent with recent estimates that fold switching may be more widespread than had been thought.
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
- Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
45
Molecular origins of folding rate differences in the thioredoxin family. Biochem J 2020; 477:1083-1087. [DOI: 10.1042/bcj20190864] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/26/2019] [Revised: 02/19/2020] [Accepted: 02/20/2020] [Indexed: 12/13/2022]
Abstract
Thioredoxins are a family of conserved oxidoreductases responsible for maintaining redox balance within cells. They have also served as excellent model systems for protein design and engineering studies, particularly through ancestral sequence reconstruction methods. The recent work by Gamiz-Arco et al. [Biochem J (2019) 476, 3631–3647] answers fundamental questions on how specific sequence differences contribute to differences in folding rates, not only between modern and ancient thioredoxins but also among a selected subset of modern thioredoxins. Surprisingly, they find that rapid unassisted folding, a feature of ancient thioredoxins, is not conserved in their modern descendants, suggesting the co-evolution of better folding machinery that likely enabled the accumulation of mutations that slow down folding. The work thus provides an interesting take on the expected folding-stability-function constraint while arguing for additional factors that contribute to sequence evolution and hence affect folding efficiency.
46
Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proc Natl Acad Sci U S A 2020; 117:5873-5882. [PMID: 32123092 PMCID: PMC7084075 DOI: 10.1073/pnas.1913071117] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/27/2022]
Abstract
Mathematical models of evolution help us understand the mechanisms driving protein-sequence change. Previous models recapitulate only disjoint subsets of the statistical features of natural sequences. We present a neutral evolution model that unifies features including extreme variance of the molecular clock's tick rate and the observation of an evolutionary Stokes shift, an irreversible effect of mutations in the fitness landscape during sequence evolution. We show that interactions between amino acid sites, which inform our fitness metric, are required to observe these features. These interactions are inferred using direct coupling analysis, which has been successfully used to predict protein structures, dynamics, and complexes from coevolutionary information. We anticipate that our model will have applications in phylogenetics, ancestral sequence reconstruction, and protein design.
We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while remaining consistent with neutral evolution theory, thereby unifying observations from previously disjoint evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have evolved under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. Perspectives on the implementation of our model in the context of the molecular clock are discussed.
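The abstract states that the model dynamics are built on DCA parameters but not how mutations are accepted. A minimal sketch, assuming a Metropolis rule in which a proposed point mutation with statistical-energy change ΔE is accepted with probability min(1, exp(-βΔE)), could look like this; β, the function names, and the toy landscape are all hypothetical.

```python
import math
import random


def potts_energy(seq, h, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    energy = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), Jij in J.items():
        energy -= Jij[seq[i]][seq[j]]
    return energy


def evolve(seq, h, J, q, n_steps, beta=1.0, seed=0):
    """Metropolis sampling of point mutations on a Potts landscape.

    Returns the final sequence, its energy, and the number of accepted mutations.
    """
    rng = random.Random(seed)
    seq = list(seq)
    energy = potts_energy(seq, h, J)
    accepted = 0
    for _ in range(n_steps):
        site = rng.randrange(len(seq))
        new_state = rng.randrange(q)
        if new_state == seq[site]:
            continue  # silent proposal, nothing changes
        trial = seq.copy()
        trial[site] = new_state
        delta = potts_energy(trial, h, J) - energy
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            seq, energy = trial, energy + delta
            accepted += 1
    return seq, energy, accepted


# Toy landscape: 4 independent sites, q = 3, state 0 strongly favored everywhere.
h = [[5.0, 0.0, 0.0]] * 4
final, energy, accepted = evolve([1, 1, 1, 1], h, {}, q=3, n_steps=2000, beta=5.0)
print(final, energy, accepted)
```

With couplings switched on, the acceptance of a mutation at one site depends on the states of its partners, which is the epistasis that such a model relies on to reproduce heterotachy and the Stokes shift.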
47
Rivoire O. Parsimonious evolutionary scenario for the origin of allostery and coevolution patterns in proteins. Phys Rev E 2020; 100:032411. [PMID: 31640027 DOI: 10.1103/physreve.100.032411] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/10/2018] [Indexed: 12/16/2022]
Abstract
Proteins display generic properties that are challenging to explain by direct selection, notably allostery, the capacity to be regulated through long-range effects, and evolvability, the capacity to adapt to new selective pressures. An evolutionary scenario is proposed where proteins acquire these two features indirectly as a by-product of their selection for a more fundamental property, exquisite discrimination, the capacity to bind discriminatively very similar ligands. Achieving this task is shown to typically require proteins to undergo a conformational change. We argue that physical and evolutionary constraints impel this change to be controlled by a group of sites extending from the binding site. Proteins can thus acquire a latent potential for allosteric regulation and evolutionary adaptation because of long-range effects that initially arise as evolutionary spandrels. This scenario accounts for the groups of conserved and coevolving residues observed in multiple sequence alignments. However, we propose that most pairs of coevolving and contacting residues inferred from such alignments have a different origin, related to thermal stability. A physical model is presented that illustrates this evolutionary scenario and its implications. The scenario can be implemented in experiments of protein evolution to directly test its predictions.
Affiliation(s)
- Olivier Rivoire
- Center for Interdisciplinary Research in Biology, Collège de France, Centre National de la Recherche Scientifique, INSERM, PSL Research University, 75005 Paris, France
48
Rizzato F, Coucke A, de Leonardis E, Barton JP, Tubiana J, Monasson R, Cocco S. Inference of compressed Potts graphical models. Phys Rev E 2020; 101:012309. [PMID: 32069678 DOI: 10.1103/physreve.101.012309] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/31/2019] [Indexed: 06/10/2023]
Abstract
We consider the problem of inferring a graphical Potts model on a population of variables. This inverse Potts problem generally involves the inference of a large number of parameters, often larger than the number of available data, and hence requires the introduction of regularization. We study here a double regularization scheme, in which the number of Potts states (colors) available to each variable is reduced and interaction networks are made sparse. To achieve the color compression, only Potts states with large empirical frequency (exceeding some threshold) are explicitly modeled on each site, while the others are grouped into a single state. We benchmark the performance of this mixed regularization approach with two inference algorithms, adaptive cluster expansion (ACE) and pseudolikelihood maximization (PLM), on synthetic data obtained by sampling disordered Potts models on Erdős-Rényi random graphs. We show in particular that color compression does not affect the quality of reconstruction of the parameters corresponding to high-frequency symbols, while drastically reducing the number of other parameters and thus the computational time. Our procedure is also applied to multiple sequence alignments of protein families, with similar results.
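The color-compression step described above can be sketched in a few lines: at each site, keep only states whose empirical frequency exceeds a threshold and map every other state to one grouped symbol. The function name `compress_colors`, the placeholder symbol `*`, and the toy alignment are hypothetical; a real pipeline would pick a grouping symbol that cannot collide with alignment characters and would then infer one field per kept state plus one for the group.

```python
from collections import Counter


def compress_colors(msa, min_freq):
    """Per-site color compression of an alignment.

    States with empirical frequency > min_freq at a site are kept; all other
    states at that site are replaced by a single grouped symbol.
    Returns the compressed alignment and the kept states per site.
    """
    n = len(msa)
    length = len(msa[0])
    kept = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa)
        kept.append({a for a, c in counts.items() if c / n > min_freq})
    grouped = "*"  # hypothetical placeholder for the merged low-frequency state
    compressed = [
        "".join(a if a in kept[i] else grouped for i, a in enumerate(seq))
        for seq in msa
    ]
    return compressed, kept


compressed, kept = compress_colors(["AC", "AC", "AD", "GC"], min_freq=0.3)
print(compressed, kept)
```

Each site then needs only len(kept[i]) + 1 states, which is where the reduction in the number of couplings and in computational time comes from.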
Affiliation(s)
- Francesca Rizzato
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Alice Coucke
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Eleonora de Leonardis
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- John P Barton
- Department of Physics and Astronomy, University of California, Riverside, 900 University Avenue, Riverside, California 92521, USA
- Jérôme Tubiana
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Rémi Monasson
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Simona Cocco
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
49
Hayes RL, Vilseck JZ, Brooks CL. Approaching protein design with multisite λ dynamics: Accurate and scalable mutational folding free energies in T4 lysozyme. Protein Sci 2018; 27:1910-1922. [PMID: 30175503 DOI: 10.1002/pro.3500] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/17/2018] [Revised: 08/06/2018] [Accepted: 08/15/2018] [Indexed: 12/14/2022]
Abstract
The estimation of changes in free energy upon mutation is central to the problem of protein design. Modern protein design methods have had remarkable success over a wide range of design targets, but are reaching their limits in ligand binding and enzyme design due to insufficient accuracy in mutational free energies. Alchemical free energy calculations have the potential to supplement modern design methods through more accurate molecular dynamics based prediction of free energy changes, but suffer from high computational cost. Multisite λ dynamics (MSλD) is a particularly efficient and scalable free energy method with the potential to explore combinatorially large sequence spaces inaccessible to other free energy methods. This work aims to quantify the accuracy of MSλD and demonstrate its scalability. We apply MSλD to the classic problem of calculating folding free energies in T4 lysozyme, a system with a wealth of experimental measurements. Single-site calculations covering 32 mutations show remarkable agreement with experiment, with a Pearson correlation of 0.914 and a mean unsigned error of 1.19 kcal/mol. Multisite mutants with up to five concurrent mutations, spanning 240 different sequences, show comparable agreement with experiment. These results demonstrate the promise of MSλD in exploring large sequence spaces for protein design.
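As a rough illustration of how λ-dynamics sampling turns into a mutational folding free energy, the Python sketch below estimates a free energy difference from how often λ visits each end state, using ΔG = -kT ln(P(λ≈1)/P(λ≈0)), and then closes the thermodynamic cycle ΔΔG_fold = ΔG_mut(folded) - ΔG_mut(unfolded). This is a simplified, single-site sketch that assumes no biasing potential on λ; production MSλD runs use biasing potentials (and many coupled λ variables) whose contributions must be accounted for. The function names and the end-state cutoff are hypothetical choices.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def free_energy_from_populations(lambda_samples, temperature=298.15, cutoff=0.99):
    """Estimate dG(0 -> 1) = -kT ln(P1 / P0) from sampled lambda values.

    Assumes unbiased sampling: samples with lambda > cutoff count toward
    end state 1, samples with lambda < 1 - cutoff toward end state 0.
    """
    n1 = sum(1 for lam in lambda_samples if lam > cutoff)
    n0 = sum(1 for lam in lambda_samples if lam < 1.0 - cutoff)
    if n0 == 0 or n1 == 0:
        raise ValueError("both end states must be sampled")
    kT = KB_KCAL * temperature
    return -kT * math.log(n1 / n0)

def folding_ddg(dg_mut_folded, dg_mut_unfolded):
    """Thermodynamic cycle: ddG_fold = dG_mut(folded) - dG_mut(unfolded)."""
    return dg_mut_folded - dg_mut_unfolded

# Toy trajectory: lambda spends 3x more time at the wild-type end (0) than at 1.
dg = free_energy_from_populations([0.0] * 30 + [1.0] * 10)
print(round(dg, 2))  # 0.65
```

Running the same estimate in the folded and unfolded ensembles and passing both values to `folding_ddg` gives the change in folding free energy upon mutation, the quantity the abstract compares with experiment.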
Affiliation(s)
- Ryan L Hayes
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Jonah Z Vilseck
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Charles L Brooks
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan, 48109
- Biophysics Program, University of Michigan, Ann Arbor, Michigan, 48109
50
Rodriguez Horta E, Barrat-Charlaix P, Weigt M. Toward Inferring Potts Models for Phylogenetically Correlated Sequence Data. Entropy 2019; 21:1090. [PMCID: PMC7514434 DOI: 10.3390/e21111090] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/25/2019] [Accepted: 11/06/2019] [Indexed: 06/16/2023]
Abstract
Global coevolutionary models of protein families have become increasingly popular due to their capacity to predict residue–residue contacts from sequence information, as well as fitness effects of amino acid substitutions and protein–protein interactions. The central idea in these models is to construct a probability distribution, a Potts model, that reproduces the single and pairwise frequencies of amino acids found in natural sequences of the protein family. This approach treats sequences from the family as independent samples, completely ignoring the phylogenetic relations between them. This simplification is known to lead to potentially biased estimates of the model parameters, decreasing their biological relevance. Current workarounds for this problem, such as reweighting sequences, are poorly understood and not principled. Here, we propose an inference scheme that takes the phylogeny of a protein family into account in order to correct biases in estimating the frequencies of amino acids. Using artificial data, we show that a Potts model inferred using these corrected frequencies performs better in predicting contacts and fitness effects of mutations. Initial, only partially successful tests on real protein data are also presented.
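For context, the Python sketch below shows the kind of statistics a Potts model is fit to reproduce, single-site frequencies f_i(a) and pairwise frequencies f_ij(a, b), computed with the sequence-reweighting workaround the abstract criticizes. It is not the phylogeny-aware correction proposed in the paper. It assumes the common heuristic in which each sequence gets weight 1/(number of sequences at least 80% identical to it, itself included); the 0.8 threshold and function names are assumptions, and pseudocounts are omitted.

```python
from collections import defaultdict

def sequence_weights(msa, identity_threshold=0.8):
    """Heuristic weights: w_s = 1 / #{t : identity(s, t) >= threshold}, t = s included."""
    length = len(msa[0])
    weights = []
    for s in msa:
        neighbours = sum(
            1 for t in msa
            if sum(a == b for a, b in zip(s, t)) / length >= identity_threshold
        )
        weights.append(1.0 / neighbours)
    return weights

def weighted_frequencies(msa, weights):
    """Return weighted single-site f_i(a) and pairwise f_ij(a, b) frequencies."""
    total = sum(weights)
    length = len(msa[0])
    fi = defaultdict(float)   # key: (i, a)
    fij = defaultdict(float)  # key: (i, j, a, b) with i < j
    for s, w in zip(msa, weights):
        for i in range(length):
            fi[(i, s[i])] += w / total
            for j in range(i + 1, length):
                fij[(i, j, s[i], s[j])] += w / total
    return dict(fi), dict(fij)

msa = ["AAAA", "AAAA", "CCCC"]  # two identical sequences plus one distinct
fi, fij = weighted_frequencies(msa, sequence_weights(msa))
print(fi[(0, "A")], fi[(0, "C")])  # 0.5 0.5
```

Without reweighting, site 0 would show A with frequency 2/3; the heuristic down-weights the duplicated sequence to 1/2. The paper argues that such ad hoc corrections do not properly account for shared ancestry.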
Affiliation(s)
- Edwin Rodriguez Horta
- Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Institut de Biologie Paris-Seine, Sorbonne Université, Centre national de la recherche scientifique (CNRS), 75005 Paris, France
- Group of Complex Systems and Statistical Physics, Department of Theoretical Physics, Physics Faculty, University of Havana, La Habana 10400, Cuba
- Pierre Barrat-Charlaix
- Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Institut de Biologie Paris-Seine, Sorbonne Université, Centre national de la recherche scientifique (CNRS), 75005 Paris, France
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- Martin Weigt
- Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Institut de Biologie Paris-Seine, Sorbonne Université, Centre national de la recherche scientifique (CNRS), 75005 Paris, France