1
Cocco S, Posani L, Monasson R. Functional effects of mutations in proteins can be predicted and interpreted by guided selection of sequence covariation information. Proc Natl Acad Sci U S A 2024; 121:e2312335121. [PMID: 38889151 PMCID: PMC11214004 DOI: 10.1073/pnas.2312335121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/21/2024] [Indexed: 06/20/2024] Open
Abstract
Predicting the effects of one or more mutations on the in vivo or in vitro properties of a wild-type protein is a major computational challenge because of epistasis, that is, interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models that predict mutational effects by combining evolutionary (homologous sequence) data with a small amount of mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and the edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline performs comparably to state-of-the-art deep networks trained on far more data, while requiring far fewer parameters and hence being more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property experimentally measured, mostly focus on key functional sites, and are not necessarily related to structural contacts. Therefore, our method is able to extract information relevant for one mutational experiment from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution.
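To make the sparse graphical model in this abstract concrete, here is a minimal sketch of how a mutational effect could be scored as an energy difference under a pairwise (Potts-like) model with only a few retained links. The function names, the data layout, and all parameter values are hypothetical and chosen for illustration; the inference of parameters from sequence data and the mutagenesis-guided link selection described in the abstract are not shown.

```python
def energy(seq, fields, couplings):
    """Energy of a sequence under a sparse pairwise (Potts-like) model.

    fields:    list of dicts, fields[i][residue] -> node parameter
    couplings: dict (i, j) -> dict (res_i, res_j) -> edge parameter,
               present only for the selected links (hypothetical layout)
    """
    e = -sum(fields[i].get(a, 0.0) for i, a in enumerate(seq))
    for (i, j), table in couplings.items():
        e -= table.get((seq[i], seq[j]), 0.0)
    return e


def mutation_effect(wild_type, mutations, fields, couplings):
    """Energy change of a mutant (dict site -> new residue) versus wild type."""
    mutant = list(wild_type)
    for site, residue in mutations.items():
        mutant[site] = residue
    return energy(mutant, fields, couplings) - energy(wild_type, fields, couplings)


# Toy parameters, invented for illustration only.
fields = [{"A": 1.0, "G": 0.0}, {"L": 0.5, "V": 0.0}, {"K": 0.2}]
couplings = {(0, 1): {("A", "L"): 0.5, ("G", "V"): 0.5}}

single_a = mutation_effect("ALK", {0: "G"}, fields, couplings)
single_b = mutation_effect("ALK", {1: "V"}, fields, couplings)
double = mutation_effect("ALK", {0: "G", 1: "V"}, fields, couplings)
```

In this toy example the double mutant's effect differs from the sum of the two single effects, which is the signature of epistasis that the retained link encodes.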
Affiliation(s)
- Simona Cocco
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Lorenzo Posani
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Rémi Monasson
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
2
Xie T, Huang J. Can Protein Structure Prediction Methods Capture Alternative Conformations of Membrane Transporters? J Chem Inf Model 2024; 64:3524-3536. [PMID: 38564295 DOI: 10.1021/acs.jcim.3c01936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Understanding the conformational dynamics of proteins, such as the inward-facing (IF) and outward-facing (OF) transition observed in transporters, is vital for elucidating their functional mechanisms. Despite significant advances in protein structure prediction (PSP) over the past three decades, most efforts have been focused on single-state prediction, leaving multistate or alternative conformation prediction (ACP) relatively unexplored. This discrepancy has led to the development of highly accurate PSP methods such as AlphaFold, yet their capabilities for ACP remain limited. To investigate the performance of current PSP methods in ACP, we curated a data set, named IOMemP, consisting of 32 experimentally determined high-resolution IF and OF structures of 16 membrane proteins with substantial conformational changes. We benchmarked 12 representative PSP methods, along with two recent multistate methods based on AlphaFold, against this data set. Our findings reveal a remarkably consistent preference for specific states across various PSP methods. We elucidated how coevolution information in MSAs influences state preference. Moreover, we showed that AlphaFold, when excluding coevolution information, estimated similar energies between the experimental IF and OF conformations, indicating that the energy model learned by AlphaFold is not biased toward any particular state. Our IOMemP data set and benchmark results are anticipated to advance the development of robust ACP methods.
Affiliation(s)
- Tengyu Xie
- College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Jing Huang
- College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
3
Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. Nat Commun 2023; 14:5478. [PMID: 37673981 PMCID: PMC10482954 DOI: 10.1038/s41467-023-41237-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/14/2023] [Accepted: 08/24/2023] [Indexed: 09/08/2023] Open
Abstract
Although most globular proteins fold into a single stable structure, an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli. State-of-the-art algorithms predict that these fold-switching proteins adopt only one stable structure, missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that single-fold variants could be masking these signatures, we developed an approach, called Alternative Contact Enhancement (ACE), to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. ACE successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/56 fold-switching proteins from distinct families. Then, we used ACE-derived contacts to (1) predict two experimentally consistent conformations of a candidate protein with unsolved structure and (2) develop a blind prediction pipeline for fold-switching proteins. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Affiliation(s)
- Joseph W Schafer
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Lauren L Porter
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD, 20892, USA
4
Luan Y, Tang Z, He Y, Xie Z. Intra-Domain Residue Coevolution in Transcription Factors Contributes to DNA Binding Specificity. Microbiol Spectr 2023; 11:e0365122. [PMID: 36943132 PMCID: PMC10100741 DOI: 10.1128/spectrum.03651-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/09/2022] [Accepted: 02/22/2023] [Indexed: 03/23/2023] Open
Abstract
Understanding the basis of the DNA-binding specificity of transcription factors (TFs) has been of long-standing interest. Despite extensive efforts to map millions of putative TF binding sequences, identifying the critical determinants for DNA binding specificity remains a major challenge. The coevolution of residues in proteins occurs due to a shared evolutionary history. However, it is unclear how coevolving residues in TFs contribute to DNA binding specificity. Here, we systematically collected publicly available data sets from multiple large-scale high-throughput TF-DNA interaction screening experiments for the major TF families with large numbers of TF members. These families included the Homeobox, HLH, bZIP_1, Ets, HMG_box, ZF-C4, and Zn_clus TFs. We detected TF subclass-determining sites (TSDSs) and showed that the TSDSs were more likely to coevolve with other TSDSs than with non-TSDSs, particularly for the Homeobox, HLH, Ets, bZIP_1, and HMG_box TF families. By in silico modeling, we showed that mutation of the highly coevolving residues could significantly reduce the stability of the TF-DNA complex. Residues distant from the DNA interface also contributed to TF-DNA binding activity. Overall, our study gave evidence that coevolved residues relate to transcriptional regulation and provided insights into the potential application of engineered DNA-binding domains and proteins.
IMPORTANCE While unraveling the DNA-binding specificity of TFs is the key to understanding the basis and molecular mechanism of gene expression regulation, identifying the critical determinants that contribute to DNA binding specificity remains a major challenge. In this study, we provided evidence showing that coevolving residues in TF domains contributed to DNA binding specificity. We demonstrated that the TSDSs were more likely to coevolve with other TSDSs than with non-TSDSs. Mutation of the coevolving residue pairs (CRPs) could significantly reduce the stability of the TF-DNA complex, and even residues distant from the DNA interface contribute to TF-DNA binding activity. Collectively, our study expands our knowledge of the interactions among coevolved residues in TFs, tertiary contacts, and functional importance in refined transcriptional regulation. Understanding the impact of coevolving residues in TFs will help clarify the details of transcriptional regulation and advance the application of engineered DNA-binding domains and proteins.
Affiliation(s)
- Yizhao Luan
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zehua Tang
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Yao He
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
- Zhi Xie
- State Key Laboratory of Ophthalmology, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
5
Jia K, Kilinc M, Jernigan RL. Functional Protein Dynamics Directly from Sequences. J Phys Chem B 2023; 127:1914-1921. [PMID: 36848294 PMCID: PMC10009744 DOI: 10.1021/acs.jpcb.2c05766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 02/15/2023] [Indexed: 03/01/2023]
Abstract
The sequence correlations within a protein multiple sequence alignment are routinely used to predict contacts within its structure, but here we point out that these data can also be used to predict a protein's dynamics directly. The elastic network models of protein dynamics rely directly upon the contacts, and the normal modes of motion are obtained from the decomposition of the inverse of the contact map. To make the direct connection between sequence and dynamics, it is necessary to coarse-grain the structure at the level of one point per amino acid. This has often been done, and coarse-grained protein dynamics from elastic network models have been highly successful, particularly in representing the large-scale motions of proteins that usually relate closely to their functions. The interesting implication is that it is not necessary to know the structure itself to obtain its dynamics; instead, the sequence information can be used directly.
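The link between a contact map and dynamics described above can be sketched with a Gaussian-network-style model: build a Kirchhoff (graph Laplacian) matrix from the contacts, decompose it into modes, and read relative per-residue fluctuations from the diagonal of its pseudo-inverse. The function name `gnm_fluctuations` and the three-residue toy chain are assumptions for illustration, not the authors' implementation; in the setting of the abstract, the contact map would be predicted from sequence correlations rather than taken from a structure.

```python
import numpy as np


def gnm_fluctuations(contacts):
    """Relative mean-square fluctuations from a symmetric 0/1 contact map.

    Hypothetical helper illustrating a Gaussian-network-style model with
    one node per amino acid.
    """
    c = np.array(contacts, dtype=float)       # copy so the input is untouched
    np.fill_diagonal(c, 0.0)                  # ignore self-contacts
    kirchhoff = np.diag(c.sum(axis=1)) - c    # degree matrix minus adjacency
    evals, evecs = np.linalg.eigh(kirchhoff)  # modes of the elastic network
    keep = evals > 1e-9                       # drop the zero (rigid-body) mode
    # Diagonal of the pseudo-inverse: sum over modes of u_k[i]^2 / lambda_k
    return (evecs[:, keep] ** 2 / evals[keep]).sum(axis=1)


# Toy contact map: a linear chain of three residues (invented example).
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
fluct = gnm_fluctuations(chain)
```

For this chain the two end residues come out more mobile than the middle one, as expected for nodes with fewer contacts. The sketch assumes a connected contact network, so that only one mode has zero eigenvalue.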
Affiliation(s)
- Kejue Jia
- Bioinformatics and Computational Biology Program and Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, United States
- Mesih Kilinc
- Bioinformatics and Computational Biology Program and Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, United States
- Robert L. Jernigan
- Bioinformatics and Computational Biology Program and Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011, United States
6
Malinverni D, Babu MM. Data-driven design of orthogonal protein-protein interactions. Sci Signal 2023; 16:eabm4484. [PMID: 36853962 DOI: 10.1126/scisignal.abm4484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/02/2023]
Abstract
Engineering protein-protein interactions to generate new functions presents a challenge with great potential for many applications, ranging from therapeutics to synthetic biology. To avoid unwanted cross-talk with preexisting protein interaction networks in a cell, the specificity and selectivity of newly engineered proteins must be controlled. Here, we developed a computational strategy that mimics gene duplication and the divergence of preexisting interacting protein pairs to design new interactions. We used the bacterial PhoQ-PhoP two-component system as a model system to demonstrate the feasibility of this strategy and validated the approach with known experimental results. The designed protein pairs are predicted to exclusively interact with each other and to be insulated from potential cross-talk with their native partners. Thus, our approach enables exploration of uncharted regions of the protein sequence space and the design of new interacting protein pairs.
Affiliation(s)
- Duccio Malinverni
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK; Department of Structural Biology and Center of Excellence for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
- M Madan Babu
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK; Department of Structural Biology and Center of Excellence for Data Driven Discovery, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
7
Krishnamohan A, Hamilton GL, Goutam R, Sanabria H, Morcos F. Coevolution and smFRET Enhances Conformation Sampling and FRET Experimental Design in Tandem PDZ1-2 Proteins. J Phys Chem B 2023; 127:884-898. [PMID: 36693159 PMCID: PMC9900596 DOI: 10.1021/acs.jpcb.2c06720] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/25/2023]
Abstract
The structural flexibility of proteins is crucial for their functions. Many experimental and computational approaches can probe protein dynamics across a range of time and length-scales. Integrative approaches synthesize the complementary outputs of these techniques and provide a comprehensive view of the dynamic conformational space of proteins, including the functionally relevant limiting conformational states and transition pathways between them. Here, we introduce an integrative paradigm to model the conformational states of multidomain proteins. As a model system, we use the first two tandem PDZ domains of postsynaptic density protein 95. First, we utilize available sequence information collected from genomic databases to identify potential amino acid interactions in the PDZ1-2 tandem that underlie modeling of the functionally relevant conformations maintained through evolution. This was accomplished through combination of coarse-grained structural modeling with outputs from direct coupling analysis measuring amino acid coevolution, a hybrid approach called SBM+DCA. We recapitulated five distinct, experimentally derived PDZ1-2 tandem conformations. In addition, SBM+DCA unveiled an unidentified, twisted conformation of the PDZ1-2 tandem. Finally, we implemented an integrative framework for the design of single-molecule Förster resonance energy transfer (smFRET) experiments incorporating the outputs of SBM+DCA with simulated FRET observables. This resulting FRET network is designed to mutually resolve the predicted limiting state conformations through global analysis. Using simulated FRET observables, we demonstrate that structural modeling with the newly designed FRET network is expected to outperform a previously used empirical FRET network at resolving all states simultaneously. Integrative approaches to experimental design have the potential to provide a new level of detail in characterizing the evolutionarily conserved conformational landscapes of proteins, and thus new insights into functional relevance of protein dynamics in biological function.
Affiliation(s)
- Aishwarya Krishnamohan
- Departments of Biological Sciences and Bioengineering, University of Texas at Dallas, Richardson, Texas 75080, United States
- George L Hamilton
- Department of Physics and Astronomy, Clemson University, Clemson, South Carolina 29634, United States
- Rajen Goutam
- Department of Physics and Astronomy, Clemson University, Clemson, South Carolina 29634, United States
- Hugo Sanabria
- Department of Physics and Astronomy, Clemson University, Clemson, South Carolina 29634, United States
- Faruck Morcos
- Departments of Biological Sciences and Bioengineering, University of Texas at Dallas, Richardson, Texas 75080, United States; Center for Systems Biology, University of Texas at Dallas, Richardson, Texas 75080, United States
8
Torielli L, Serapian SA, Mussolin L, Moroni E, Colombo G. Integrating Protein Interaction Surface Prediction with a Fragment-Based Drug Design: Automatic Design of New Leads with Fragments on Energy Surfaces. J Chem Inf Model 2023; 63:343-353. [PMID: 36574607 PMCID: PMC9832486 DOI: 10.1021/acs.jcim.2c01408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
Abstract
Protein-protein interactions (PPIs) have emerged in the past years as significant pharmacological targets in the development of new therapeutics due to their key roles in determining pathological pathways. Herein, we present fragments on energy surfaces, a simple and general design strategy that integrates the analysis of the dynamic and energetic signatures of proteins to unveil the substructures involved in PPIs, with docking, selection, and combination of drug-like fragments to generate new PPI inhibitor candidates. Specifically, structural representatives of the target protein are used as inputs for the blind physics-based prediction of potential protein interaction surfaces using the matrix of low coupling energy decomposition method. The predicted interaction surfaces are subdivided into overlapping windows that are used as templates to direct the docking and combination of fragments representative of moieties typically found in active drugs. This protocol is then applied and validated using structurally diverse, important PPI targets as test systems. We demonstrate that our approach facilitates the exploration of the molecular diversity space of potential ligands, with no requirement of prior information on the location and properties of interaction surfaces or on the structures of potential lead compounds. Importantly, the hit molecules that emerge from our ab initio design share high chemical similarity with experimentally tested active PPI inhibitors. We propose that the protocol we describe here represents a valuable means of generating initial leads against difficult targets for further development and refinement.
Affiliation(s)
- Luca Torielli
- Department of Chemistry, University of Pavia, Via Taramelli 12, Pavia 27100, Italy
- Stefano A. Serapian
- Department of Chemistry, University of Pavia, Via Taramelli 12, Pavia 27100, Italy
- Lara Mussolin
- Department of Woman’s and Child’s Health, Pediatric Hematology, Oncology and Stem Cell Transplant Center, University of Padua, Via Giustiniani, 3, Padua 35128, Italy; Istituto di Ricerca Pediatrica Città della Speranza, Corso Stati Uniti, 4 F, Padova 35127, Italy
- Giorgio Colombo
- Department of Chemistry, University of Pavia, Via Taramelli 12, Pavia 27100, Italy
9
van Keulen SC, Martin J, Colizzi F, Frezza E, Trpevski D, Diaz NC, Vidossich P, Rothlisberger U, Hellgren Kotaleski J, Wade RC, Carloni P. Multiscale molecular simulations to investigate adenylyl cyclase‐based signaling in the brain. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Affiliation(s)
- Siri C. van Keulen
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Science for Life, Faculty of Science – Chemistry Utrecht University Utrecht The Netherlands
- Juliette Martin
- CNRS, UMR 5086 Molecular Microbiology and Structural Biochemistry University of Lyon Lyon France
- Francesco Colizzi
- Molecular Ocean Laboratory, Department of Marine Biology and Oceanography Institute of Marine Sciences, ICM‐CSIC Barcelona Spain
- Elisa Frezza
- Université Paris Cité, CiTCoM, CNRS Paris France
- Daniel Trpevski
- Science for Life Laboratory, School of Electrical Engineering and Computer Science KTH Royal Institute of Technology Stockholm
- Nuria Cirauqui Diaz
- CNRS, UMR 5086 Molecular Microbiology and Structural Biochemistry University of Lyon Lyon France
- Pietro Vidossich
- Molecular Modeling and Drug Discovery Lab Istituto Italiano di Tecnologia Genoa Italy
- Ursula Rothlisberger
- Laboratory of Computational Chemistry and Biochemistry Ecole Polytechnique Fédérale de Lausanne (EPFL) Lausanne
- Jeanette Hellgren Kotaleski
- Science for Life Laboratory, School of Electrical Engineering and Computer Science KTH Royal Institute of Technology Stockholm
- Department of Neuroscience Karolinska Institute Stockholm
- Rebecca C. Wade
- Molecular and Cellular Modeling Group Heidelberg Institute for Theoretical Studies (HITS) Heidelberg Germany
- Center for Molecular Biology (ZMBH), DKFZ‐ZMBH Alliance, and Interdisciplinary Center for Scientific Computing (IWR) Heidelberg University Heidelberg Germany
- Paolo Carloni
- Institute for Neuroscience and Medicine (INM‐9) and Institute for Advanced Simulations (IAS‐5) “Computational biomedicine” Forschungszentrum Jülich Jülich Germany
- INM‐11 JARA‐Institute: Molecular Neuroscience and Neuroimaging Forschungszentrum Jülich Jülich Germany
10
Biswas A, Haldane A, Levy RM. Limits to detecting epistasis in the fitness landscape of HIV. PLoS One 2022; 17:e0262314. [PMID: 35041711 PMCID: PMC8765623 DOI: 10.1371/journal.pone.0262314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Accepted: 12/20/2021] [Indexed: 02/05/2023] Open
Abstract
The rapid evolution of HIV is constrained by interactions between mutations which affect viral fitness. In this work, we explore the role of epistasis in determining the mutational fitness landscape of HIV for multiple drug target proteins, including Protease, Reverse Transcriptase, and Integrase. Epistatic interactions between residues modulate the mutation patterns involved in drug resistance, with unambiguous signatures of epistasis best seen in the comparison of the Potts model predicted and experimental HIV sequence “prevalences” expressed as higher-order marginals (beyond triplets) of the sequence probability distribution. In contrast, experimental measures of fitness such as viral replicative capacities generally probe fitness effects of point mutations in a single background, providing weak evidence for epistasis in viral systems. The detectable effects of epistasis are obscured by higher evolutionary conservation at sites. While double mutant cycles, in principle, provide one of the best ways to probe epistatic interactions experimentally without reference to a particular background, we show that the analysis is complicated by the small dynamic range of measurements. Overall, we show that global pairwise interaction Potts models are necessary for predicting the mutational landscape of viral proteins.
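The double mutant cycle mentioned above reduces to simple arithmetic: epistasis is the part of the double mutant's effect that is not explained by adding the two single-mutant effects. The sketch below assumes fitness values on an additive (for example, log) scale; the function name and the numbers are invented for illustration and are not data from the paper.

```python
def double_mutant_epistasis(f_wt, f_a, f_b, f_ab):
    """Epistasis from a double mutant cycle.

    All inputs are fitness values on an additive (e.g. log) scale:
    wild type, single mutant A, single mutant B, and double mutant AB.
    Returns zero when the two mutations act independently.
    """
    effect_a = f_a - f_wt
    effect_b = f_b - f_wt
    effect_ab = f_ab - f_wt
    return effect_ab - (effect_a + effect_b)


# Hypothetical measurements: the double mutant is fitter than additivity predicts.
eps = double_mutant_epistasis(f_wt=0.0, f_a=-1.0, f_b=-0.5, f_ab=-0.7)

# Purely additive case for comparison.
eps_additive = double_mutant_epistasis(f_wt=0.0, f_a=-1.0, f_b=-0.5, f_ab=-1.5)
```

Because the result is a difference of four measurements, the noise of each one propagates into it, which is one way to see why a small dynamic range in the underlying measurements makes the epistatic term hard to resolve.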
Affiliation(s)
- Avik Biswas
- Department of Physics, Temple University, Philadelphia, PA, United States of America
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Allan Haldane
- Department of Physics, Temple University, Philadelphia, PA, United States of America
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Ronald M. Levy
- Department of Physics, Temple University, Philadelphia, PA, United States of America
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Department of Chemistry, Temple University, Philadelphia, PA, United States of America
11
Do HN, Haldane A, Levy RM, Miao Y. Unique features of different classes of G-protein-coupled receptors revealed from sequence coevolutionary and structural analysis. Proteins 2022; 90:601-614. [PMID: 34599827 PMCID: PMC8738117 DOI: 10.1002/prot.26256] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/28/2021] [Revised: 09/21/2021] [Accepted: 09/27/2021] [Indexed: 02/03/2023]
Abstract
G-protein-coupled receptors (GPCRs) are the largest family of human membrane proteins and represent the primary targets of about one third of currently marketed drugs. Despite their critical importance, experimental structures have been determined for only a limited portion of GPCRs, and the functional mechanisms of GPCRs remain poorly understood. Here, we have constructed novel sequence coevolutionary models of the A and B classes of GPCRs and compared them with residue contact frequency maps generated with available experimental structures. Significant portions of structural residue contacts were successfully detected in the sequence-based covariational models. "Exception" residue contacts predicted from sequence coevolutionary models but not found in available structures added missing links that were important for GPCR activation and allosteric modulation. Moreover, we identified distinct residue contacts involving different sets of functional motifs for GPCR activation, such as the Na+ pocket, CWxP, DRY, PIF, and NPxxY motifs in class A and the HETx and PxxG motifs in class B. Finally, we systematically uncovered critical residue contacts tuned by allosteric modulation in the two classes of GPCRs, including those from the activation motifs and particularly the extracellular and intracellular loops in class A GPCRs. These findings provide a promising framework for rational design of ligands to regulate GPCR activation and allosteric modulation.
Affiliation(s)
- Hung N Do
- The Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047
- Allan Haldane
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M Levy
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
- Yinglong Miao
- The Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047
12
Si Y, Zhang Y, Yan C. A reproducibility analysis-based statistical framework for residue-residue evolutionary coupling detection. Brief Bioinform 2022; 23:6509046. [PMID: 35037015 DOI: 10.1093/bib/bbab576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/12/2021] [Revised: 11/26/2021] [Accepted: 12/15/2021] [Indexed: 11/14/2022] Open
Abstract
Direct coupling analysis (DCA) has been widely used to infer evolutionarily coupled residue pairs from the multiple sequence alignment (MSA) of homologous sequences. However, effectively selecting residue pairs with significant evolutionary couplings according to the result of DCA is a non-trivial task. In this study, we developed a general statistical framework for significant evolutionary coupling detection, referred to as irreproducible discovery rate (IDR)-DCA, which is based on reproducibility analysis of the coupling scores obtained from DCA on manually created MSA replicates. IDR-DCA was applied to select residue pairs for contact prediction for monomeric proteins, protein-protein interactions and monomeric RNAs, in which three different versions of DCA were applied. We demonstrated that with the application of IDR-DCA, the residue pairs selected using a universal threshold always yielded stable performance for contact prediction. Compared with the application of carefully tuned coupling score cutoffs, IDR-DCA always showed better performance. The robustness of IDR-DCA was also supported by the MSA downsampling analysis. We further demonstrated the effectiveness of applying constraints obtained from residue pairs selected by IDR-DCA to assist RNA secondary structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yi Zhang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
13
Webber A, Ratnaweera M, Harris A, Luisi BF, Ntsogo Enguéné VY. A Model for Allosteric Communication in Drug Transport by the AcrAB-TolC Tripartite Efflux Pump. Antibiotics (Basel) 2022; 11:52. [PMID: 35052929 PMCID: PMC8773123 DOI: 10.3390/antibiotics11010052] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/29/2021] [Revised: 12/22/2021] [Accepted: 12/28/2021] [Indexed: 01/30/2023] Open
Abstract
RND family efflux pumps are complex macromolecular machines involved in multidrug resistance by extruding antibiotics from the cell. While structural studies and molecular dynamics simulations have provided insights into the architecture and conformational states of the pumps, the path followed by conformational changes from the inner membrane protein (IMP) to the periplasmic membrane fusion protein (MFP) and to the outer membrane protein (OMP) in tripartite efflux assemblies is not fully understood. Here, we investigated AcrAB-TolC efflux pump's allostery by comparing resting and transport states using difference distance matrices supplemented with evolutionary couplings data and buried surface area measurements. Our analysis indicated that substrate binding by the IMP triggers quaternary level conformational changes in the MFP, which induce OMP to switch from the closed state to the open state, accompanied by a considerable increase in the interface area between the MFP subunits and between the OMPs and MFPs. This suggests that the pump's transport-ready state is at a more favourable energy level than the resting state, but raises the puzzle of how the pump does not become stably trapped in a transport-intermediate state. We propose a model for pump allostery that includes a downhill energetic transition process from a proposed 'activated' transport state back to the resting pump.
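As a rough illustration of the difference distance matrix idea mentioned in this abstract, the sketch below subtracts the pairwise distance matrix of one conformation from that of another. It is a minimal sketch, not the authors' pipeline: the function names and the toy coordinates are hypothetical, and it assumes both conformations list the same residues in the same order.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (e.g. one atom per residue)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def difference_distance_matrix(coords_a, coords_b):
    """Element-wise difference D_b - D_a between two distance matrices.

    Assumes both conformations contain the same residues in the same order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must have the same number of residues")
    d_a = distance_matrix(coords_a)
    d_b = distance_matrix(coords_b)
    n = len(d_a)
    return [[d_b[i][j] - d_a[i][j] for j in range(n)] for i in range(n)]


# Toy coordinates (hypothetical, not taken from any deposited structure).
resting = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
transport = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 8.0, 0.0)]
ddm = difference_distance_matrix(resting, transport)
```

Entries near zero mark residue pairs whose separation is unchanged between the two states, while large positive or negative entries mark pairs that move apart or together. The matrix does not depend on how the two structures are superimposed.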
Affiliation(s)
- Anya Webber
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Malitha Ratnaweera
- Department of Oncology, MRC Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, UK
- Andrzej Harris
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK
- Ben F. Luisi
- Department of Biochemistry, University of Cambridge, Tennis Court Road, Cambridge CB2 1GA, UK

14

Chu WT, Yan Z, Chu X, Zheng X, Liu Z, Xu L, Zhang K, Wang J. Physics of biomolecular recognition and conformational dynamics. Rep Prog Phys 2021; 84:126601. [PMID: 34753115 DOI: 10.1088/1361-6633/ac3800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Accepted: 11/09/2021] [Indexed: 06/13/2023]
Abstract
Biomolecular recognition usually leads to the formation of binding complexes, often accompanied by large-scale conformational changes. This process is fundamental to biological functions at the molecular and cellular levels. Uncovering the physical mechanisms of biomolecular recognition and quantifying the key biomolecular interactions are vital to understand these functions. The recently developed energy landscape theory has been successful in quantifying recognition processes and revealing the underlying mechanisms. Recent studies have shown that in addition to affinity, specificity is also crucial for biomolecular recognition. The proposed physical concept of intrinsic specificity based on the underlying energy landscape theory provides a practical way to quantify the specificity. Optimization of affinity and specificity can be adopted as a principle to guide the evolution and design of molecular recognition. This approach can also be used in practice for drug discovery using multidimensional screening to identify lead compounds. The energy landscape topography of molecular recognition is important for revealing the underlying flexible binding or binding-folding mechanisms. In this review, we first introduce the energy landscape theory for molecular recognition and then address four critical issues related to biomolecular recognition and conformational dynamics: (1) specificity quantification of molecular recognition; (2) evolution and design in molecular recognition; (3) flexible molecular recognition; (4) chromosome structural dynamics. The results described here and the discussions of the insights gained from the energy landscape topography can provide valuable guidance for further computational and experimental investigations of biomolecular recognition and conformational dynamics.
Affiliation(s)
- Wen-Ting Chu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Zhiqiang Yan
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Xiakun Chu
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, NY 11794, United States of America
- Xiliang Zheng
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Zuojia Liu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Li Xu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Kun Zhang
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Jin Wang
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, NY 11794, United States of America

15

Bisardi M, Rodriguez-Rivas J, Zamponi F, Weigt M. Modeling sequence-space exploration and emergence of epistatic signals in protein evolution. Mol Biol Evol 2021; 39:6424001. [PMID: 34751386 PMCID: PMC8789065 DOI: 10.1093/molbev/msab321] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Indexed: 02/07/2023] Open
Abstract
During their evolution, proteins explore sequence space via an interplay between random mutations and phenotypic selection. Here, we build upon recent progress in reconstructing data-driven fitness landscapes for families of homologous proteins, to propose stochastic models of experimental protein evolution. These models predict quantitatively important features of experimentally evolved sequence libraries, like fitness distributions and position-specific mutational spectra. They also allow us to efficiently simulate sequence libraries for a vast array of combinations of experimental parameters like sequence divergence, selection strength, and library size. We showcase the potential of the approach in reanalyzing two recent experiments to determine protein structure from signals of epistasis emerging in experimental sequence libraries. To be detectable, these signals require sufficiently large and sufficiently diverged libraries. Our modeling framework offers a quantitative explanation for different outcomes of recently published experiments. Furthermore, we can forecast the outcome of time- and resource-intensive evolution experiments, opening thereby a way to computationally optimize experimental protocols.
Affiliation(s)
- M Bisardi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, F-75005, France; Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
- J Rodriguez-Rivas
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France
- F Zamponi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, F-75005, France
- M Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, Paris, F-75005, France

16

adabmDCA: adaptive Boltzmann machine learning for biological sequences. BMC Bioinformatics 2021; 22:528. [PMID: 34715775 PMCID: PMC8555268 DOI: 10.1186/s12859-021-04441-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/14/2021] [Accepted: 10/12/2021] [Indexed: 11/30/2022] Open
Abstract
Background: Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionary-related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms to model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has been also assessed in terms of their ability in predicting mutational effects and generating in silico functional sequences. Results: Our adaptive implementation of Boltzmann machine learning, adabmDCA, can be generally applied to both protein and RNA families and accomplishes several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have performed the learning of three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and TPP-riboswitch RNA domain. Conclusions: The models learned by adabmDCA are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of the inferred contact map as well as of the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for an accurate and lossless training when the equilibrium one is prohibitive in terms of computational time, and allows for pruning irrelevant parameters using an information-based criterion.
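The parametrization described in this abstract (local biases plus pairwise terms) can be illustrated with a small energy function. This is a minimal sketch under stated assumptions and does not use the adabmDCA code or its file formats. The sign convention E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), the dictionary layout, and all numerical values are illustrative choices.

```python
def potts_energy(seq, fields, couplings):
    """Energy of a sequence under a toy Potts-like model.

    Assumed convention: E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).
    fields[i] maps a symbol to its local bias h_i; couplings[(i, j)] maps a
    symbol pair to J_ij. Missing entries are treated as zero.
    """
    energy = 0.0
    for i, symbol in enumerate(seq):
        energy -= fields[i].get(symbol, 0.0)
    for (i, j), block in couplings.items():
        energy -= block.get((seq[i], seq[j]), 0.0)
    return energy


# Hypothetical two-site model with made-up parameters.
fields = [{"A": 1.0, "C": 0.0}, {"A": 0.0, "C": 0.5}]
couplings = {(0, 1): {("A", "C"): 2.0}}

wild_type = "AC"
mutant = "CC"
# A common proxy for a mutational effect is the energy difference.
delta_e = potts_energy(mutant, fields, couplings) - potts_energy(wild_type, fields, couplings)
```

With this convention a positive `delta_e` means the mutant is less favourable than the wild type under the model. In this toy case the mutation loses both the local bias at site 0 and the coupling between the two sites.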

17

Trinquier J, Uguzzoni G, Pagnani A, Zamponi F, Weigt M. Efficient generative modeling of protein sequences using simple autoregressive models. Nat Commun 2021; 12:5800. [PMID: 34608136 PMCID: PMC8490405 DOI: 10.1038/s41467-021-25756-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 02/24/2021] [Accepted: 08/23/2021] [Indexed: 02/08/2023] Open
Abstract
Generative models emerge as promising candidates for novel sequence-data driven approaches to protein design, and for the extraction of structural and functional information about proteins deeply hidden in rapidly growing sequence databases. Here we propose simple autoregressive models as highly accurate but computationally efficient generative sequence models. We show that they perform similarly to existing approaches based on Boltzmann machines or deep generative models, but at a substantially lower computational cost (by a factor between 10^2 and 10^3). Furthermore, the simple structure of our models has distinctive mathematical advantages, which translate into an improved applicability in sequence generation and evaluation. Within these models, we can easily estimate both the probability of a given sequence, and, using the model's entropy, the size of the functional sequence space related to a specific protein family. In the example of response regulators, we find a huge number of ca. 10^68 possible sequences, which nevertheless constitute only the astronomically small fraction 10^-80 of all amino-acid sequences of the same length. These findings illustrate the potential and the difficulty in exploring sequence space via generative sequence models.
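The two numbers quoted for response regulators can be cross-checked with a short calculation. The sketch below assumes that "all amino-acid sequences of the same length" means 20^L ungapped sequences of length L. That counting convention is an assumption made here, not something the abstract states.

```latex
\[
20^{L} \;\approx\; \frac{10^{68}}{10^{-80}} \;=\; 10^{148}
\quad\Longrightarrow\quad
L \;\approx\; \frac{148}{\log_{10} 20} \;\approx\; \frac{148}{1.301} \;\approx\; 114 .
\]
```

Under that assumption the two figures imply a sequence length of roughly 114 residues.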
Affiliation(s)
- Jeanne Trinquier
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France; Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Guido Uguzzoni
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy
- Andrea Pagnani
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy; Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy; INFN Sezione di Torino, Via P. Giuria 1, I-10125 Torino, Italy
- Francesco Zamponi
- Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France

18

Sanchez-Pulido L, Ponting CP. Extending the Horizon of Homology Detection with Coevolution-based Structure Prediction. J Mol Biol 2021; 433:167106. [PMID: 34139218 PMCID: PMC8527833 DOI: 10.1016/j.jmb.2021.167106] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/28/2021] [Revised: 06/09/2021] [Accepted: 06/09/2021] [Indexed: 12/12/2022]
Abstract
Traditional sequence analysis algorithms fail to identify distant homologies when they lie beyond a detection horizon. In this review, we discuss how co-evolution-based contact and distance prediction methods are pushing back this homology detection horizon, thereby yielding new functional insights and experimentally testable hypotheses. Based on correlated substitutions, these methods divine three-dimensional constraints among amino acids in protein sequences that were previously devoid of all annotated domains and repeats. The new algorithms discern hidden structure in an otherwise featureless sequence landscape. Their revelatory impact promises to be as profound as the use, by archaeologists, of ground-penetrating radar to discern long-hidden, subterranean structures. As examples of this, we describe how triplicated structures reflecting longin domains in MON1A-like proteins, or UVR-like repeats in DISC1, emerge from their predicted contact and distance maps. These methods also help to resolve structures that do not conform to a "beads-on-a-string" model of protein domains. In one such example, we describe CFAP298 whose ubiquitin-like domain was previously challenging to perceive owing to a large sequence insertion within it. More generally, the new algorithms permit an easier appreciation of domain families and folds whose evolution involved structural insertion or rearrangement. As we exemplify with α1-antitrypsin, coevolution-based predicted contacts may also yield insights into protein dynamics and conformational change. This new combination of structure prediction (using innovative co-evolution based methods) and homology inference (using more traditional sequence analysis approaches) shows great promise for bringing into view a sea of evolutionary relationships that had hitherto lain far beyond the horizon of homology detection.
Affiliation(s)
- Luis Sanchez-Pulido
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK
- Chris P Ponting
- Medical Research Council Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, UK

19

Colizzi F, Orozco M. Probing allosteric regulations with coevolution-driven molecular simulations. Sci Adv 2021; 7:eabj0786. [PMID: 34516882 PMCID: PMC8442858 DOI: 10.1126/sciadv.abj0786] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/19/2021] [Accepted: 07/19/2021] [Indexed: 06/13/2023]
Abstract
Protein-mediated allosteric regulations are essential in biology, but their quantitative characterization continues to posit formidable challenges for both experiments and computations. Here, we combine coevolutionary information, multiscale molecular simulations, and free-energy methods to interrogate and quantify the allosteric regulation of functional changes in protein complexes. We apply this approach to investigate the regulation of adenylyl cyclase (AC) by stimulatory and inhibitory G proteins—a prototypical allosteric system that has long escaped from in-depth molecular characterization. We reveal a surprisingly simple ON/OFF regulation of AC functional dynamics through multiple pathways of information transfer. The binding of G proteins reshapes the free-energy landscape of AC following the classical population-shift paradigm. The model agrees with structural and biochemical data and reveals previously unknown experimentally consistent intermediates. Our approach showcases a general strategy to explore uncharted functional space in complex biomolecular regulations.
Affiliation(s)
- Francesco Colizzi
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology (BIST), Carrer de Baldiri Reixac 10, Barcelona 08028, Spain
- Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology (BIST), Carrer de Baldiri Reixac 10, Barcelona 08028, Spain
- Departament de Bioquímica i Biomedicina, Facultat de Biologia, Universitat de Barcelona, Avinguda Diagonal 647, Barcelona 08028, Spain

20

21

Haldane A, Levy RM. Mi3-GPU: MCMC-based Inverse Ising Inference on GPUs for protein covariation analysis. Comput Phys Commun 2021; 260:107312. [PMID: 33716309 PMCID: PMC7944406 DOI: 10.1016/j.cpc.2020.107312] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/12/2023]
Abstract
Inverse Ising inference is a method for inferring the coupling parameters of a Potts/Ising model based on observed site-covariation, which has found important applications in protein physics for detecting interactions between residues in protein families. We introduce Mi3-GPU ("mee-three", for MCMC Inverse Ising Inference) software for solving the inverse Ising problem for protein-sequence datasets with few analytic approximations, by parallel Markov-Chain Monte-Carlo sampling on GPUs. We also provide tools for analysis and preparation of protein-family Multiple Sequence Alignments (MSAs) to account for finite-sampling issues, which are a major source of error or bias in inverse Ising inference. Our method is "generative" in the sense that the inferred model can be used to generate synthetic MSAs whose mutational statistics (marginals) can be verified to match the dataset MSA statistics up to the limits imposed by the effects of finite sampling. Our GPU implementation enables the construction of models which reproduce the covariation patterns of the observed MSA with a precision that is not possible with more approximate methods. The main components of our method are a GPU-optimized algorithm to greatly accelerate MCMC sampling, combined with a multi-step Quasi-Newton parameter-update scheme using a "Zwanzig reweighting" technique. We demonstrate the ability of this software to produce generative models on typical protein family datasets for sequence lengths L ~ 300 with 21 residue types with tens of millions of inferred parameters in short running times.
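To see where a parameter count of this order comes from, the sketch below counts the fields and couplings of a fully connected Potts model. It is a back-of-the-envelope sketch, not part of the Mi3-GPU software: the function name is hypothetical, and it counts every field and coupling entry without any gauge fixing, which may differ from the convention the authors use.

```python
def potts_parameter_count(length, q):
    """Fields plus couplings of a fully connected Potts model (no gauge fixing).

    length: number of alignment columns (L)
    q: number of symbols per column (e.g. 20 amino acids plus a gap)
    """
    fields = length * q
    couplings = length * (length - 1) // 2 * q * q
    return fields + couplings


n_params = potts_parameter_count(300, 21)
print(n_params)  # 19785150
```

For L = 300 and q = 21 this gives about 2 x 10^7 parameters, dominated by the pairwise couplings, which grow quadratically with both L and q.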
Affiliation(s)
- Allan Haldane
- Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M. Levy
- Center for Biophysics and Computational Biology and Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122

22

Wang CK, Craik DJ. Linking molecular evolution to molecular grafting. J Biol Chem 2021; 296:100425. [PMID: 33600801 PMCID: PMC8005815 DOI: 10.1016/j.jbc.2021.100425] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/27/2020] [Revised: 02/09/2021] [Accepted: 02/13/2021] [Indexed: 12/01/2022] Open
Abstract
Molecular grafting is a strategy for the engineering of molecular scaffolds into new functional agents, such as next-generation therapeutics. Despite its wide use, studies so far have focused almost exclusively on demonstrating its utility rather than understanding the factors that lead to either poor or successful grafting outcomes. Here, we examine protein evolution and identify parallels between the natural process of protein functional diversification and the artificial process of molecular grafting. We discuss features of natural proteins that are correlated to innovability (the capacity to acquire new functions) and describe their implications to molecular grafting scaffolds. Disulfide-rich peptides are used as exemplars because they are particularly promising scaffolds onto which new functions can be grafted. This article provides a perspective on why some scaffolds are more suitable for grafting than others, identifying opportunities on how molecular grafting might be improved.
Affiliation(s)
- Conan K Wang
- Institute for Molecular Bioscience and Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Queensland, Brisbane, Queensland, Australia
- David J Craik
- Institute for Molecular Bioscience and Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Queensland, Brisbane, Queensland, Australia

23

Crippa M, Andreghetti D, Capelli R, Tiana G. Evolution of frustrated and stabilising contacts in reconstructed ancient proteins. Eur Biophys J 2021; 50:699-712. [PMID: 33569610 PMCID: PMC8260555 DOI: 10.1007/s00249-021-01500-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2020] [Revised: 12/14/2020] [Accepted: 01/13/2021] [Indexed: 11/30/2022]
Abstract
Energetic properties of a protein are a major determinant of its evolutionary fitness. Using a reconstruction algorithm, dating the reconstructed proteins and calculating the interaction network between their amino acids through a coevolutionary approach, we studied how the interactions that stabilise 890 proteins, belonging to five families, evolved for billions of years. In particular, we focused our attention on the network of most strongly attractive contacts and on that of poorly optimised, frustrated contacts. Our results support the idea that the cluster of most attractive interactions extends its size along evolutionary time, but from the data, we cannot conclude that protein stability or that the degree of frustration tends always to decrease.
Affiliation(s)
- Martina Crippa
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Damiano Andreghetti
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Riccardo Capelli
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy

24

Terzoli S, Tiana G. Molecular Recognition between Cadherins Studied by a Coarse-Grained Model Interacting with a Coevolutionary Potential. J Phys Chem B 2020; 124:4079-4088. [PMID: 32336092 PMCID: PMC8007105 DOI: 10.1021/acs.jpcb.0c01671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Studying the conformations involved in the dimerization of cadherins is highly relevant to understand the development of tissues and its failure, which is associated with tumors and metastases. Experimental techniques, like X-ray crystallography, can usually report only the most stable conformations, missing minority states that could nonetheless be important for the recognition mechanism. Computer simulations could be a valid complement to the experimental approach. However, standard all-atom protein models in explicit solvent are computationally too demanding to search thoroughly the conformational space of multiple chains composed of several hundreds of amino acids. To reach this goal, we resorted to a coarse-grained model in implicit solvent. The standard problem with this kind of model is to find a realistic potential to describe its interactions. We used coevolutionary information from cadherin alignments, corrected by a statistical potential, to build an interaction potential, which is agnostic about the experimental conformations of the protein. Using this model, we explored the conformational space of multichain systems and validated the results comparing with experimental data. We identified dimeric conformations that are sequence specific and that can be useful to rationalize the mechanism of recognition between cadherins.
Affiliation(s)
- Sara Terzoli
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, Milano 20133, Italy
- Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, Milano 20133, Italy

25

Baldessari F, Capelli R, Carloni P, Giorgetti A. Coevolutionary data-based interaction networks approach highlighting key residues across protein families: The case of the G-protein coupled receptors. Comput Struct Biotechnol J 2020; 18:1153-1159. [PMID: 32489528 PMCID: PMC7260681 DOI: 10.1016/j.csbj.2020.05.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/27/2020] [Revised: 05/01/2020] [Accepted: 05/06/2020] [Indexed: 12/26/2022] Open
Abstract
We present an approach that, by integrating structural data with Direct Coupling Analysis, is able to pinpoint most of the interaction hotspots (i.e. key residues for the biological activity) across very sparse protein families in a single run. An application to the Class A G-protein coupled receptors (GPCRs), both in their active and inactive states, demonstrates the predictive power of our approach. The latter can be easily extended to any other kind of protein family, where it is expected to highlight most key sites involved in their functional activity.
Affiliation(s)
- Filippo Baldessari
- Department of Biotechnology, Università di Verona, Ca Vignal 1, strada Le Grazie 15, I-37134 Verona, Italy
- Riccardo Capelli
- Computational Biomedicine Section, IAS-5/INM-9, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52425 Jülich, Germany
- Paolo Carloni
- Computational Biomedicine Section, IAS-5/INM-9, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52425 Jülich, Germany
- Alejandro Giorgetti
- Department of Biotechnology, Università di Verona, Ca Vignal 1, strada Le Grazie 15, I-37134 Verona, Italy
- Computational Biomedicine Section, IAS-5/INM-9, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52425 Jülich, Germany

26

Cuturello F, Tiana G, Bussi G. Assessing the accuracy of direct-coupling analysis for RNA contact prediction. RNA 2020; 26:637-647. [PMID: 32115426 PMCID: PMC7161351 DOI: 10.1261/rna.074179.119] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/29/2019] [Accepted: 02/26/2020] [Indexed: 05/31/2023]
Abstract
Many noncoding RNAs are known to play a role in the cell directly linked to their structure. Structure prediction based on the sole sequence is, however, a challenging task. On the other hand, thanks to the low cost of sequencing technologies, a very large number of homologous sequences are becoming available for many RNA families. In the protein community, the idea of exploiting the covariance of mutations within a family to predict the protein structure using the direct-coupling-analysis (DCA) method has emerged in the last decade. The application of DCA to RNA systems has been limited so far. We here perform an assessment of the DCA method on 17 riboswitch families, comparing it with the commonly used mutual information analysis and with state-of-the-art R-scape covariance method. We also compare different flavors of DCA, including mean-field, pseudolikelihood, and a proposed stochastic procedure (Boltzmann learning) for solving exactly the DCA inverse problem. Boltzmann learning outperforms the other methods in predicting contacts observed in high-resolution crystal structures.
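The mutual information baseline that this abstract compares DCA against can be sketched in a few lines. This is a minimal sketch of the textbook definition, not the authors' implementation. It assumes unweighted sequences, no pseudocounts and no background correction, and the toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information in bits between two alignment columns.

    col_i and col_j hold one symbol per sequence, in the same sequence order.
    Frequencies are plain counts (no sequence weights or pseudocounts).
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same alignment")
    n = len(col_i)
    f_i = Counter(col_i)
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = count * n / (f_i[a] * f_j[b])
        mi += p_ab * math.log2(ratio)
    return mi


# Hypothetical columns: the first pair covaries perfectly (G-C, C-G, A-U, U-A),
# the second pair is statistically independent.
paired = mutual_information("GGCCAAUU", "CCGGUUAA")
unpaired = mutual_information("GGCCAAUU", "GCGCGCGC")
```

Perfectly covarying columns reach the column entropy (2 bits in this toy case), while independent columns give zero. Mutual information does not separate direct from indirect correlations, which is the limitation that DCA-type methods are designed to address.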
Affiliation(s)
- Francesca Cuturello
- Scuola Internazionale Superiore di Studi Avanzati, International School for Advanced Studies, 34136 Trieste, Italy
- Guido Tiana
- Center for Complexity and Biosystems and Department of Physics, Università degli Studi di Milano and INFN, 20133 Milano, Italy
- Giovanni Bussi
- Scuola Internazionale Superiore di Studi Avanzati, International School for Advanced Studies, 34136 Trieste, Italy

27

Feng J, Shukla D. FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution. J Phys Chem B 2020; 124:3605-3615. [PMID: 32283936 DOI: 10.1021/acs.jpcb.9b11869] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
Proteins are dynamic molecules which perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contacts prediction opens up new avenues for the de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone. This is due to the inability to deconvolve the complex signals of residue-residue contacts, i.e., spatial contacts relevant for protein folding, conformational diversity, and ligand binding. Here, we introduce a machine learning based method, called FingerprintContacts, for extending the capabilities of residue-residue contacts. This algorithm leverages the features of residue-residue contacts, that is, (1) a single conformation outperforms the others in the structural prediction using all the top ranking residue-residue contacts as structural constraints and (2) conformation specific contacts rank lower and constitute a small fraction of residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts which are preferentially located in the dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step in structural understanding of protein functional mechanisms.
|
28
|
Koukos P, Bonvin A. Integrative Modelling of Biomolecular Complexes. J Mol Biol 2020; 432:2861-2881. [DOI: 10.1016/j.jmb.2019.11.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 09/14/2019] [Revised: 11/12/2019] [Accepted: 11/13/2019] [Indexed: 12/31/2022]
|
29
|
Malinverni D, Barducci A. Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting. Entropy (Basel, Switzerland) 2020; 21:1127. [PMID: 32002010 PMCID: PMC6992422 DOI: 10.3390/e21111127] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/17/2019] [Accepted: 11/14/2019] [Indexed: 01/07/2023]
Abstract
Extracting structural information from sequence co-variation has become a common computational biology practice in recent years, mainly due to the availability of large sequence alignments of protein families. However, identifying features that are specific to sub-classes and not shared by all members of the family using sequence-based approaches has remained an elusive problem. We here present a coevolutionary-based method to differentially analyze subfamily-specific structural features by a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test its predictive capabilities on the Response Regulator family, whose subfamilies have been previously shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data is relatively scarce. Furthermore, sequence reweighting allows assessing if individual structural contacts pertain to specific subfamilies and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
Affiliation(s)
- Duccio Malinverni
- Medical Research Council (MRC) Laboratory of Molecular Biology, Cambridge CB2 0QH, UK
- Alessandro Barducci
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
|
30
|
Sala D, Cerofolini L, Fragai M, Giachetti A, Luchinat C, Rosato A. A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and NMR ambiguous contacts. Comput Struct Biotechnol J 2019; 18:114-124. [PMID: 31969972 PMCID: PMC6961069 DOI: 10.1016/j.csbj.2019.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/22/2019] [Revised: 11/20/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022] Open
Abstract
Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.
Affiliation(s)
- Davide Sala
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Linda Cerofolini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Marco Fragai
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Claudio Luchinat
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
|
31
|
Orellana L. Large-Scale Conformational Changes and Protein Function: Breaking the in silico Barrier. Front Mol Biosci 2019; 6:117. [PMID: 31750315 PMCID: PMC6848229 DOI: 10.3389/fmolb.2019.00117] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 07/20/2019] [Accepted: 10/14/2019] [Indexed: 12/16/2022] Open
Abstract
Large-scale conformational changes are essential to link protein structures with their function at the cell and organism scale, but have been elusive both experimentally and computationally. Over the past few years, developments in cryo-electron microscopy and crystallography techniques have started to reveal multiple snapshots of increasingly large and flexible systems, deemed impossible only a short time ago. As structural information accumulates, theoretical methods become central to understand how different conformers interconvert to mediate biological function. Here we briefly survey current in silico methods to tackle large conformational changes, reviewing recent examples of cross-validation of experiments and computational predictions, which show how the integration of different scale simulations with biological information is already starting to break the barriers between the in silico, in vitro, and in vivo worlds, shedding new light onto complex biological problems inaccessible so far.
Affiliation(s)
- Laura Orellana
- Institutionen för Biokemi och Biofysik, Stockholms Universitet, Stockholm, Sweden
- Science for Life Laboratory, Solna, Sweden
|
32
|
Biswas A, Haldane A, Arnold E, Levy RM. Epistasis and entrenchment of drug resistance in HIV-1 subtype B. eLife 2019; 8:e50524. [PMID: 31591964 PMCID: PMC6783267 DOI: 10.7554/elife.50524] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/25/2019] [Accepted: 09/09/2019] [Indexed: 12/17/2022] Open
Abstract
The development of drug resistance in HIV is the result of primary mutations whose effects on viral fitness depend on the entire genetic background, a phenomenon called 'epistasis'. Based on protein sequences derived from drug-experienced patients in the Stanford HIV database, we use a co-evolutionary (Potts) Hamiltonian model to provide direct confirmation of epistasis involving many simultaneous mutations. Building on earlier work, we show that primary mutations leading to drug resistance can become highly favored (or entrenched) by the complex mutation patterns arising in response to drug therapy despite being disfavored in the wild-type background, and provide the first confirmation of entrenchment for all three drug-target proteins: protease, reverse transcriptase, and integrase; a comparative analysis reveals that NNRTI-induced mutations behave differently from the others. We further show that the likelihood of resistance mutations can vary widely in patient populations, and from the population average compared to specific molecular clones.
Affiliation(s)
- Avik Biswas
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
- Department of Physics, Temple University, Philadelphia, United States
- Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
- Department of Physics, Temple University, Philadelphia, United States
- Eddy Arnold
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, United States
- Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, United States
- Ronald M Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
- Department of Physics, Temple University, Philadelphia, United States
- Department of Chemistry, Temple University, Philadelphia, United States
|
33
|
Shimagaki K, Weigt M. Selection of sequence motifs and generative Hopfield-Potts models for protein families. Phys Rev E 2019; 100:032128. [PMID: 31639992 DOI: 10.1103/physreve.100.032128] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/28/2019] [Indexed: 06/10/2023]
Abstract
Statistical models for families of evolutionarily related proteins have recently gained interest: In particular, pairwise Potts models such as those inferred by the direct-coupling analysis have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of the amino acid usage in a protein family, i.e., to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. Although successful, these models depend on huge numbers of ad hoc introduced parameters, which have to be estimated from finite amounts of data and whose biophysical interpretation remains unclear. Here, we propose an approach to parameter reduction, which is based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino acid sequences into functional subfamilies. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models in predicting contact maps, showing the necessity of developing models going beyond the Hopfield-Potts models discussed here.
Affiliation(s)
- Kai Shimagaki
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative-LCQB, Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative-LCQB, Paris, France
|
34
|
Astl L, Verkhivker GM. Data-driven computational analysis of allosteric proteins by exploring protein dynamics, residue coevolution and residue interaction networks. Biochim Biophys Acta Gen Subj 2019:S0304-4165(19)30179-5. [PMID: 31330173 DOI: 10.1016/j.bbagen.2019.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/17/2019] [Revised: 07/15/2019] [Accepted: 07/17/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Computational studies of allosteric interactions have witnessed a recent renaissance fueled by the growing interest in modeling of the complex molecular assemblies and biological networks. Allosteric interactions in protein structures allow for molecular communication in signal transduction networks. METHODS In this work, we performed a large-scale, comprehensive and multi-faceted analysis of >300 diverse allosteric proteins and complexes with allosteric modulators. By modeling and exploring coarse-grained dynamics, residue coevolution, and residue interaction networks for allosteric proteins, we have determined unifying molecular signatures shared by allosteric systems. RESULTS The results of this study have suggested that allosteric inhibitors and allosteric activators may differentially affect global dynamics and network organization of protein systems, leading to diverse allosteric mechanisms. By using structural and functional data on protein kinases, we present a detailed case study that included atomic-level analysis of coevolutionary networks in kinases bound with allosteric inhibitors and activators. CONCLUSIONS We have found that coevolutionary networks can form direct communication pathways connecting functional regions and can recapitulate key regulatory sites and interactions responsible for allosteric signaling in the studied protein systems. The results of this computational investigation are compared with the experimental studies and reveal molecular signatures of known regulatory hotspots in protein kinases. GENERAL SIGNIFICANCE This study has shown that allosteric inhibitors and allosteric activators can have a different effect on residue interaction networks and can exploit distinct regulatory mechanisms, which could open up opportunities for probing allostery and new drug combinations with a broad range of activities.
Affiliation(s)
- Lindy Astl
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, United States of America
- Gennady M Verkhivker
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, United States of America; Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, United States of America.
|
35
|
Haldane A, Flynn WF, He P, Levy RM. Coevolutionary Landscape of Kinase Family Proteins: Sequence Probabilities and Functional Motifs. Biophys J 2019; 114:21-31. [PMID: 29320688 DOI: 10.1016/j.bpj.2017.10.028] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/20/2017] [Revised: 09/11/2017] [Accepted: 10/17/2017] [Indexed: 01/25/2023] Open
Abstract
The protein kinase catalytic domain is one of the most abundant domains across all branches of life. Although kinases share a common core function of phosphoryl-transfer, they also have wide functional diversity and play varied roles in cell signaling networks, and for this reason are implicated in a number of human diseases. This functional diversity is primarily achieved through sequence variation, and uncovering the sequence-function relationships for the kinase family is a major challenge. In this study we use a statistical inference technique inspired by statistical physics, which builds a coevolutionary "Potts" Hamiltonian model of sequence variation in a protein family. We show how this model has sufficient power to predict the probability of specific subsequences in the highly diverged kinase family, which we verify by comparing the model's predictions with experimental observations in the Uniprot database. We show that the pairwise (residue-residue) interaction terms of the statistical model are necessary and sufficient to capture higher-than-pairwise mutation patterns of natural kinase sequences. We observe that previously identified functional sets of residues have much stronger correlated interaction scores than are typical.
Affiliation(s)
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
- William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Peng He
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
- Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania.
|
36
|
Abstract
Classically, phenotype is what is observed, and genotype is the genetic makeup. Statistical studies aim to project phenotypic likelihoods of genotypic patterns. The traditional genotype-to-phenotype theory embraces the view that the encoded protein shape together with gene expression level largely determines the resulting phenotypic trait. Here, we point out that the molecular biology revolution at the turn of the century explained that the gene encodes not one but ensembles of conformations, which in turn spell all possible gene-associated phenotypes. The significance of a dynamic ensemble view is in understanding the linkage between genetic change and the gained observable physical or biochemical characteristics. Thus, despite the transformative shift in our understanding of the basis of protein structure and function, the literature still commonly relates to the classical genotype-phenotype paradigm. This is important because an ensemble view clarifies how even seemingly small genetic alterations can lead to pleiotropic traits in adaptive evolution and in disease, why cellular pathways can be modified in monogenic and polygenic traits, and how the environment may tweak protein function.
Affiliation(s)
- Ruth Nussinov
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, United States of America
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Chung-Jung Tsai
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, United States of America
- Hyunbum Jang
- Cancer and Inflammation Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, Maryland, United States of America
|
37
|
Dixit SM, Ruotolo BT. A Semi-Empirical Framework for Interpreting Traveling Wave Ion Mobility Arrival Time Distributions. Journal of the American Society for Mass Spectrometry 2019; 30:956-966. [PMID: 30815838 DOI: 10.1007/s13361-019-02133-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/15/2018] [Revised: 12/12/2018] [Accepted: 01/04/2019] [Indexed: 06/09/2023]
Abstract
The inherent structural heterogeneity of biomolecules is an important biophysical property that is essential to their function, but is challenging to characterize experimentally. We present a workflow that rapidly and quantitatively assesses the conformational heterogeneity of peptides and proteins in the gas phase using traveling wave ion mobility (TWIM) arrival time distributions (ATDs). We have established a set of semi-empirical equations that model the TWIM ATD peak width and resolution across a wide range of wave amplitudes (V) and wave velocities (v). In addition, a conformational broadening parameter, δ, can be extracted from this analysis that reports on the contribution of conformational heterogeneity to the broadening of TWIM ATD peak width during ion mobility separation. We use this δ value to evaluate the conformational heterogeneity of a set of helical peptides, and our analysis correlates well with previous peak width observations reported for these ions. Furthermore, we use molecular dynamics simulations to independently investigate the general flexibility of these peptides in the gas phase, and generate similar trends found in experimental TWIM data. Finally, we extended our analysis to Avidin, a 64-kDa homotetramer, and quantify the structural heterogeneity of this intact complex using TWIM ATD data as a function of cross-linking. We observe an initial reduction in δ values as a function of cross-linker concentration, demonstrating the sensitivity of our δ value analysis to changes in flexibility of the assembly.
Affiliation(s)
- Sugyan M Dixit
- Department of Chemistry, University of Michigan, 930 N. University Ave, Ann Arbor, MI, 48109, USA
- Brandon T Ruotolo
- Department of Chemistry, University of Michigan, 930 N. University Ave, Ann Arbor, MI, 48109, USA.
|
38
|
The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol 2019; 56:179-186. [PMID: 31029927 DOI: 10.1016/j.sbi.2019.03.024] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/10/2019] [Revised: 03/18/2019] [Accepted: 03/19/2019] [Indexed: 11/22/2022]
Abstract
Evolution imposes constraints at the interface of interacting biomolecules in order to preserve function or maintain fitness. This pressure may have a direct effect on the sequence composition of interacting biomolecules. As a result, statistical patterns of amino acid or nucleotide covariance that encode for physical and functional interactions are observed in sequences of extant organisms. In recent years, global pairwise models of amino acid and nucleotide coevolution from multiple sequence alignments have been developed and utilized to study molecular interactions in structural biology. In proteins, for which the energy landscape is funneled and minimally frustrated, a direct connection between the physical and sequence space landscapes can be established. Estimating coevolutionary information from sequences of interacting molecules has a broad impact in molecular biology. Applications include the accurate determination of 3D structures of molecular complexes, inference of protein interaction partners, models of protein-protein interaction specificity, the elucidation and design of protein-nucleic acid recognition, as well as the discovery of genome-wide epistatic effects. The current state of the art of coevolutionary analysis includes biomedical applications ranging from mutational landscapes and drug design to vaccine development.
|
39
|
Marchetti F, Capelli R, Rizzato F, Laio A, Colombo G. The Subtle Trade-Off between Evolutionary and Energetic Constraints in Protein-Protein Interactions. J Phys Chem Lett 2019; 10:1489-1497. [PMID: 30855965 DOI: 10.1021/acs.jpclett.9b00191] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 05/12/2023]
Abstract
Life machinery, although overwhelmingly complex, is rooted in a rather limited number of molecular processes. One of the most important is protein-protein interaction. Metabolic regulation, protein folding control, and cellular motility are examples of processes based on the fine-tuned interaction of several protein partners. The region on the protein surface devoted to the recognition of a specific partner is essential for the function of the protein and is, therefore, likely to be conserved during evolution. On the other hand, the physical chemistry of amino acids underlies the mechanism of interactions. Both evolutionary and energetic constraints can then be used to build scoring functions capable of recognizing interaction sites. Our working hypothesis is that residues within the interaction interface tend at the same time to be evolutionarily conserved (to preserve their function) and to provide little contribution to the internal stabilization of the structure of their cognate protein, to facilitate conformational adaptation to the partner. Here, we show that for some classes of protein partners (for example, those involved in signal transduction and in enzymes) evolutionary constraints play the key role in defining the interaction surface. In contrast, energetic constraints emerge as more important in protein partners involved in immune response, in inhibitor proteins, and in structural proteins. Our results indicate that a general-purpose scoring function for protein-protein interaction should not be agnostic of the biological function of the partners.
Affiliation(s)
- Filippo Marchetti
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Via Mario Bianco 9, 20131 Milano, Italy
- Dipartimento di Chimica, Università degli Studi di Milano, Via Venezian 21, I-20133 Milano, Italy
- Riccardo Capelli
- INM-9/IAS-5 Computational Biomedicine, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-54245 Jülich, Germany
- Francesca Rizzato
- SISSA, Scuola Internazionale Superiore Studi Avanzati, Via Bonomea 265, I-34136 Trieste, Italy
- Alessandro Laio
- SISSA, Scuola Internazionale Superiore Studi Avanzati, Via Bonomea 265, I-34136 Trieste, Italy
- ICTP, International Centre for Theoretical Physics, Strada Costiera 11, I-34100 Trieste, Italy
- Giorgio Colombo
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Via Mario Bianco 9, 20131 Milano, Italy
- Dipartimento di Chimica, Università di Pavia, V.le Taramelli 12, 27100 Pavia, Italy
|
40
|
Liang Z, Verkhivker GM, Hu G. Integration of network models and evolutionary analysis into high-throughput modeling of protein dynamics and allosteric regulation: theory, tools and applications. Brief Bioinform 2019; 21:815-835. [DOI: 10.1093/bib/bbz029] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 11/01/2018] [Revised: 02/04/2019] [Accepted: 02/21/2019] [Indexed: 12/24/2022] Open
Abstract
Proteins are dynamical entities that undergo a plethora of conformational changes, accomplishing their biological functions. Molecular dynamics simulation and normal mode analysis methods have become the gold standard for studying protein dynamics, analyzing molecular mechanism and allosteric regulation of biological systems. The enormous amount of the ensemble-based experimental and computational data on protein structure and dynamics has presented a major challenge for the high-throughput modeling of protein regulation and molecular mechanisms. In parallel, bioinformatics and systems biology approaches including genomic analysis, coevolution and network-based modeling have provided an array of powerful tools that complemented and enriched biophysical insights by enabling high-throughput analysis of biological data and dissection of global molecular signatures underlying mechanisms of protein function and interactions in the cellular environment. These developments have provided a powerful interdisciplinary framework for quantifying the relationships between protein dynamics and allosteric regulation, allowing for high-throughput modeling and engineering of molecular mechanisms. Here, we review fundamental advances in protein dynamics, network theory and coevolutionary analysis that have provided a foundation for rapidly growing computational tools for modeling of allosteric regulation. We discuss recent developments in these interdisciplinary areas bridging computational biophysics and network biology, focusing on promising applications in allosteric regulation, including the investigation of allosteric communication pathways, protein–DNA/RNA interactions and disease mutations in genomic medicine. We conclude by formulating and discussing future directions and potential challenges facing quantitative computational investigations of allosteric regulatory mechanisms in protein systems.
Affiliation(s)
- Zhongjie Liang
- School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Gennady M Verkhivker
- Department of Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, USA
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, USA
- Guang Hu
- School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
|
41
|
Tubiana J, Cocco S, Monasson R. Learning protein constitutive motifs from sequence data. eLife 2019; 8:e39397. [PMID: 30857591 PMCID: PMC6436896 DOI: 10.7554/elife.39397] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 10/03/2018] [Accepted: 02/24/2019] [Indexed: 12/11/2022] Open
Abstract
Statistical analysis of evolutionarily related protein sequences provides information about their structure, function, and history. We show that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information. We here apply RBM to 20 protein families, and present detailed results for two short protein domains (Kunitz and WW), one long chaperone protein (Hsp70), and synthetic lattice proteins for benchmarking. The features inferred by the RBM are biologically interpretable: they are related to structure (residue-residue tertiary contacts, extended secondary motifs (α-helices and β-sheets) and intrinsically disordered regions), to function (activity and ligand specificity), or to phylogenetic identity. In addition, we use RBM to design new protein sequences with putative properties by composing and 'turning up' or 'turning down' the different modes at will. Our work therefore shows that RBM are versatile and practical tools that can be used to unveil and exploit the genotype-phenotype relationship for protein families.
Affiliation(s)
- Jérôme Tubiana
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Paris, France
- Simona Cocco
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Paris, France
- Rémi Monasson
- Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 & PSL Research, Paris, France
|
42
|
Figliuzzi M, Barrat-Charlaix P, Weigt M. How Pairwise Coevolutionary Models Capture the Collective Residue Variability in Proteins? Mol Biol Evol 2019; 35:1018-1027. [PMID: 29351669 DOI: 10.1093/molbev/msy007] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/15/2022] Open
Abstract
Global coevolutionary models of homologous protein families, as constructed by direct coupling analysis (DCA), have recently gained popularity in particular due to their capacity to accurately predict residue-residue contacts from sequence information alone, and thereby to facilitate tertiary and quaternary protein structure prediction. More recently, they have also been used to predict fitness effects of amino-acid substitutions in proteins, and to predict evolutionary conserved protein-protein interactions. These models are based on two currently unjustified hypotheses: 1) correlations in the amino-acid usage of different positions are resulting collectively from networks of direct couplings; and 2) pairwise couplings are sufficient to capture the amino-acid variability. Here, we propose a highly precise inference scheme based on Boltzmann-machine learning, which allows us to systematically address these hypotheses. We show how correlations are built up in a highly collective way by a large number of coupling paths, which are based on the proteins three-dimensional structure. We further find that pairwise coevolutionary models capture the collective residue variability across homologous proteins even for quantities which are not imposed by the inference procedure, like three-residue correlations, the clustered structure of protein families in sequence space or the sequence distances between homologs. These findings strongly suggest that pairwise coevolutionary models are actually sufficient to accurately capture the residue variability in homologous protein families.
Affiliation(s)
- Matteo Figliuzzi
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
- Pierre Barrat-Charlaix
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
|
43
|
Abstract
Thanks to the explosion of genomic sequencing, coevolutionary analysis of protein sequences has gained great and ever-increasing popularity in the last decade, and it is currently an important and well-established tool in structural bioinformatics and computational biology. This chapter concisely introduces the theoretical foundation and the practical aspects of coevolutionary analysis, as well as discusses the molecular modeling strategies to exploit its results in the study of protein structure, dynamics, and interactions. We present here a complete pipeline from sequence extraction to contact prediction through two examples, focusing on the predictions of inter-residue contacts in a single protein domain and on the analysis of a multi-domain protein that undergoes functional, large-scale conformational transitions.
Affiliation(s)
- Duccio Malinverni
- Laboratory of Statistical Biophysics, Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
- Alessandro Barducci
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, Montpellier, France.
|
44
|
Koehl P, Orland H, Delarue M. Numerical Encodings of Amino Acids in Multivariate Gaussian Modeling of Protein Multiple Sequence Alignments. Molecules 2018; 24:E104. [PMID: 30597916 PMCID: PMC6337344 DOI: 10.3390/molecules24010104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/02/2018] [Revised: 12/21/2018] [Accepted: 12/24/2018] [Indexed: 11/17/2022]
Abstract
Residues in proteins that are in close spatial proximity are more prone to covariate as their interactions are likely to be preserved due to structural and evolutionary constraints. If we can detect and quantify such covariation, physical contacts may then be predicted in the structure of a protein solely from the sequences that decorate it. To carry out such predictions, and following the work of others, we have implemented a multivariate Gaussian model to analyze correlation in multiple sequence alignments. We have explored and tested several numerical encodings of amino acids within this model. We have shown that 1D encodings based on amino acid biochemical and biophysical properties, as well as higher dimensional encodings computed from the principal components of experimentally derived mutation/substitution matrices, do not perform as well as a simple twenty dimensional encoding with each amino acid represented with a vector of one along its own dimension and zero elsewhere. The optimum obtained from representations based on substitution matrices is reached by using 10 to 12 principal components; the corresponding performance is less than the performance obtained with the 20-dimensional binary encoding. We highlight also the importance of the prior when constructing the multivariate Gaussian model of a multiple sequence alignment.
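The 20-dimensional binary encoding mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the alphabet ordering, the all-zero treatment of gaps, the function names `one_hot_msa` and `regularized_covariance`, and the identity shrinkage used as a stand-in for the prior are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (ordering is an assumption)
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot_msa(sequences):
    """Encode aligned sequences as an (N, L*20) binary matrix.

    Gaps and unknown characters become all-zero vectors (an assumption).
    """
    n, length = len(sequences), len(sequences[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(sequences):
        for pos, aa in enumerate(seq):
            k = INDEX.get(aa)
            if k is not None:
                x[s, pos, k] = 1.0
    return x.reshape(n, length * len(ALPHABET))


def regularized_covariance(x, shrinkage=0.1):
    """Empirical covariance shrunk toward the identity.

    The shrinkage is a hypothetical stand-in for the prior of a Gaussian model.
    """
    c = np.cov(x, rowvar=False, bias=True)
    return (1.0 - shrinkage) * c + shrinkage * np.eye(c.shape[0])


# Toy alignment of three sequences of length three.
msa = ["ACD", "ACE", "GCD"]
x = one_hot_msa(msa)
cov = regularized_covariance(x)
```

Shrinking toward the identity keeps the covariance invertible even when the alignment has far fewer sequences than encoded dimensions, which is one reason the choice of prior matters in such models.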
Affiliation(s)
- Patrice Koehl
- Department of Computer Science, University of California, Davis, CA 95211, USA.
- Henri Orland
- Institut de Physique Théorique, CEA Saclay, 91191 Gif-sur-Yvette CEDEX, France.
- Marc Delarue
- Department of Structural Biology and Chemistry and UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France.
|
45
|
Harrison RES, Morikis D. Molecular Mechanisms of Macular Degeneration Associated with the Complement Factor H Y402H Mutation. Biophys J 2018; 116:215-226. [PMID: 30616835 DOI: 10.1016/j.bpj.2018.12.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/04/2018] [Revised: 10/24/2018] [Accepted: 12/07/2018] [Indexed: 01/02/2023]
Abstract
A single nucleotide polymorphism, tyrosine at position 402 to histidine (Y402H), within the gene encoding complement factor H (FH) predisposes individuals to acquiring age-related macular degeneration (AMD) after aging. This polymorphism occurs in short consensus repeat (SCR) 7 of FH and results in decreased binding affinity of SCR6-8 for heparin. As FH is responsible for regulating the complement system, decreased affinity for heparin results in decreased regulation on surfaces of self. To understand the involvement of the Y402H polymorphism in AMD, we leverage methods from bioinformatics and computational biophysics to quantify structural and dynamical differences between SCR7 isoforms that contribute to decreased pattern recognition in SCR7H402. Our data from molecular and Brownian dynamics simulations suggest a revised mechanism for decreased heparin binding. In this model, transient contacts not observed in structures for SCR7 are predicted to occur in molecular dynamics simulations between coevolved residues Y402 and I412, stabilizing SCR7Y402 in a conformation that promotes association with heparin. H402 in the risk isoform is less likely to form a contact with I412 and samples a larger conformational space than Y402. We observe energy minima for sidechains of Y402 and R404 from SCR7Y402 that are predicted to associate with heparin at a rate constant faster than energy minima for sidechains of H402 and R404 from SCR7H402. As both carbohydrate density and degree of sulfation decrease with age in Bruch's membrane of the macula, the decreased heparin recognition of SCR7H402 may contribute to the pathogenesis of AMD.
Affiliation(s)
- Reed E S Harrison
- Department of Bioengineering, University of California, Riverside, Riverside, California
- Dimitrios Morikis
- Department of Bioengineering, University of California, Riverside, Riverside, California.
|
46
|
Neuwald AF, Altschul SF. Statistical investigations of protein residue direct couplings. PLoS Comput Biol 2018; 14:e1006237. [PMID: 30596639 PMCID: PMC6329532 DOI: 10.1371/journal.pcbi.1006237] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/24/2018] [Revised: 01/11/2019] [Accepted: 11/23/2018] [Indexed: 12/12/2022]
Abstract
Protein Direct Coupling Analysis (DCA), which predicts residue-residue contacts based on covarying positions within a multiple sequence alignment, has been remarkably effective. This suggests that there is more to learn from sequence correlations than is generally assumed, and calls for deeper investigations into DCA and perhaps into other types of correlations. Here we describe an approach that enables such investigations by measuring, as an estimated p-value, the statistical significance of the association between residue-residue covariance and structural interactions, either internal or homodimeric. Its application to thirty protein superfamilies confirms that direct coupling (DC) scores correlate with 3D pairwise contacts with very high significance. This method also permits quantitative assessment of the relative performance of alternative DCA methods, and of the degree to which they detect direct versus indirect couplings. We illustrate its use to assess, for a given protein, the biological relevance of alternative conformational states, to investigate the possible mechanistic implications of differences between these states, and to characterize subtle aspects of direct couplings. Our analysis indicates that direct pairwise correlations may be largely distinct from correlated patterns associated with functional specialization, and that the joint analysis of both types of correlations can yield greater power. Data, programs, and source code are freely available at http://evaldca.igs.umaryland.edu.
Affiliation(s)
- Andrew F. Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Stephen F. Altschul
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|
47
|
Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018; 14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018] [Indexed: 11/18/2022]
Abstract
The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures. Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. 
A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the predicted thermal motions using Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed using 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
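The Gaussian network model step described in this abstract can be sketched as follows. This is a generic GNM calculation under stated assumptions, not the Seq-GNM code: the function name `gnm_bfactors`, the automatic addition of chain-neighbor springs, and the toy contact lists are hypothetical, and the values are relative fluctuations without the temperature- and spring-constant-dependent prefactor.

```python
import numpy as np


def gnm_bfactors(n_residues, contacts):
    """Relative B-factors from a contact list via a Gaussian network model.

    contacts: iterable of (i, j) residue index pairs, e.g. coevolving pairs.
    Consecutive residues are always connected (an assumption that keeps the
    network in one piece).
    """
    pairs = {(min(i, j), max(i, j)) for i, j in contacts if i != j}
    pairs.update((i, i + 1) for i in range(n_residues - 1))

    # Kirchhoff matrix: -1 for each contact, node degree on the diagonal.
    kirchhoff = np.zeros((n_residues, n_residues))
    for i, j in pairs:
        kirchhoff[i, j] = kirchhoff[j, i] = -1.0
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))

    # The matrix is singular (one zero mode), so use the pseudo-inverse.
    inverse = np.linalg.pinv(kirchhoff)
    return np.diag(inverse).copy()


# A bare chain: the termini are expected to fluctuate most.
chain = gnm_bfactors(5, [])
# Adding a contact between the two ends closes the chain into a ring.
ring = gnm_bfactors(5, [(0, 4)])
```

In this toy case the extra contact between residues 0 and 4 makes all positions equivalent, so the ring has a flat fluctuation profile, while the open chain is most mobile at its ends.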
Affiliation(s)
- Brandon M. Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- I. Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Harris School of Public Policy and Center for Data Science and Public Policy, University of Chicago, Chicago, IL, United States of America
- S. Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
|
48
|
Vorberg S, Seemayer S, Söding J. Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction. PLoS Comput Biol 2018; 14:e1006526. [PMID: 30395601 PMCID: PMC6237422 DOI: 10.1371/journal.pcbi.1006526] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/27/2018] [Revised: 11/15/2018] [Accepted: 09/24/2018] [Indexed: 12/01/2022]
Abstract
Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny. 
Knowledge about the three-dimensional structure of proteins is key to understanding their function and role in biological processes and diseases. The experimental structure determination techniques, such as X-ray crystallography or electron cryo-microscopy, are labour intensive, time-consuming and expensive. Therefore, complementary computational methods to predict a protein’s structure have become indispensable. Over the last years, immense progress has been made in predicting protein structures from their amino acid sequence by utilizing highly accurate predictions of spatial contacts between amino acid residues as constraints in folding simulations. However, contact prediction methods require large numbers of homologous protein sequences in order to discriminate between signal and noise. A major obstacle preventing progress on the statistical methodology is our limited understanding of the different components of noise that are known to affect the predictions. We provide two tools, CCMpredPy and CCMgen, that can be used to learn highly accurate statistical models for contact prediction and to simulate protein evolution according to the statistical constraints between positions of residues as specified by these models, respectively. We showcase their usefulness by quantifying the relative contribution of noise arising from entropy and phylogeny on the predicted contacts, which will facilitate the improvement of the statistical methodology.
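The average product correction (APC) mentioned in this abstract can be illustrated with a short sketch. It follows the commonly used form, in which the product of the two positions' mean scores, divided by the overall mean score, is subtracted from each pair score. It is not taken from CCMpredPy: the function name `apc`, the exclusion of the diagonal from the means, and the toy score matrix are assumptions.

```python
import numpy as np


def apc(scores):
    """Average product correction of a symmetric pair-score matrix.

    corrected[i, j] = S[i, j] - mean_i * mean_j / mean_all,
    with all means taken over off-diagonal entries (an assumption;
    some implementations include the diagonal).
    """
    s = np.array(scores, dtype=float)
    n = s.shape[0]
    np.fill_diagonal(s, 0.0)
    position_mean = s.sum(axis=1) / (n - 1)
    overall_mean = s.sum() / (n * (n - 1))
    corrected = s - np.outer(position_mean, position_mean) / overall_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected


# Toy scores: a uniform background plus one strongly coupled pair (0, 3).
raw = np.ones((5, 5))
raw[0, 3] = raw[3, 0] = 5.0
corrected = apc(raw)
top_pair = tuple(int(k) for k in np.unravel_index(np.argmax(corrected), corrected.shape))

# A purely uniform background carries no pair-specific signal.
flat = apc(np.ones((4, 4)))
```

The uniform case shows the intent of the correction: a background shared by all pairs is subtracted out entirely, while the single elevated pair in the toy matrix keeps the highest corrected score.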
Affiliation(s)
- Susann Vorberg
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Stefan Seemayer
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Johannes Söding
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
|
49
|
How is structural divergence related to evolutionary information? Mol Phylogenet Evol 2018; 127:859-866. [DOI: 10.1016/j.ympev.2018.06.033] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/18/2017] [Revised: 06/01/2018] [Accepted: 06/19/2018] [Indexed: 12/15/2022]
|
50
|
Endutkin AV, Koptelov SS, Popov AV, Torgasheva NA, Lomzov AA, Tsygankova AR, Skiba TV, Afonnikov DA, Zharkov DO. Residue coevolution reveals functionally important intramolecular interactions in formamidopyrimidine-DNA glycosylase. DNA Repair (Amst) 2018; 69:24-33. [DOI: 10.1016/j.dnarep.2018.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/19/2018] [Revised: 07/04/2018] [Accepted: 07/04/2018] [Indexed: 10/28/2022]
|