101
Ju F, Zhu J, Shao B, Kong L, Liu TY, Zheng WM, Bu D. CopulaNet: Learning residue co-evolution directly from multiple sequence alignment for protein structure prediction. Nat Commun 2021; 12:2535. [PMID: 33953201 PMCID: PMC8100175 DOI: 10.1038/s41467-021-22869-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 10/15/2020] [Accepted: 03/28/2021] [Indexed: 11/29/2022] Open
Abstract
Residue co-evolution has become the primary principle for estimating inter-residue distances of a protein, which are crucially important for predicting protein structure. Most existing approaches adopt an indirect strategy, i.e., inferring residue co-evolution based on some hand-crafted features, say, a covariance matrix, calculated from the multiple sequence alignment (MSA) of the target protein. This indirect strategy, however, cannot fully exploit the information carried by the MSA. Here, we report an end-to-end deep neural network, CopulaNet, to estimate residue co-evolution directly from the MSA. The key elements of CopulaNet include: (i) an encoder to model context-specific mutation for each residue; and (ii) an aggregator to model residue co-evolution, and thereafter estimate inter-residue distances. Using CASP13 (the 13th Critical Assessment of Protein Structure Prediction) target proteins as representatives, we demonstrate that CopulaNet can predict protein structure with improved accuracy and efficiency. This study represents a step toward improved end-to-end prediction of inter-residue distances and protein tertiary structures.
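To make the "indirect strategy" concrete, here is a minimal sketch of the kind of hand-crafted covariance feature the abstract refers to. It is not CopulaNet itself; the function names, the 21-letter alphabet, and the absence of sequence weighting and pseudocounts are assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap (assumed alphabet)
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def one_hot_msa(msa):
    """Encode an MSA (list of equal-length strings) as an (N, L, 21) one-hot array."""
    n, length = len(msa), len(msa[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            # Unknown symbols are mapped to the gap state (an assumption).
            x[s, i, INDEX.get(aa, INDEX["-"])] = 1.0
    return x

def covariance_features(msa):
    """Return cov[i, a, j, b] = f_ij(a, b) - f_i(a) * f_j(b) from raw frequencies."""
    x = one_hot_msa(msa)
    n, length, q = x.shape
    flat = x.reshape(n, length * q)
    f_single = flat.mean(axis=0)       # single-site frequencies f_i(a)
    f_pair = flat.T @ flat / n         # pairwise frequencies f_ij(a, b)
    cov = f_pair - np.outer(f_single, f_single)
    return cov.reshape(length, q, length, q)

toy_msa = ["AC", "AC", "GT", "GT"]     # toy alignment, not real data
cov = covariance_features(toy_msa)
```

In the toy alignment, columns 1 and 2 vary together perfectly, so the A/C entry of the covariance is positive and the A/T entry is negative. An end-to-end model would consume the alignment directly instead of a summary like this.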
Affiliation(s)
- Fusong Ju
  - Key Lab of Intelligent Information Processing, State Key Lab of Computer Architecture, Big-data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Bin Shao
  - Microsoft Research Asia, Beijing, China
- Lupeng Kong
  - Key Lab of Intelligent Information Processing, State Key Lab of Computer Architecture, Big-data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
- Wei-Mou Zheng
  - University of Chinese Academy of Sciences, Beijing, China
  - Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China
- Dongbo Bu
  - Key Lab of Intelligent Information Processing, State Key Lab of Computer Architecture, Big-data Academy, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  - University of Chinese Academy of Sciences, Beijing, China
102
Anton B, Besalú M, Fornes O, Bonet J, Molina A, Molina-Fernandez R, De Las Cuevas G, Fernandez-Fuentes N, Oliva B. On the use of direct-coupling analysis with a reduced alphabet of amino acids combined with super-secondary structure motifs for protein fold prediction. NAR Genom Bioinform 2021; 3:lqab027. [PMID: 33937764 PMCID: PMC8061457 DOI: 10.1093/nargab/lqab027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2020] [Revised: 02/27/2021] [Accepted: 03/26/2021] [Indexed: 11/12/2022] Open
Abstract
Direct-coupling analysis (DCA) for studying the coevolution of residues in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues combined with super-secondary structure motifs to model protein structures. Interestingly, the simplification produced by grouping amino acids into only two groups (polar and non-polar) is still representative of the physicochemical nature that characterizes the protein structure and it is in line with the role of hydrophobic forces in protein-folding funneling. As a result of a compressed alphabet, the number of sequences required for the multiple sequence alignment is reduced. The number of long-range contacts predicted is limited; therefore, our approach requires the use of neighboring sequence-positions. We use the prediction of secondary structure and motifs of super-secondary structures to predict local contacts. Using RADI and raDIMod, a fragment-based protein structure modelling approach, we achieve near-native conformations when the super-secondary motifs cover >30-50% of the sequence. Interestingly, although different contacts are predicted with different alphabets, they produce similar structures.
Affiliation(s)
- Bernat Anton
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Mireia Besalú
  - Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona 08028, Catalonia, Spain
- Oriol Fornes
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Jaume Bonet
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Alexis Molina
  - Electronic and Atomic Protein Modeling, Life Sciences, Barcelona Supercomputing Center, Barcelona 08034, Catalonia, Spain
- Ruben Molina-Fernandez
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Gemma De Las Cuevas
  - Institut für Theoretische Physik, School of Mathematics, Computer Science and Physics, Universität Innsbruck, A-6020 Innsbruck, Austria
- Narcis Fernandez-Fuentes
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, SY233EB Aberystwyth, United Kingdom
- Baldo Oliva
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
103
Coevolution underlies GPCR-G protein selectivity and functionality. Sci Rep 2021; 11:7858. [PMID: 33846507 PMCID: PMC8041822 DOI: 10.1038/s41598-021-87251-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/27/2021] [Accepted: 03/25/2021] [Indexed: 12/12/2022] Open
Abstract
G protein-coupled receptors (GPCRs) regulate diverse physiological events, which makes them major targets for many approved drugs. G proteins are downstream molecules that receive signals from GPCRs and trigger cell responses. The mechanism of GPCR-G protein selectivity, i.e. how they interact properly and in a timely manner, is still unclear. Here, we analyzed model GPCRs (i.e. HTR, DAR) and Gα proteins with a coevolutionary tool, statistical coupling analysis. The results suggested that 5-hydroxytryptamine receptors and dopamine receptors have common conserved and coevolved residues. The Gα proteins also have conserved and coevolved residues. These coevolved residues were implicated in the molecular functions of the analyzed proteins. We also identified specific coevolving pairs related to the selectivity between GPCR and G protein. We propose that these results would contribute to a better understanding of not only the functional residues of GPCRs and Gα proteins but also GPCR-G protein selectivity mechanisms.
104
Trivial and nontrivial error sources account for misidentification of protein partners in mutual information approaches. Sci Rep 2021; 11:6902. [PMID: 33767294 PMCID: PMC7994710 DOI: 10.1038/s41598-021-86455-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2020] [Accepted: 03/15/2021] [Indexed: 12/01/2022] Open
Abstract
The problem of finding the correct set of partners for a given pair of interacting protein families based on multi-sequence alignments (MSAs) has received great attention over the years. Recently, the native contacts of two interacting proteins were shown to store the strongest mutual information (MI) signal to discriminate MSA concatenations with the largest fraction of correct pairings. Although that signal might be of practical relevance in the search for an effective heuristic to solve the problem, the number of MSA concatenations with near-native MI is large, imposing severe limitations. Here, a Genetic Algorithm that explores possible MSA concatenations according to an MI maximization criterion is shown to find degenerate solutions with two error sources, arising from mismatches among (i) similar and (ii) non-similar sequences. If mistakes made among similar sequences are disregarded, type-(i) solutions are found to resolve correct pairings at best true positive (TP) rates of 70%, far above the very same estimates in type-(ii) solutions. A machine learning classification algorithm helps to show further that differences between optimized solutions based on TP rates are not artificial and may have biological meaning associated with the three-dimensional distribution of the MI signal. Type-(i) solutions may therefore correspond to reliable results for predictive purposes, found here to be more likely obtained via MI maximization across protein systems having a minimum critical number of amino acid contacts on their interaction surfaces (N > 200).
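The MI score that such a search maximizes can be illustrated with a small sketch. This only scores one candidate pairing; it does not implement the Genetic Algorithm, and the function names, the use of natural logarithms, and the choice to sum over all inter-protein column pairs are assumptions made for illustration.

```python
import math
from collections import Counter

def column_mi(col_a, col_b):
    """Mutual information (in nats) between two aligned columns of symbols."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi

def pairing_mi(msa_a, msa_b, pairing):
    """Total inter-protein MI when sequence k of msa_a is paired with msa_b[pairing[k]]."""
    paired_b = [msa_b[j] for j in pairing]
    total = 0.0
    for i in range(len(msa_a[0])):
        col_a = [seq[i] for seq in msa_a]
        for j in range(len(paired_b[0])):
            col_b = [seq[j] for seq in paired_b]
            total += column_mi(col_a, col_b)
    return total

# Toy single-column families (hypothetical data).
family_a = ["A", "A", "G", "G"]
family_b = ["C", "C", "T", "T"]
native_score = pairing_mi(family_a, family_b, [0, 1, 2, 3])
shuffled_score = pairing_mi(family_a, family_b, [0, 2, 1, 3])
```

With the toy data, the native pairing preserves the covariation between the two families and scores higher than the shuffled one. Note that swapping partners among identical sequences would leave the score unchanged, which is the degeneracy among similar sequences that the abstract describes.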
105
Zou T, Woodrum BW, Halloran N, Campitelli P, Bobkov AA, Ghirlanda G, Ozkan SB. Local Interactions That Contribute Minimal Frustration Determine Foldability. J Phys Chem B 2021; 125:2617-2626. [PMID: 33687216 DOI: 10.1021/acs.jpcb.1c00364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Earlier experiments suggest that the evolutionary information (conservation and coevolution) encoded in protein sequences is necessary and sufficient to specify the fold of a protein family. However, there is no computational work to quantify the effect of such evolutionary information on the folding process. Here we explore the role of early folding steps for sequences designed using coevolution and conservation through a combination of computational and experimental methods. We simulated a repertoire of native and designed WW domain sequences to analyze early local contact formation and found that the N-terminal β-hairpin turn would not form correctly due to strong non-native local contacts in unfoldable sequences. Through a maximum likelihood approach, we identified five local contacts that play a critical role in folding, suggesting that a small subset of amino acid pairs can be used to solve the "needle in the haystack" problem to design foldable sequences. Thus, using the contact probability of those five local contacts that form during the early stage of folding, we built a classification model that predicts the foldability of a WW sequence with 81% accuracy. This classification model was used to redesign WW domain sequences that could not fold due to frustration and make them foldable by introducing a few mutations that led to the stabilization of these critical local contacts. The experimental analysis shows that a redesigned sequence folds and binds to polyproline peptides with a similar affinity as those observed for native WW domains. Overall, our analysis shows that evolutionary-designed sequences should not only satisfy the folding stability but also ensure a minimally frustrated folding landscape.
Affiliation(s)
- Taisong Zou
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
- Brian W Woodrum
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Nicholas Halloran
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Paul Campitelli
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
- Andrey A Bobkov
  - Conrad Prebys Center for Chemical Genomics, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California 92037, United States
- Giovanna Ghirlanda
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Sefika Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
106
The Predicted Mannosyltransferase GT69-2 Antagonizes RFW-1 To Regulate Cell Fusion in Neurospora crassa. mBio 2021; 12:mBio.00307-21. [PMID: 33727349 PMCID: PMC8092235 DOI: 10.1128/mbio.00307-21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Filamentous fungi undergo somatic cell fusion to create a syncytial, interconnected hyphal network which confers a fitness benefit during colony establishment. However, barriers to somatic cell fusion between genetically different cells have evolved that reduce invasion by parasites or exploitation by maladapted genetic entities (cheaters). Here, we identified a predicted mannosyltransferase, glycosyltransferase family 69 protein (GT69-2), that was required for somatic cell fusion in Neurospora crassa. Cells lacking GT69-2 prematurely ceased chemotropic signaling and failed to complete cell wall dissolution and membrane merger in pairings with wild-type cells or between Δgt69-2 cells (self fusion). However, loss-of-function mutations in the linked regulator of cell fusion and cell wall remodeling-1 (rfw-1) locus suppressed the self-cell-fusion defects of Δgt69-2 cells, although Δgt69-2 Δrfw-1 double mutants still failed to undergo fusion with wild-type cells. Both GT69-2 and RFW-1 localized to the Golgi apparatus. Genetic analyses indicated that RFW-1 negatively regulates cell wall remodeling-dependent processes, including cell wall dissolution during cell fusion, separation of conidia during asexual sporulation, and conidial germination. GT69-2 acts as an antagonizer to relieve or prevent negative functions on cell fusion by RFW-1. In Neurospora species and N. crassa populations, alleles of gt69-2 were highly polymorphic and fell into two discrete haplogroups. In all isolates within haplogroup I, rfw-1 was conserved and linked to gt69-2. All isolates within haplogroup II lacked rfw-1. These data indicated that gt69-2/rfw-1 are under balancing selection and provide new mechanisms regulating cell wall remodeling during cell fusion and conidial separation. IMPORTANCE Cell wall remodeling is a dynamic process that balances cell wall integrity versus cell wall dissolution.
In filamentous fungi, cell wall dissolution is required for somatic cell fusion and conidial separation during asexual sporulation. In the filamentous fungus Neurospora crassa, allorecognition checkpoints regulate the cell fusion process between genetically different cells. Our study revealed two linked loci with transspecies polymorphisms and under coevolution, rfw-1 and gt69-2, which form a coordinated system to regulate cell wall remodeling during somatic cell fusion, conidial separation, and asexual spore germination. RFW-1 acts as a negative regulator of these three processes, while GT69-2 functions antagonistically to RFW-1. Our findings provide new insight into the mechanisms involved in regulation of fungal cell wall remodeling during growth and development.
107
Hu L, Wang X, Huang YA, Hu P, You ZH. A survey on computational models for predicting protein-protein interactions. Brief Bioinform 2021; 22:6159365. [PMID: 33693513 DOI: 10.1093/bib/bbab036] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Received: 11/20/2020] [Revised: 12/31/2020] [Indexed: 12/24/2022] Open
Abstract
Proteins interact with each other to play critical roles in many biological processes in cells. Although promising, laboratory experiments usually suffer from the disadvantages of being time-consuming and labor-intensive. The results obtained are often not robust and are considerably uncertain. Owing to recent advances in high-throughput technologies, a large amount of proteomics data has been collected, and this presents a significant opportunity and also a challenge to develop computational models to predict protein-protein interactions (PPIs) based on these data. In this paper, we present a comprehensive survey of the recent efforts that have been made towards the development of effective computational models for PPI prediction. The survey introduces the algorithms that can be used to learn computational models for predicting PPIs, and it classifies these models into different categories. To understand their relative merits, the paper discusses different validation schemes and metrics to evaluate the prediction performance. Biological databases that are commonly used in different experiments for performance comparison are also described, and their use in a series of extensive experiments to compare different prediction models is discussed. Finally, we present some open issues in PPI prediction for future work. We explain how the performance of PPI prediction can be improved if these issues are effectively tackled.
Affiliation(s)
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, 830011, Urumqi, China
- Xiaojuan Wang
  - School of Computer Science and Technology, Wuhan University of Technology, 430070, Wuhan, China
- Yu-An Huang
  - College of Computer Science and Software Engineering, Shenzhen University, 518060, Shenzhen, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, 830011, Urumqi, China
108
Wang Y, Correa Marrero M, Medema MH, van Dijk ADJ. Coevolution-based prediction of protein-protein interactions in polyketide biosynthetic assembly lines. Bioinformatics 2021; 36:4846-4853. [PMID: 32592463 DOI: 10.1093/bioinformatics/btaa595] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/17/2019] [Revised: 05/20/2020] [Accepted: 06/19/2020] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Polyketide synthases (PKSs) are enzymes that generate diverse molecules of great pharmaceutical importance, including a range of clinically used antimicrobials and antitumor agents. Many polyketides are synthesized by cis-AT modular PKSs, which are organized in assembly lines, in which multiple enzymes line up in a specific order. This order is defined by specific protein-protein interactions (PPIs). The unique modular structure and catalyzing mechanism of these assembly lines makes their products predictable and also spurred combinatorial biosynthesis studies to produce novel polyketides using synthetic biology. However, predicting the interactions of PKSs, and thereby inferring the order of their assembly line, is still challenging, especially for cases in which this order is not reflected by the ordering of the PKS-encoding genes in the genome. RESULTS Here, we introduce PKSpop, which uses a coevolution-based PPI algorithm to infer protein order in PKS assembly lines. Our method accurately predicts protein orders (93% accuracy). Additionally, we identify new residue pairs that are key in determining interaction specificity, and show that coevolution of N- and C-terminal docking domains of PKSs is significantly more predictive for PPIs than coevolution between ketosynthase and acyl carrier protein domains. AVAILABILITY AND IMPLEMENTATION The code is available on http://www.bif.wur.nl/ (under 'Software'). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aalt D J van Dijk
  - Bioinformatics Group, Department of Plant Sciences Biometris, Wageningen University & Research, 6708 PB Wageningen, The Netherlands
109
Haldane A, Levy RM. Mi3-GPU: MCMC-based Inverse Ising Inference on GPUs for protein covariation analysis. COMPUTER PHYSICS COMMUNICATIONS 2021; 260:107312. [PMID: 33716309 PMCID: PMC7944406 DOI: 10.1016/j.cpc.2020.107312] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/12/2023]
Abstract
Inverse Ising inference is a method for inferring the coupling parameters of a Potts/Ising model based on observed site-covariation, which has found important applications in protein physics for detecting interactions between residues in protein families. We introduce Mi3-GPU ("mee-three", for MCMC Inverse Ising Inference) software for solving the inverse Ising problem for protein-sequence datasets with few analytic approximations, by parallel Markov-Chain Monte-Carlo sampling on GPUs. We also provide tools for analysis and preparation of protein-family Multiple Sequence Alignments (MSAs) to account for finite-sampling issues, which are a major source of error or bias in inverse Ising inference. Our method is "generative" in the sense that the inferred model can be used to generate synthetic MSAs whose mutational statistics (marginals) can be verified to match the dataset MSA statistics up to the limits imposed by the effects of finite sampling. Our GPU implementation enables the construction of models which reproduce the covariation patterns of the observed MSA with a precision that is not possible with more approximate methods. The main components of our method are a GPU-optimized algorithm to greatly accelerate MCMC sampling, combined with a multi-step Quasi-Newton parameter-update scheme using a "Zwanzig reweighting" technique. We demonstrate the ability of this software to produce generative models on typical protein family datasets for sequence lengths L ~ 300 with 21 residue types with tens of millions of inferred parameters in short running times.
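For orientation, one common way of writing the Potts model that inverse Ising inference fits is sketched below. The symbols h_i (fields), J_ij (couplings), q (number of residue types) and the sign convention are assumptions chosen for illustration and may differ from the notation used by Mi3-GPU. The last line counts parameters before any gauge fixing for the L = 300, q = 21 case mentioned in the abstract.

```latex
P(S) = \frac{1}{Z} \exp\bigl(-E(S)\bigr), \qquad
E(S) = -\sum_{i=1}^{L} h_i(s_i) \;-\; \sum_{1 \le i < j \le L} J_{ij}(s_i, s_j)

N_{\mathrm{param}} = Lq + \binom{L}{2} q^{2}
                   = 300 \cdot 21 + 44850 \cdot 441
                   = 19{,}785{,}150
```

Under these assumptions the count is roughly 2 x 10^7, the same order of magnitude as the "tens of millions of inferred parameters" quoted in the abstract.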
Affiliation(s)
- Allan Haldane
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M. Levy
  - Center for Biophysics and Computational Biology and Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
110
D'Amico F, Candido S, Libra M. Interaction between matrix metalloproteinase-9 (MMP-9) and neutrophil gelatinase-associated lipocalin (NGAL): A recent evolutionary event in primates. DEVELOPMENTAL AND COMPARATIVE IMMUNOLOGY 2021; 116:103933. [PMID: 33245981 DOI: 10.1016/j.dci.2020.103933] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/10/2020] [Revised: 10/30/2020] [Accepted: 11/18/2020] [Indexed: 06/11/2023]
Abstract
Matrix metalloproteases are known to represent an early step in the evolution of the immune system. Similarly, neutrophil gelatinase-associated lipocalin is known to be a key effector in immune response. MMP-9 interacts with NGAL, but their interaction mechanisms remain unclear. Functional interaction between proteins is ensured by coevolution. Protein coevolution was inferred by calculating the linear correlation coefficients between inter-protein distance matrices using MirrorTree. Among examined mammal species, we found a robust signal of MMP-9/NGAL coevolution exclusively within Primates (R = 0.96, p < 1e-06). Owing to the high conservation of these proteins among Mammals, we chose to utilize a recent version of Blocks in Sequences (BIS2) algorithm implemented in BIS2Analyzer webserver. Coevolution clusters between the two proteins were identified in MMP-9 fibronectin and hemopexin domains. Our results suggest that MMP-9/NGAL interaction is a recent evolutionary acquisition in Primates. Furthermore, MMP-9 hemopexin domain would represent a promising target for drug design against these molecules.
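The core of the MirrorTree idea described above, correlating the inter-species distance matrices of two proteins, can be sketched as follows. This is an illustrative reimplementation of the correlation step only, not the MirrorTree software; the function name and the toy matrices are hypothetical, and the sketch omits tree building and significance testing.

```python
import numpy as np

def mirrortree_r(dist_a, dist_b):
    """Pearson correlation between the upper triangles of two distance matrices.

    Both matrices must be square and ordered by the same species.
    """
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("distance matrices must be square and of equal size")
    upper = np.triu_indices(a.shape[0], k=1)  # each species pair counted once
    return float(np.corrcoef(a[upper], b[upper])[0, 1])

# Hypothetical distances for three species; protein B evolves twice as fast as A.
dist_protein_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
dist_protein_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
r = mirrortree_r(dist_protein_a, dist_protein_b)
```

Because the toy matrices are proportional, the correlation is at its maximum. Real analyses usually also need to account for the shared species phylogeny, which inflates the correlation between any two protein families.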
Affiliation(s)
- Fabio D'Amico
  - Department of Biomedical and Biotechnological Sciences, University of Catania, Italy
- Saverio Candido
  - Department of Biomedical and Biotechnological Sciences, University of Catania, Italy; Research Center for Prevention, Diagnosis and Treatment of Cancer, University of Catania, 95123, Catania, Italy
- Massimo Libra
  - Department of Biomedical and Biotechnological Sciences, University of Catania, Italy; Research Center for Prevention, Diagnosis and Treatment of Cancer, University of Catania, 95123, Catania, Italy
111
Roche R, Bhattacharya S, Bhattacharya D. Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins. PLoS Comput Biol 2021; 17:e1008753. [PMID: 33621244 PMCID: PMC7935296 DOI: 10.1371/journal.pcbi.1008753] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/04/2020] [Revised: 03/05/2021] [Accepted: 01/31/2021] [Indexed: 11/18/2022] Open
Abstract
Crystallography and NMR system (CNS) is currently a widely used method for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous and may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales at a much higher degree of folding accuracy than CNS with the increase in contact thresholds, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach.
These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS. Predicting the folded and functional 3-dimensional structure of a protein molecule from its amino acid sequence is of central importance to structural biology. Recently, promising advances have been made in ab initio protein folding due to the reasonably accurate estimation of inter-residue interaction maps at increasingly higher resolutions that range from binary contacts to finer-grained distances. Despite the progress in predicting the interaction maps, approaches for turning the residue-residue interactions projected in these maps into their precise spatial positioning heavily rely on a decade-old experimental structure determination protocol that is not suitable for predictive modeling. This paper presents a new hierarchical structure modeling method, DConStruct, which can better exploit the information encoded in the interaction maps at multiple granularities, from binary contact maps to distance-based hybrid maps at tri-level thresholding, for improved ab initio folding. Multiple large-scale benchmarking experiments show that our proposed method can substantially improve the folding accuracy for both soluble and membrane proteins compared to state-of-the-art approaches. DConStruct is licensed under the GNU General Public License v3 and freely available at https://github.com/Bhattacharya-Lab/DConStruct.
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Sutanu Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
  - Department of Biological Sciences, Auburn University, Auburn, Alabama, United States of America
112
Timonina D, Sharapova Y, Švedas V, Suplatov D. Bioinformatic analysis of subfamily-specific regions in 3D-structures of homologs to study functional diversity and conformational plasticity in protein superfamilies. Comput Struct Biotechnol J 2021; 19:1302-1311. [PMID: 33738079 PMCID: PMC7933735 DOI: 10.1016/j.csbj.2021.02.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/08/2020] [Revised: 02/08/2021] [Accepted: 02/09/2021] [Indexed: 02/07/2023] Open
Abstract
Local 3D-structural differences in homologous proteins contribute to the functional diversity observed in a superfamily, but have so far received little attention, as bioinformatic analysis was usually carried out at the level of amino acid sequences. We have developed Zebra3D, the first-of-its-kind bioinformatic software for systematic analysis of 3D-alignments of protein families using machine learning. The new tool identifies subfamily-specific regions (SSRs): patterns of local 3D-structure (i.e. single residues, loops, or secondary structure fragments) that are spatially equivalent within families/subfamilies, but are different among them, and thus can be associated with functional diversity and function-related conformational plasticity. Bioinformatic analysis of protein superfamilies by Zebra3D can be used to study 3D-determinants of catalytic activity and specific accommodation of ligands, to help prepare focused libraries for directed evolution or assist development of chimeric enzymes with novel properties by exchange of equivalent regions between homologs, and to characterize plasticity in binding sites. A companion Mustguseal web-server is available to automatically construct a 3D-alignment of functionally diverse proteins, thus reducing the minimal input required to operate Zebra3D to a single PDB code. The Zebra3D + Mustguseal combined approach provides the opportunity to systematically explore the value of SSRs in superfamilies and to use this information for protein design and drug discovery. The software is available open-access at https://biokinet.belozersky.msu.ru/Zebra3D.
Affiliation(s)
- Daria Timonina
- Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
| | - Yana Sharapova
- Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
| | - Vytas Švedas
- Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
| | - Dmitry Suplatov
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
- Corresponding author.
| |
|
113
|
Bernabeu M, Rosselló JA. Molecular Evolution of rbcL in Orthotrichales (Bryophyta): Site Variation, Adaptive Evolution, and Coevolutionary Patterns of Amino Acid Replacements. J Mol Evol 2021; 89:225-237. [PMID: 33611663 DOI: 10.1007/s00239-021-09998-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2020] [Accepted: 01/31/2021] [Indexed: 11/24/2022]
Abstract
Molecular evolution of the large subunit of the RuBisCO enzyme is understudied in early diverging land plants. These groups show morphological and eco-physiological adaptations to the uneven and intermittent distribution of water in the terrestrial environment. This might have prompted a continuous fine-tuning of RuBisCO under a selective pressure modifying the species-specific optima for photosynthesis in contrasting microdistributions and environmental niches. To gain a better insight into the molecular evolution of RuBisCO large subunits, the aim of this study was to assess the pattern of evolutionary change in the amino acid residues in a monophyletic group of Bryophyta (Orthotrichaceae). Tests for positive, neutral, or purifying selection at the amino acid level were performed by comparing rates (ω) of non-synonymous (dN) and synonymous (dS) nucleotide substitutions along a Maximum Likelihood phylogenetic tree. Molecular adaptation tests using likelihood ratio tests, reconstruction of ancestral amino acid sites, and intra-protein coevolution analyses were performed. Variable amino acid sites (39) were unevenly distributed across the LSU. The residues are located on rbcL sites that are highly variable in higher plants and close to key regions implicated in dimer-dimer (L2L2) interactions, RuBisCO-activase interactions, and conformational functions during catalysis. Ten rbcL sites (32, 33, 91, 230, 247, 251, 255, 424, 449 and 475) have been identified by Bayesian Empirical Bayes inference to be under positive selection and adaptive evolution under the M8 model. The pattern of amino acid variation appears not to be lineage specific but rather representative of convergent evolution, with recurrent changes that potentially favor the same amino acid substitutions and that likely optimized RuBisCO activity.
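The ω = dN/dS ratio named in the abstract above is conventionally read as follows: ω > 1 suggests positive selection, ω near 1 neutrality, and ω < 1 purifying selection. The sketch below only illustrates that reading. The function name and the `tol` band around 1 are hypothetical choices made for this example, and the study itself relies on likelihood ratio tests and Bayesian Empirical Bayes inference rather than a fixed threshold.

```python
from typing import Tuple


def classify_selection(dn: float, ds: float, tol: float = 0.05) -> Tuple[float, str]:
    """Return (omega, label) for one pair of substitution rates.

    dn  -- non-synonymous substitution rate (dN)
    ds  -- synonymous substitution rate (dS)
    tol -- hypothetical half-width of the band around 1 treated as neutral
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega > 1 + tol:
        label = "positive"
    elif omega < 1 - tol:
        label = "purifying"
    else:
        label = "neutral"
    return omega, label


if __name__ == "__main__":
    # Made-up rates, for illustration only.
    for dn, ds in [(0.30, 0.10), (0.10, 0.10), (0.02, 0.20)]:
        print(dn, ds, classify_selection(dn, ds))
```

A point estimate of ω says nothing about statistical support, which is why site models such as M8 are compared against a null model before individual sites are reported.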
Affiliation(s)
- Moisès Bernabeu
- Departament de Genètica, Universitat de València, c/ Doctor Moliner 50, Burjassot, 46100, València, Spain
| | - Josep A Rosselló
- Jardín Botánico, ICBiBE, Universitat de València, c/ Quart 80, 46008, València, Spain.
| |
|
114
|
New Insights into the Co-Occurrences of Glycoside Hydrolase Genes among Prokaryotic Genomes through Network Analysis. Microorganisms 2021; 9:427. [PMID: 33669523 PMCID: PMC7922503 DOI: 10.3390/microorganisms9020427] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/17/2021] [Revised: 02/06/2021] [Accepted: 02/14/2021] [Indexed: 12/21/2022] Open
Abstract
Glycoside hydrolase (GH) represents a crucial category of enzymes for carbohydrate utilization in most organisms. A series of glycoside hydrolase families (GHFs) have been classified, with relevant information deposited in the CAZy database. Statistical analysis indicated that most GHFs (134 out of 154) were prone to exist in bacteria rather than archaea, in terms of both occurrence frequencies and average gene numbers. Co-occurrence analysis suggested the existence of strong or moderate-strong correlations among 63 GHFs. A combination of network analysis by Gephi and functional classification among these GHFs demonstrated the presence of 12 functional categories (from group A to L), with which the corresponding microbial collections were subsequently labeled, respectively. Interestingly, a progressive enrichment of particular GHFs was found among several types of microbes, and type-L as well as type-E microbes were deemed as functional intensified species which formed during the microbial evolution process toward efficient decomposition of lignocellulose as well as pectin, respectively. Overall, integrating network analysis and enzymatic functional classification, we were able to provide a new angle of view for GHs from known prokaryotic genomes, and thus this study is likely to guide the selection of GHs and microbes for efficient biomass utilization.
|
115
|
Armingol E, Officer A, Harismendy O, Lewis NE. Deciphering cell-cell interactions and communication from gene expression. Nat Rev Genet 2021; 22:71-88. [PMID: 33168968 PMCID: PMC7649713 DOI: 10.1038/s41576-020-00292-x] [Citation(s) in RCA: 556] [Impact Index Per Article: 185.3] [Accepted: 09/25/2020] [Indexed: 12/13/2022]
Abstract
Cell-cell interactions orchestrate organismal development, homeostasis and single-cell functions. When cells do not properly interact or improperly decode molecular messages, disease ensues. Thus, the identification and quantification of intercellular signalling pathways has become a common analysis performed across diverse disciplines. The expansion of protein-protein interaction databases and recent advances in RNA sequencing technologies have enabled routine analyses of intercellular signalling from gene expression measurements of bulk and single-cell data sets. In particular, ligand-receptor pairs can be used to infer intercellular communication from the coordinated expression of their cognate genes. In this Review, we highlight discoveries enabled by analyses of cell-cell interactions from transcriptomic data and review the methods and tools used in this context.
Affiliation(s)
- Erick Armingol
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Novo Nordisk Foundation Center for Biosustainability at the University of California, San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
| | - Adam Officer
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, USA
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA
| | - Olivier Harismendy
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA.
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA.
| | - Nathan E Lewis
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA.
- Novo Nordisk Foundation Center for Biosustainability at the University of California, San Diego, La Jolla, CA, USA.
- Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA.
| |
|
116
|
Taddese B, Garnier A, Deniaud M, Henrion D, Chabbert M. Bios2cor: an R package integrating dynamic and evolutionary correlations to identify functionally important residues in proteins. Bioinformatics 2021; 37:2483-2484. [PMID: 33471079 DOI: 10.1093/bioinformatics/btab002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/02/2020] [Revised: 12/22/2020] [Accepted: 01/04/2021] [Indexed: 11/14/2022] Open
Abstract
SUMMARY: Both dynamic correlations in protein sidechain motions during molecular dynamics (MD) simulations and evolutionary correlations in multiple sequence alignments (MSA) of homologous proteins may reveal functionally important residues. We developed the R package Bios2cor that provides a unique framework to investigate and, possibly, integrate both analyses. Bios2cor starts with an MSA or an MD trajectory and computes correlation/covariation scores between positions in the MSA or between sidechain dihedral angles or rotamers in the MD trajectory. In addition, Bios2cor provides a variety of tools for the analysis, the visualization and the interpretation of the data. AVAILABILITY: The R package Bios2cor is available from the Comprehensive R Archive Network, at http://cran.r-project.org/web/packages/Bios2cor/index.html.
Affiliation(s)
- Bruck Taddese
- CNRS UMR 6015-INSERM 1083, MITOVASC Laboratory, FRANCE, 3 rue Roger Amsler 49100 ANGERS
| | - Antoine Garnier
- CNRS UMR 6015-INSERM 1083, MITOVASC Laboratory, FRANCE, 3 rue Roger Amsler 49100 ANGERS
| | - Madeline Deniaud
- CNRS UMR 6015-INSERM 1083, MITOVASC Laboratory, FRANCE, 3 rue Roger Amsler 49100 ANGERS
| | - Daniel Henrion
- CNRS UMR 6015-INSERM 1083, MITOVASC Laboratory, FRANCE, 3 rue Roger Amsler 49100 ANGERS
| | - Marie Chabbert
- CNRS UMR 6015-INSERM 1083, MITOVASC Laboratory, FRANCE, 3 rue Roger Amsler 49100 ANGERS
| |
|
117
|
Pontes C, Ruiz-Serra V, Lepore R, Valencia A. Unraveling the molecular basis of host cell receptor usage in SARS-CoV-2 and other human pathogenic β-CoVs. Comput Struct Biotechnol J 2021; 19:759-766. [PMID: 33456724 PMCID: PMC7802526 DOI: 10.1016/j.csbj.2021.01.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/29/2020] [Revised: 01/07/2021] [Accepted: 01/07/2021] [Indexed: 01/13/2023] Open
Abstract
The recent emergence of the novel SARS-CoV-2 in China and its rapid spread in the human population has led to a public health crisis worldwide. Like in SARS-CoV, horseshoe bats currently represent the most likely candidate animal source for SARS-CoV-2. Yet, the specific mechanisms of cross-species transmission and adaptation to the human host remain unknown. Here we show that the unsupervised analysis of conservation patterns across the β-CoV spike protein family, using sequence information alone, can provide valuable insights on the molecular basis of the specificity of β-CoVs to different host cell receptors. More precisely, our results indicate that host cell receptor usage is encoded in the amino acid sequences of different CoV spike proteins in the form of a set of specificity determining positions (SDPs). Furthermore, by integrating structural data, in silico mutagenesis and coevolution analysis we could elucidate the role of SDPs in mediating ACE2 binding across the Sarbecovirus lineage, either by engaging the receptor through direct intermolecular interactions or by affecting the local environment of the receptor binding motif. Finally, by the analysis of coevolving mutations across a paired MSA we were able to identify key intermolecular contacts occurring at the spike-ACE2 interface. These results show that effective mining of the evolutionary records held in the sequence of the spike protein family can help tracing the molecular mechanisms behind the evolution and host-receptor adaptation of circulating and future novel β-CoVs.
Key Words
- APC, average product correction
- CoVs, Coronaviruses
- EV, evolutionary rate
- Functional specificity
- MCA, multiple correspondence analysis
- MI, mutual information
- MSA, multiple sequence alignment
- NTD, N-terminal domain
- Phylogenetic analysis
- Protein subfamilies
- RBD, receptor binding domain
- RBM, receptor binding motif
- SARS-CoV-2
- SDPs, specificity determining positions
- Specificity Determining Positions
- Spike protein evolution
- hACE2, human angiotensin converting enzyme 2
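Two of the key words above, mutual information (MI) and average product correction (APC), name a widely used covariation score for MSA columns. The block below is a minimal sketch of that score in its textbook form, not the authors' pipeline. The function names are hypothetical, gaps are treated as ordinary symbols, and no sequence weighting or pseudocounts are applied, all of which are simplifying assumptions.

```python
from collections import Counter
from math import log
from typing import List


def mutual_information(col_a: str, col_b: str) -> float:
    """MI (in nats) between two alignment columns of equal length."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum(
        (c / n) * log((c * n) / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def mi_with_apc(msa: List[str]) -> List[List[float]]:
    """Pairwise MI matrix with the average product correction subtracted.

    Assumes an alignment of equal-length sequences with at least two columns.
    """
    length = len(msa[0])
    cols = ["".join(seq[i] for seq in msa) for i in range(length)]
    mi = [[0.0] * length for _ in range(length)]
    for i in range(length):
        for j in range(i + 1, length):
            mi[i][j] = mi[j][i] = mutual_information(cols[i], cols[j])
    # Mean MI of each column with all others (the diagonal is zero).
    col_mean = [sum(row) / (length - 1) for row in mi]
    overall = sum(col_mean) / length
    corrected = [[0.0] * length for _ in range(length)]
    if overall > 0:
        for i in range(length):
            for j in range(length):
                if i != j:
                    corrected[i][j] = mi[i][j] - col_mean[i] * col_mean[j] / overall
    return corrected


if __name__ == "__main__":
    toy_msa = ["ACA", "ACC", "GTA", "GTC"]  # made-up alignment
    for row in mi_with_apc(toy_msa):
        print([round(x, 3) for x in row])
```

In the toy alignment the first two columns vary together and the third varies independently, so only the first pair keeps a positive corrected score.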
Affiliation(s)
- Camila Pontes
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- University of Brasília (UnB), 70910-900, Brasília - DF, Brazil
| | | | - Rosalba Lepore
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
| | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| |
|
118
|
Chen Z, Shen Z, Xu L, Zhao D, Zou Q. Regulator Network Analysis of Rice and Maize Yield-Related Genes. Front Cell Dev Biol 2021; 8:621464. [PMID: 33425929 PMCID: PMC7793993 DOI: 10.3389/fcell.2020.621464] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/26/2020] [Accepted: 11/12/2020] [Indexed: 11/13/2022] Open
Abstract
Rice and maize are the principal food crop species worldwide. The mechanism of gene regulation underlying rice and maize yield remains a current research focus. Seed size, weight and shape are important traits of crop yield in rice and maize. Most members of three gene families, APETALA2/ethylene response factor, auxin response factors and MADS, were identified to be involved in yield traits in rice and maize. Analysis of molecular regulation mechanisms related to yield traits provides theoretical support for the improvement of crop yield. Genetic regulatory network analysis can provide new insights into gene families with the improvement of sequencing technology. Here, we analyzed the evolutionary relationships and the genetic regulatory network for the gene family members to predict genes that may be involved in yield-related traits in rice and maize. The results may provide some theoretical and application guidelines for future investigations of molecular biology, which may be helpful for developing new rice and maize varieties with high yield traits.
Affiliation(s)
- Zheng Chen
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Zijie Shen
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Da Zhao
- School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, Shenzhen, China.,Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
119
|
Slater O, Miller B, Kontoyianni M. Decoding Protein-protein Interactions: An Overview. Curr Top Med Chem 2021; 20:855-882. [PMID: 32101126 DOI: 10.2174/1568026620666200226105312] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/13/2019] [Revised: 11/27/2019] [Accepted: 11/27/2019] [Indexed: 12/24/2022]
Abstract
Drug discovery has focused on the paradigm "one drug, one target" for a long time. However, small molecules can act at multiple macromolecular targets, which serves as the basis for drug repurposing. In an effort to expand the target space, and given advances in X-ray crystallography, protein-protein interactions have become an emerging focus area of drug discovery enterprises. Proteins interact with other biomolecules and it is this intricate network of interactions that determines the behavior of the system and its biological processes. In this review, we briefly discuss networks in disease, followed by computational methods for protein-protein complex prediction. Computational methodologies and techniques employed towards objectives such as protein-protein docking, protein-protein interactions, and interface predictions are described extensively. Docking aims at producing a complex between proteins, while interface predictions identify a subset of residues on one protein that could interact with a partner, and protein-protein interaction sites address whether two proteins interact. In addition, approaches to predict hot spots and binding sites are presented along with a representative example of our internal project on the chemokine CXC receptor 3 B-isoform and predictive modeling with IP10 and PF4.
Affiliation(s)
- Olivia Slater
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
| | - Bethany Miller
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
| | - Maria Kontoyianni
- Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
| |
|
120
|
Kaur H, Kalia M, Singh V, Modgil V, Mohan B, Taneja N. In silico identification and characterization of promising drug targets in highly virulent uropathogenic Escherichia coli strain CFT073 by protein-protein interaction network analysis. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100704] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022] Open
|
121
|
Salmanian S, Pezeshk H, Sadeghi M. Inter-protein residue covariation information unravels physically interacting protein dimers. BMC Bioinformatics 2020; 21:584. [PMID: 33334319 PMCID: PMC7745481 DOI: 10.1186/s12859-020-03930-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/12/2020] [Accepted: 12/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: Predicting physical interaction between proteins is one of the greatest challenges in computational biology. There is a considerable variety of protein interactions and a huge number of protein sequences and synthetic peptides with unknown interacting counterparts. Most co-evolutionary methods discover a combination of physical interplays and functional associations. However, there are only a handful of approaches which specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to unravel specific physically interacting proteins. In this study, we introduce a hybrid co-evolutionary-based approach to predict physical interplays between pairs of protein families, starting from protein sequences only. RESULTS: In the present analysis, pairs of multiple sequence alignments are constructed for each dimer and the covariation between residues in those pairs is calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information based approaches for ten accessible surface area threshold groups. Then, all residue couplings between the proteins of each dimer are unified into a single Frobenius norm value. Norms of residue contact matrices of all dimers in different accessible surface area thresholds are fed into a support vector machine as single or multiple feature models. The results of training the classifiers by single features show no apparent differences in accuracy among the methods for different accessible surface area thresholds. Nevertheless, the mutual information product and context likelihood of relatedness procedures may have roughly higher and lower overall performance, respectively, than the other two methods across accessible surface area cut-offs. The results also demonstrate that training the support vector machine with multiple norm features for several accessible surface area thresholds leads to a considerable improvement of prediction performance. In this context, CCMpred roughly achieves an overall better performance than mutual information based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96, and 0.962, respectively. CONCLUSIONS: In this paper, by feeding norm values of protein dimers into support vector machines in different accessible surface area thresholds, we demonstrate that even a small number of proteins in pairs of multiple alignments could allow one to accurately discriminate between positive and negative dimers.
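The abstract above describes collapsing all inter-protein residue couplings of a dimer into a single Frobenius norm that then serves as a classifier feature. A minimal sketch of that reduction follows. It assumes, purely for illustration, that the couplings come as a square matrix over a concatenated alignment in which the first `len_a` positions belong to the first protein. The function name and layout are hypothetical, and the accessible surface area filtering and SVM training described in the abstract are not shown.

```python
from math import sqrt
from typing import List


def interprotein_frobenius_norm(coupling: List[List[float]], len_a: int) -> float:
    """Frobenius norm of the inter-protein block of a coupling matrix.

    coupling -- square matrix of residue-residue coupling scores over the
                concatenated alignment (protein A first, then protein B)
    len_a    -- number of alignment positions belonging to protein A
    """
    total_len = len(coupling)
    if not 0 < len_a < total_len:
        raise ValueError("len_a must split the matrix into two non-empty parts")
    squared_sum = 0.0
    # Only pairs (i in A, j in B) contribute; intra-protein pairs are ignored.
    for i in range(len_a):
        for j in range(len_a, total_len):
            squared_sum += coupling[i][j] ** 2
    return sqrt(squared_sum)


if __name__ == "__main__":
    # Made-up scores: one position in protein A, two in protein B.
    scores = [
        [0.0, 3.0, 4.0],
        [3.0, 0.0, 9.0],
        [4.0, 9.0, 0.0],
    ]
    print(interprotein_frobenius_norm(scores, len_a=1))
```

Computing one such norm per accessible surface area threshold would give the multi-feature vector that the abstract reports as performing better than any single norm.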
Affiliation(s)
- Sara Salmanian
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Hamid Pezeshk
- School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
- Present Address: Department of Mathematics and Statistics, Concordia University, Montreal, Canada
- School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
| | - Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology, Tehran, Iran
| |
|
122
|
CryoEM map of Pseudomonas aeruginosa PilQ enables structural characterization of TsaP. Structure 2020; 29:457-466.e4. [PMID: 33338410 DOI: 10.1016/j.str.2020.11.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/01/2020] [Revised: 10/22/2020] [Accepted: 11/24/2020] [Indexed: 01/22/2023]
Abstract
The type IV pilus machinery is a multi-protein complex that polymerizes and depolymerizes a pilus fiber used for attachment, twitching motility, phage adsorption, natural competence, protein secretion, and surface-sensing. An outer membrane secretin pore is required for passage of the pilus fiber out of the cell. Herein, the structure of the tetradecameric secretin, PilQ, from the Pseudomonas aeruginosa type IVa pilus system was determined to 4.3 Å and 4.4 Å resolution in the presence and absence of C7 symmetric spikes, respectively. The heptameric spikes were found to be two tandem C-terminal domains of TsaP. TsaP forms a belt around PilQ and while it is not essential for twitching motility, overexpression of TsaP triggers a signal cascade upstream of PilY1 leading to cyclic di-GMP up-regulation. These results resolve the identity of the spikes identified with Proteobacterial PilQ homologs and may reveal a new component of the surface-sensing cyclic di-GMP signal cascade.
|
123
|
Kanapeckaitė A, Beaurivage C, Hancock M, Verschueren E. Fi-score: a novel approach to characterise protein topology and aid in drug discovery studies. J Biomol Struct Dyn 2020; 40:4197-4207. [DOI: 10.1080/07391102.2020.1854859] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]
Affiliation(s)
| | - Claudia Beaurivage
- Galapagos BV, Leiden, The Netherlands
- Department of Biomedical Science, Faculty of Science, University of Sheffield, Sheffield, UK
| | | | | |
|
124
|
Muntoni AP, Pagnani A, Weigt M, Zamponi F. Aligning biological sequences by exploiting residue conservation and coevolution. Phys Rev E 2020; 102:062409. [PMID: 33465950 DOI: 10.1103/physreve.102.062409] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/02/2020] [Accepted: 11/12/2020] [Indexed: 11/07/2022]
Abstract
Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e., arranging sequences from different organisms in such a way to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position specificities like conservation in sequences but assume an independent evolution of different positions. Over recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can catch the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include coevolution among positions in a general way, and to be therefore universally applicable to protein- and RNA-sequence alignment without the need of using complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.
Affiliation(s)
- Anna Paola Muntoni
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France
| | - Andrea Pagnani
- Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy
- INFN, Sezione di Torino, Via Giuria 1, I-10125 Torino, Italy
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France
| | - Francesco Zamponi
- Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
| |
|
125
|
Srikant S. Evolutionary history of ATP-binding cassette proteins. FEBS Lett 2020; 594:3882-3897. [PMID: 33145769 DOI: 10.1002/1873-3468.13985] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 07/21/2020] [Revised: 10/01/2020] [Accepted: 10/15/2020] [Indexed: 12/11/2022]
Abstract
ATP-binding cassette (ABC) proteins are found in every sequenced genome and evolved deep in the phylogenetic tree of life. ABC proteins form one of the largest homologous protein families, with most being involved in substrate transport across biological membranes, and a few cytoplasmic members regulating essential processes like translation. The predominant ABC protein classification scheme is derived from human members, but the increasing number of fully sequenced genomes permits reevaluation of this paradigm in light of the evolutionary history of the ABC-protein superfamily. As we study the diversity of substrates, mechanisms, and physiological roles of ABC proteins, knowledge of the evolutionary relationships highlights similarities and differences that can be attributed to specific branches in protein divergence. While alignments and trees built on natural sequence variation account for the evolutionary divergence of ABC proteins, high-throughput experiments and next-generation sequencing creating experimental sequence variation are instrumental in identifying functional constraints. The combination of natural and experimentally produced sequence variation allows a broader and more rational study of the function and physiological roles of ABC proteins.
Affiliation(s)
- Sriram Srikant
- Department of Biology, Massachusetts Institute of Technology
| |
|
126
|
Hu L, Hu P, Luo X, Yuan X, You ZH. Incorporating the Coevolving Information of Substrates in Predicting HIV-1 Protease Cleavage Sites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:2017-2028. [PMID: 31056514 DOI: 10.1109/tcbb.2019.2914208] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/09/2023]
Abstract
Human immunodeficiency virus 1 (HIV-1) protease (PR) plays a crucial role in the maturation of the virus. The study of substrate specificity of HIV-1 PR as a new endeavor strives to increase our ability to understand how HIV-1 PR recognizes its various cleavage sites. To predict HIV-1 PR cleavage sites, most of the existing approaches have been developed solely based on the homogeneity of substrate sequence information with supervised classification techniques. Although efficient, these approaches are found to be restricted to the ability of explaining their results and probably provide few insights into the mechanisms by which HIV-1 PR cleaves the substrates in a site-specific manner. In this work, a coevolutionary pattern-based prediction model for HIV-1 PR cleavage sites, namely EvoCleave, is proposed by integrating the coevolving information obtained from substrate sequences with a linear SVM classifier. The experiment results showed that EvoCleave yielded a very promising performance in terms of ROC analysis and f-measure. We also prospectively assessed the biological significance of coevolutionary patterns by applying them to study three fundamental issues of HIV-1 PR cleavage site. The analysis results demonstrated that the coevolutionary patterns offered valuable insights into the understanding of substrate specificity of HIV-1 PR.
|
127
|
Suplatov D, Sharapova Y, Geraseva E, Švedas V. Zebra2: advanced and easy-to-use web-server for bioinformatic analysis of subfamily-specific and conserved positions in diverse protein superfamilies. Nucleic Acids Res 2020; 48:W65-W71. [PMID: 32313959 PMCID: PMC7319439 DOI: 10.1093/nar/gkaa276] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/04/2020] [Revised: 03/29/2020] [Accepted: 04/08/2020] [Indexed: 12/17/2022] Open
Abstract
Zebra2 is a highly automated web-tool to search for subfamily-specific and conserved positions (i.e. the determinants of functional diversity as well as the key catalytic and structural residues) in protein superfamilies. The bioinformatic analysis is facilitated by Mustguseal—a companion web-server to automatically collect and superimpose a large representative set of functionally diverse homologs with high structure similarity but low sequence identity to the selected query protein. The results are automatically prioritized and provided at four information levels to facilitate the knowledge-driven expert selection of the most promising positions on-line: as a sequence similarity network; interfaces to sequence-based and 3D-structure-based analysis of conservation and variability; and accompanied by the detailed annotation of proteins accumulated from the integrated databases with links to the external resources. The integration of Zebra2 and Mustguseal web-tools provides the first of its kind out-of-the-box open-access solution to conduct a systematic analysis of evolutionarily related proteins implementing different functions within a shared 3D-structure of the superfamily, determine common and specific patterns of function-associated local structural elements, assist to select hot-spots for rational design and to prepare focused libraries for directed evolution. The web-servers are free and open to all users at https://biokinet.belozersky.msu.ru/zebra2, no login required.
Affiliation(s)
- Dmitry Suplatov
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
- Yana Sharapova
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
- Elizaveta Geraseva
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
- Vytas Švedas
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
|
128
|
Casas-Pastor D, Diehl A, Fritz G. Coevolutionary Analysis Reveals a Conserved Dual Binding Interface between Extracytoplasmic Function σ Factors and Class I Anti-σ Factors. mSystems 2020; 5:e00310-20. [PMID: 32753504 PMCID: PMC7406223 DOI: 10.1128/msystems.00310-20] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/08/2020] [Accepted: 07/17/2020] [Indexed: 11/30/2022] Open
Abstract
Extracytoplasmic function σ factors (ECFs) belong to the most abundant signal transduction mechanisms in bacteria. Among the diverse regulators of ECF activity, class I anti-σ factors are the most important signal transducers in response to internal and external stress conditions. Despite the conserved secondary structure of the class I anti-σ factor domain (ASDI) that binds and inhibits the ECF under noninducing conditions, the binding interface between ECFs and ASDIs is surprisingly variable between the published cocrystal structures. In this work, we provide a comprehensive computational analysis of the ASDI protein family and study the different contact themes between ECFs and ASDIs. To this end, we harness the coevolution of these diverse protein families and predict covarying amino acid residues as likely candidates of an interaction interface. As a result, we find two common binding interfaces linking the first alpha-helix of the ASDI to the DNA-binding region in the σ4 domain of the ECF, and the fourth alpha-helix of the ASDI to the RNA polymerase (RNAP)-binding region of the σ2 domain. The conservation of these two binding interfaces contrasts with the apparent quaternary structure diversity of the ECF/ASDI complexes, partially explaining the high specificity between cognate ECF and ASDI pairs. Furthermore, we suggest that the dual inhibition of RNAP- and DNA-binding interfaces is likely a universal feature of other ECF anti-σ factors, preventing the formation of nonfunctional trimeric complexes between σ/anti-σ factors and RNAP or DNA. IMPORTANCE In the bacterial world, extracytoplasmic function σ factors (ECFs) are the most widespread family of alternative σ factors, mediating many cellular responses to environmental cues, such as stress. This work uses a computational approach to investigate how these σ factors interact with class I anti-σ factors, the most abundant regulators of ECF activity. 
By comprehensively classifying the anti-σs into phylogenetic groups and by comparing this phylogeny to the one of the cognate ECFs, the study shows how these protein families have coevolved to maintain their interaction over evolutionary time. These results shed light on the common contact residues that link ECFs and anti-σs in different phylogenetic families and set the basis for the rational design of anti-σs to specifically target certain ECFs. This will help to prevent the cross talk between heterologous ECF/anti-σ pairs, allowing their use as orthogonal regulators for the construction of genetic circuits in synthetic biology.
Affiliation(s)
- Delia Casas-Pastor
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-University Marburg, Marburg, Germany
- Angelika Diehl
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-University Marburg, Marburg, Germany
- School of Molecular Sciences, University of Western Australia, Perth, Australia
- Georg Fritz
- Center for Synthetic Microbiology (SYNMIKRO), Philipps-University Marburg, Marburg, Germany
- School of Molecular Sciences, University of Western Australia, Perth, Australia
|
129
|
Younginger BS, Friesen ML. Connecting signals and benefits through partner choice in plant-microbe interactions. FEMS Microbiol Lett 2020; 366:5626345. [PMID: 31730203 DOI: 10.1093/femsle/fnz217] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/30/2018] [Accepted: 10/17/2019] [Indexed: 12/20/2022] Open
Abstract
Stabilizing mechanisms in plant-microbe symbioses are critical to maintaining beneficial functions, with two main classes: host sanctions and partner choice. Sanctions are currently presumed to be more effective and widespread, based on the idea that microbes rapidly evolve cheating while retaining signals matching cooperative strains. However, hosts that effectively discriminate among a pool of compatible symbionts would gain a significant fitness advantage. Using the well-characterized legume-rhizobium symbiosis as a model, we evaluate the evidence for partner choice in the context of the growing field of genomics. Empirical studies that rely upon bacteria varying only in nitrogen-fixation ability ignore host-symbiont signaling and frequently conclude that partner choice is not a robust stabilizing mechanism. Here, we argue that partner choice is an overlooked mechanism of mutualism stability and emphasize that plants need not use the microbial services provided a priori to discriminate among suitable partners. Additionally, we present a model that shows that partner choice signaling increases symbiont and host fitness in the absence of sanctions. Finally, we call for a renewed focus on elucidating the signaling mechanisms that are critical to partner choice while further aiming to understand their evolutionary dynamics in nature.
Affiliation(s)
- Brett S Younginger
- Department of Plant Pathology, Washington State University, PO Box 646430, 345 Johnson Hall, Pullman, WA 99164, USA
- Maren L Friesen
- Department of Plant Pathology, Washington State University, PO Box 646430, 345 Johnson Hall, Pullman, WA 99164, USA; Department of Crop and Soil Sciences, Washington State University, PO Box 646420, 115 Johnson Hall, Pullman, WA 99164, USA
|
130
|
Huang R, Huang Y, Guo Y, Ji S, Lu M, Li T. Systematic characterization and prediction of post-translational modification cross-talk between proteins. Bioinformatics 2020; 35:2626-2633. [PMID: 30590394 DOI: 10.1093/bioinformatics/bty1033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/24/2018] [Revised: 12/10/2018] [Accepted: 12/16/2018] [Indexed: 01/02/2023] Open
Abstract
MOTIVATION Protein post-translational modifications (PTMs) regulate a wide range of cellular protein functions. Many PTM sites from the same (intra) or different (inter) proteins often cooperate with each other to perform a function, which is defined as PTM cross-talk. PTM cross-talk within proteins has attracted great attention in the past few years. However, inter-protein PTM cross-talk is largely understudied due to its large protein pair space and the lack of a gold standard dataset, even though the PTM interplay between proteins is a key element in cell signaling and regulatory networks. RESULTS In this study, 199 inter-protein PTM cross-talk pairs in 82 pairs of human proteins were collected from the literature, which to our knowledge is the first effort in compiling such a dataset. By comparing with background PTM pairs from the same protein pairs, we found that inter-protein cross-talk PTM pairs have higher sequence co-evolution at both PTM residue and motif levels. Also, we found that cross-talk PTMs have higher co-modification across multiple species and 88 human tissues or conditions. Furthermore, we showed that these features are predictive for PTM cross-talk between proteins, and applied a random forest model to integrate these features, achieving an area under the receiver operating characteristic curve of 0.81 in 10-fold cross-validation and outperforming any single feature alone. Therefore, this method would be a valuable tool to identify inter-protein PTM cross-talk at proteome-wide scale. AVAILABILITY AND IMPLEMENTATION A web server for prioritization of both intra- and inter-protein PTM cross-talk candidates is at http://bioinfo.bjmu.edu.cn/ptm-x/. Python code for local use is also freely available at https://github.com/huangyh09/PTM-X. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
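Note on the evaluation metric: the area under the ROC curve quoted in this abstract can be read as the probability that a randomly chosen true cross-talk pair receives a higher score than a randomly chosen background pair. The sketch below illustrates that reading and a simple 10-fold split. It is not taken from the PTM-X code; the function names `roc_auc` and `k_fold_indices` and the toy labels and scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_items, k):
    """Split range(n_items) into k interleaved folds whose sizes differ by at most one."""
    return [list(range(n_items))[i::k] for i in range(k)]


# Toy data (hypothetical): 1 = cross-talk pair, 0 = background pair.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

# 8 of the 9 positive/negative pairs are ordered correctly.
print(roc_auc(toy_labels, toy_scores))
print([len(fold) for fold in k_fold_indices(25, 10)])
```

In a 10-fold setting, each fold would be held out once, a model trained on the remaining nine, and the AUC computed on the held-out predictions.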
Affiliation(s)
- Rongting Huang
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Yuanhua Huang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Yubin Guo
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Shangwei Ji
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Ming Lu
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
- Tingting Li
- Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
|
131
|
Correa Marrero M, Immink RGH, de Ridder D, van Dijk ADJ. Improved inference of intermolecular contacts through protein-protein interaction prediction using coevolutionary analysis. Bioinformatics 2020; 35:2036-2042. [PMID: 30398547 DOI: 10.1093/bioinformatics/bty924] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/04/2018] [Revised: 10/11/2018] [Accepted: 11/05/2018] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION Predicting residue-residue contacts between interacting proteins is an important problem in bioinformatics. The growing wealth of sequence data can be used to infer these contacts through correlated mutation analysis on multiple sequence alignments of interacting homologs of the proteins of interest. This requires correct identification of pairs of interacting proteins for many species, in order to avoid introducing noise (i.e. non-interacting sequences) in the analysis that will decrease predictive performance. RESULTS We have designed Ouroboros, a novel algorithm to reduce such noise in intermolecular contact prediction. Our method iterates between weighting proteins according to how likely they are to interact based on the correlated mutations signal, and predicting correlated mutations based on the weighted sequence alignment. We show that this approach accurately discriminates between protein interaction versus non-interaction and simultaneously improves the prediction of intermolecular contact residues compared to a naive application of correlated mutation analysis. This requires no training labels concerning interactions or contacts. Furthermore, the method relaxes the assumption of one-to-one interaction of previous approaches, allowing for the study of many-to-many interactions. AVAILABILITY AND IMPLEMENTATION Source code and test data are available at www.bif.wur.nl/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
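Note on the iteration: the alternation described in this abstract (weight sequence pairs by how likely they are to interact, then re-estimate the correlated-mutation signal from the weighted alignment) resembles an expectation-maximization loop. The toy sketch below is not the Ouroboros implementation. It assumes a single pair of alignment columns, a joint frequency table as the "interacting" model, independent marginals as the "non-interacting" model, and a fixed prior; `em_interaction_weights`, `pseudo` and `prior` are hypothetical names.

```python
from collections import Counter


def em_interaction_weights(pairs, n_iter=20, pseudo=0.5, prior=0.5):
    """Return one weight per (a, b) residue pair: the estimated probability
    that the pair comes from the coupled ("interacting") model."""
    size_a = len({a for a, _ in pairs})
    size_b = len({b for _, b in pairs})
    weights = [prior] * len(pairs)
    for _ in range(n_iter):
        # Re-estimate both models from the currently weighted alignment.
        joint, marg_a, marg_b = Counter(), Counter(), Counter()
        for (a, b), w in zip(pairs, weights):
            joint[(a, b)] += w        # coupled model uses weight w
            marg_a[a] += 1.0 - w      # independent model uses weight 1 - w
            marg_b[b] += 1.0 - w
        z_joint = sum(joint.values()) + pseudo * size_a * size_b
        z_a = sum(marg_a.values()) + pseudo * size_a
        z_b = sum(marg_b.values()) + pseudo * size_b
        # Re-weight every sequence pair under the two competing models.
        new_weights = []
        for a, b in pairs:
            p_int = (joint[(a, b)] + pseudo) / z_joint
            p_non = ((marg_a[a] + pseudo) / z_a) * ((marg_b[b] + pseudo) / z_b)
            new_weights.append(prior * p_int / (prior * p_int + (1.0 - prior) * p_non))
        weights = new_weights
    return weights


# Toy data (hypothetical): 12 co-varying pairs and 4 mismatched "noise" pairs.
toy_pairs = [("A", "A")] * 6 + [("C", "C")] * 6 + [("A", "C")] * 2 + [("C", "A")] * 2
toy_weights = em_interaction_weights(toy_pairs)
print(round(toy_weights[0], 2), round(toy_weights[12], 2))
```

On this toy input the co-varying pairs end with weights above the prior and the mismatched pairs below it, which mirrors the noise-reduction idea; a real application would use whole alignments and a proper correlated-mutation model rather than a single column pair.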
Affiliation(s)
- Richard G H Immink
- Laboratory of Molecular Biology, Department of Plant Sciences; Bioscience, Wageningen Plant Research
- Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences; Bioscience, Wageningen Plant Research; Biometris, Department of Plant Sciences, Wageningen University & Research, Wageningen PB, The Netherlands
|
132
|
Robins WP, Mekalanos JJ. Protein covariance networks reveal interactions important to the emergence of SARS coronaviruses as human pathogens. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020. [PMID: 32577639 DOI: 10.1101/2020.06.05.136887] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]
Abstract
SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that have likely emerged from animal reservoirs based on genomic similarities to bat and other animal viruses. Here we report the analysis of conserved interactions between amino acid residues in proteins encoded by SARS-CoV-related viruses. We identified pairs and networks of residue variants that exhibited statistically high frequencies of covariance with each other. While these interactions are likely key to both protein structure and other protein-protein interactions, we have also found that they can be used to provide a new computational approach (CoVariance-based Phylogeny Analysis) for understanding viral evolution and adaptation. Our data provide evidence that the evolutionary processes that converted a bat virus into a human pathogen occurred through recombination with other viruses in combination with new adaptive mutations important for entry into human cells.
|
133
|
Tomiczek B, Delewski W, Nierzwicki L, Stolarska M, Grochowina I, Schilke B, Dutkiewicz R, Uzarska MA, Ciesielski SJ, Czub J, Craig EA, Marszalek J. Two-step mechanism of J-domain action in driving Hsp70 function. PLoS Comput Biol 2020; 16:e1007913. [PMID: 32479549 PMCID: PMC7289447 DOI: 10.1371/journal.pcbi.1007913] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/27/2020] [Revised: 06/11/2020] [Accepted: 04/28/2020] [Indexed: 12/02/2022] Open
Abstract
J-domain proteins (JDPs), obligatory Hsp70 cochaperones, play critical roles in protein homeostasis. They promote key allosteric transitions that stabilize Hsp70 interaction with substrate polypeptides upon hydrolysis of its bound ATP. Although a recent crystal structure revealed the physical mode of interaction between a J-domain and an Hsp70, the structural and dynamic consequences of J-domain action once bound and how an Hsp70 discriminates among its multiple JDP partners remain enigmatic. We combined free energy simulations, biochemical assays and evolutionary analyses to address these issues. Our results indicate that the invariant aspartate of the J-domain perturbs a conserved intramolecular Hsp70 network of contacts that crosses domains. This perturbation leads to destabilization of the domain-domain interface, thereby promoting the allosteric transition that triggers ATP hydrolysis. While this mechanistic step is driven by conserved residues, evolutionarily variable residues are key to initial JDP/Hsp70 recognition, via electrostatic interactions between oppositely charged surfaces. We speculate that these variable residues allow an Hsp70 to discriminate amongst JDP partners, as many of them have coevolved. Together, our data point to a two-step mode of J-domain action, a recognition stage followed by a mechanistic stage. It is well appreciated that Hsp70-based systems are the most versatile among molecular chaperones, functioning in all cell types and in all subcellular compartments. Via cyclic binding to protein substrates, Hsp70s facilitate their folding, trafficking, degradation and ability to interact with other proteins. Hsp70 function, however, depends on transient interaction with J-domain protein cochaperones that not only deliver substrates, but also activate the structural changes needed for efficient Hsp70 binding to substrate. 
But how J-domain proteins mechanistically function to drive these changes and how an Hsp70 discriminates among multiple J-domain partners have remained challenging central questions. Here, by using a combination of computational, evolutionary and experimental approaches, we provide evidence for a two-step mechanism. The initial recognition step involves variable residues that allow fine tuning of both the specificity and strength of J-domain protein interaction with Hsp70. The second, that is the mechanistic step, involves conserved residues that act to disrupt a conserved network of intramolecular interactions within Hsp70, thus ensuring robust activation of the structural changes necessary for effective substrate binding. We suggest that our findings are likely applicable to most Hsp70 systems.
Affiliation(s)
- Bartlomiej Tomiczek
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Wojciech Delewski
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Lukasz Nierzwicki
- Department of Physical Chemistry, Gdansk University of Technology, Gdansk, Poland
- Milena Stolarska
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Igor Grochowina
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Brenda Schilke
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Rafal Dutkiewicz
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Marta A. Uzarska
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Szymon J. Ciesielski
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Jacek Czub
- Department of Physical Chemistry, Gdansk University of Technology, Gdansk, Poland
- * E-mail: (JC); (EAC); (JM)
- Elizabeth A. Craig
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- * E-mail: (JC); (EAC); (JM)
- Jaroslaw Marszalek
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
- Department of Biochemistry, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- * E-mail: (JC); (EAC); (JM)
|
134
|
Andreani J, Quignot C, Guerois R. Structural prediction of protein interactions and docking using conservation and coevolution. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1470] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
Affiliation(s)
- Jessica Andreani
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
- Chloé Quignot
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
- Raphael Guerois
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
|
135
|
Fontove F, Del Rio G. Residue Cluster Classes: A Unified Protein Representation for Efficient Structural and Functional Classification. ENTROPY 2020; 22:e22040472. [PMID: 33286246 PMCID: PMC7516957 DOI: 10.3390/e22040472] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/01/2020] [Revised: 03/30/2020] [Accepted: 04/07/2020] [Indexed: 11/16/2022]
Abstract
Proteins are characterized by their structures and functions, and these two fundamental aspects of proteins are assumed to be related. To model such a relationship, a single representation to model both protein structure and function would be convenient, yet so far, the most effective models for protein structure or function classification do not rely on the same protein representation. Here we provide a computationally efficient implementation for large datasets to calculate residue cluster classes (RCCs) from protein three-dimensional structures and show that such representations enable a random forest algorithm to effectively learn the structural and functional classifications of proteins, according to the CATH and Gene Ontology criteria, respectively. RCCs are derived from residue contact maps built from different distance criteria, and we show that 7 or 8 Å with or without amino acid side-chain atoms rendered the best classification models. The potential use of a unified representation of proteins is discussed and possible future areas for improvement and exploration are presented.
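Note on contact maps: the residue contact maps mentioned in this abstract mark two residues as "in contact" when they lie within a chosen distance cutoff. The sketch below shows only that first step, under stated assumptions: one representative point per residue (for example the Cα atom), Euclidean distance, and made-up coordinates. It does not reproduce the authors' RCC calculation, and `contact_map` and `toy_coords` are hypothetical names.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Return a symmetric boolean matrix that is True where two residues
    lie within `cutoff` angstroms of each other."""
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = True
    return contacts


# Toy coordinates (hypothetical): three residues spaced 3.8 angstroms apart
# along a line, plus one residue far away from the others.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]

for cutoff in (7.0, 8.0):
    cmap = contact_map(toy_coords, cutoff)
    n_contacts = sum(cmap[i][j] for i in range(4) for j in range(i + 1, 4))
    print(cutoff, n_contacts)
```

In the toy data, residues 0 and 2 are 7.6 Å apart, so they count as a contact at the 8 Å cutoff but not at 7 Å, which illustrates why the choice of distance criterion changes the map and everything derived from it.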
Affiliation(s)
- Gabriel Del Rio
- Department of Biochemistry and Structural Biology, Instituto de Fisiología Celular, UNAM, Mexico City 04510, Mexico
- Correspondence:
|
136
|
Chen J, Siu SWI. Machine Learning Approaches for Quality Assessment of Protein Structures. Biomolecules 2020; 10:biom10040626. [PMID: 32316682 PMCID: PMC7226485 DOI: 10.3390/biom10040626] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/03/2020] [Revised: 04/07/2020] [Accepted: 04/09/2020] [Indexed: 11/16/2022] Open
Abstract
Protein structures play a very important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determinations of protein structure are prohibitively costly and time-consuming, and computational predictions of protein structures have not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, EMA methods based on machine learning (ML) have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by the ML approach they employ (support vector machine, artificial neural networks, ensemble learning, or Bayesian learning), and their significance is discussed from a methodology viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML/DL techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and directions for future research in this area.
|
137
|
Postel Z, Touzet P. Cytonuclear Genetic Incompatibilities in Plant Speciation. PLANTS 2020; 9:plants9040487. [PMID: 32290056 PMCID: PMC7238192 DOI: 10.3390/plants9040487] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 02/21/2020] [Revised: 04/03/2020] [Accepted: 04/07/2020] [Indexed: 12/13/2022]
Abstract
Due to the endosymbiotic origin of organelles, a pattern of coevolution and coadaptation between organellar and nuclear genomes is required for proper cell function. In this review, we focus on the impact of cytonuclear interaction on the reproductive isolation of plant species. We give examples of cases where species exhibit barriers to reproduction which involve plastid-nuclear or mito-nuclear genetic incompatibilities, and describe the evolutionary processes at play. We also discuss potential mechanisms of hybrid fitness recovery such as paternal leakage. Finally, we point out the possible interplay between plant mating systems and cytonuclear coevolution, and its consequence on plant speciation.
|
138
|
Quadeer AA, McKay MR, Barton JP, Louie RHY. MPF-BML: a standalone GUI-based package for maximum entropy model inference. Bioinformatics 2020; 36:2278-2279. [PMID: 31851308 DOI: 10.1093/bioinformatics/btz925] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/30/2019] [Revised: 11/03/2019] [Accepted: 12/16/2019] [Indexed: 11/12/2022] Open
Abstract
SUMMARY Learning underlying correlation patterns in data is a central problem across scientific fields. Maximum entropy models present an important class of statistical approaches for addressing this problem. However, accurately and efficiently inferring model parameters is a major challenge, particularly for modern high-dimensional applications such as in biology, for which the number of parameters is enormous. Previously, we developed a statistical method, minimum probability flow-Boltzmann Machine Learning (MPF-BML), for performing fast and accurate inference of maximum entropy model parameters, which was applied to genetic sequence data to estimate the fitness landscape for the surface proteins of human immunodeficiency virus and hepatitis C virus. To facilitate seamless use of MPF-BML and encourage more widespread application to data in diverse fields, we present a standalone cross-platform package of MPF-BML which features an easy-to-use graphical user interface. The package only requires the input data (protein sequence data or data of multiple configurations of a complex system with a large number of variables) and returns the maximum entropy model parameters. AVAILABILITY AND IMPLEMENTATION The MPF-BML software is publicly available under the MIT License at https://github.com/ahmedaq/MPF-BML-GUI. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ahmed A Quadeer
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Matthew R McKay
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China; Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- John P Barton
- Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
- Raymond H Y Louie
- The Kirby Institute, University of New South Wales, Sydney, NSW 2052, Australia; School of Medical Sciences, University of New South Wales, Sydney, NSW 2052, Australia
|
139
|
Quadeer AA, Morales-Jimenez D, McKay MR. RocaSec: a standalone GUI-based package for robust co-evolutionary analysis of proteins. Bioinformatics 2020; 36:2262-2263. [PMID: 31800008 DOI: 10.1093/bioinformatics/btz890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2019] [Revised: 11/19/2019] [Accepted: 12/02/2019] [Indexed: 11/14/2022] Open
Abstract
SUMMARY Patterns of mutational correlations, learnt from protein sequences, have been shown to be informative of co-evolutionary sectors that are tightly linked to functional and/or structural properties of proteins. Previously, we developed a statistical inference method, robust co-evolutionary analysis (RoCA), to reliably predict co-evolutionary sectors of proteins, while controlling for statistical errors caused by limited data. RoCA was demonstrated on multiple viral proteins, with the inferred sectors showing close correspondences with experimentally-known biochemical domains. To facilitate seamless use of RoCA and promote more widespread application to protein data, here we present a standalone cross-platform package 'RocaSec' which features an easy-to-use GUI. The package only requires the multiple sequence alignment of a protein for inferring the co-evolutionary sectors. In addition, when information on the protein biochemical domains is provided, RocaSec returns the corresponding statistical association between the inferred sectors and biochemical domains. AVAILABILITY AND IMPLEMENTATION The RocaSec software is publicly available under the MIT License at https://github.com/ahmedaq/RocaSec. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ahmed A Quadeer
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- David Morales-Jimenez
- Institute of Electronics, Communications and Information Technology, Queen's University Belfast, NI Science Park, Queens Road, Belfast BT3 9DT, UK
- Matthew R McKay
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
|
140
|
Bao Q, Hotz-Wagenblatt A, Betts MJ, Hipp M, Hugo A, Pougialis G, Lei-Rossmann J, Löchelt M. Shared and cell type-specific adaptation strategies of Gag and Env yield high titer bovine foamy virus variants. INFECTION GENETICS AND EVOLUTION 2020; 82:104287. [PMID: 32179148 DOI: 10.1016/j.meegid.2020.104287] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/11/2019] [Revised: 03/05/2020] [Accepted: 03/11/2020] [Indexed: 12/27/2022]
Abstract
During in vitro selection and evolution screens to adapt the tightly cell-associated bovine foamy virus BFV to high titer cell-free transmission, common, cell-type specific and concurrent adaptive changes in Gag and Env, the major players of foamy virus particle assembly and release, were detected. Upon early establishment of cell type-independent pioneering mutations in Env and, subsequently in Gag, a diverse virus pool emerged that was characterized by the occurrence of shared and additional cell type-specific exchanges. At late passages and saturated titers, remarkably homogeneous virus populations characterized by functionally important mutations developed which may be partly due to stochastic evolutionary events that occurred earlier during adaptation. Reverse genetics showed that defined mutations were functionally important for high titer cell-free transmission.
Affiliation(s)
- Qiuying Bao
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
| | | | - Matthew J Betts
- CellNetworks, Bioquant, Im Neuenheimer Feld 267, 69120 Heidelberg, Germany.
| | - Michaela Hipp
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
| | - Annette Hugo
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
- Georgios Pougialis
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
- Janet Lei-Rossmann
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
- Martin Löchelt
- Division of Viral Transformation Mechanisms, Research Focus Infection, Inflammation and Cancer, German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ), Heidelberg, Germany.
|
141
|
do Amaral MJ, Araujo TS, Díaz NC, Accornero F, Polycarpo CR, Cordeiro Y, Cabral KM, Almeida MS. Phase Separation and Disorder-to-Order Transition of Human Brain Expressed X-Linked 3 (hBEX3) in the Presence of Small Fragments of tRNA. J Mol Biol 2020; 432:2319-2348. [DOI: 10.1016/j.jmb.2020.02.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/15/2019] [Revised: 02/10/2020] [Accepted: 02/27/2020] [Indexed: 12/19/2022]
|
142
|
Nerattini F, Figliuzzi M, Cardelli C, Tubiana L, Bianco V, Dellago C, Coluzza I. Identification of Protein Functional Regions. Chemphyschem 2020; 21:335-347. [PMID: 31944517 DOI: 10.1002/cphc.201900898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/15/2019] [Revised: 11/01/2019] [Indexed: 11/12/2022]
Abstract
Protein sequence stores the information relative to both functionality and stability, thus making it difficult to disentangle the two contributions. However, the identification of critical residues for function and stability has important implications for the mapping of the proteome interactions, as well as for many pharmaceutical applications, e.g., the identification of ligand binding regions for targeted pharmaceutical protein design. In this work, we propose a computational method to identify critical residues for protein functionality and stability and to further categorise them into strictly functional, structural and intermediate. We evaluate single site conservation and use Direct Coupling Analysis (DCA) to identify co-evolved residues both in natural and artificial evolution processes. We reproduce artificial evolution using protein design and base our approach on the hypothesis that artificial evolution in the absence of any functional constraint would exclusively lead to site conservation and co-evolution events of the structural type. Conversely, natural evolution intrinsically embeds both functional and structural information. By comparing the lists of conserved and co-evolved residues, outcomes of the analysis on natural and artificial evolution, we identify the functional residues without the need of any a priori knowledge of the biological role of the analysed protein.
Affiliation(s)
- Francesca Nerattini
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
- Matteo Figliuzzi
- Sorbonne Universites, UPMC, Institut de Biologie Paris-Seine, CNRS, Laboratoire de Biologie Computationnelle et Quantitative UMR, 7238, Paris, France
- Chiara Cardelli
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
- Luca Tubiana
- Physics Department, Università degli Studi di Trento, via Sommarive 14, 38123, Trento, Italy
- Valentino Bianco
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria; Faculty of Chemistry, Chemical Physics Department, Universidad Complutense de Madrid, Plaza de las Ciencias, Ciudad Universitaria, Madrid, 28040, Spain
- Christoph Dellago
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
- Ivan Coluzza
- CIC biomaGUNE, Paseo Miramon 182, 20014 San Sebastian, Spain, and IKERBASQUE, Basque Foundation for Science, 48013, Bilbao, Spain
|
143
|
D'Amico F, Nadalin F, Libra M. S100A7/Ran-binding protein 9 coevolution in mammals. Immunogenetics 2020; 72:155-164. [PMID: 32043173 DOI: 10.1007/s00251-020-01155-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/25/2019] [Accepted: 01/13/2020] [Indexed: 10/25/2022]
Abstract
S100A7 has been suggested to interact with Ran-binding protein 9 (RanBP9). Both proteins are nowadays considered key effectors in immune response. Functional interaction between proteins is ensured by coevolution. The mechanisms of vertebrate coevolution between S100A7 and RanBP9 remain unclear. Several approaches for studying coevolution have been developed. Protein coevolution was inferred by calculating the linear correlation coefficients between inter-protein distance matrices using Mirrortree. We found an overall moderate correlation value (R = 0.53, p < 1e-06). Moreover, owing to the high conservation of RanBP9 protein among vertebrates, we chose to utilize a recent version of the Blocks in Sequences (BIS2) algorithm implemented in the BIS2Analyzer webserver. A coevolution cluster was identified between the two proteins (p < 8.10e-05). In conclusion, our coevolutionary analysis suggests that amino acid variations may modulate S100A7/RanBP9 interaction with potential pathogenic effects. Such findings could guide further analysis to better elucidate the function of S100A7 and RanBP9 and to design drugs targeting these molecules in diseases.
Affiliation(s)
- Fabio D'Amico
- Department of Biomedical and Biotechnological Sciences, University of Catania, Catania, Italy.
- Francesca Nadalin
- Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, Sorbonne Université, Univ P6, CNRS, IBPS, Paris, France
- Massimo Libra
- Department of Biomedical and Biotechnological Sciences, University of Catania, Catania, Italy
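The correlation step named in the abstract of this entry (a linear correlation coefficient between the distance matrices of two protein families) can be illustrated with a minimal Python sketch. It is not the Mirrortree software itself: the function names and the toy matrices are hypothetical, and both matrices are assumed to list the same species in the same order.

```python
import math

def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-constant input)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def distance_matrix_correlation(dist_a, dist_b):
    """Correlate pairwise distances of family A with those of family B."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))

# Toy pairwise distances for four species (hypothetical values).
dist_a = [[0, 1, 2, 3],
          [1, 0, 2, 4],
          [2, 2, 0, 5],
          [3, 4, 5, 0]]
# Family B evolves twice as fast but with the same tree shape.
dist_b = [[2 * d for d in row] for row in dist_a]

r = distance_matrix_correlation(dist_a, dist_b)
print(round(r, 6))
```

Proportional distance matrices, as in this toy case, give a correlation of 1 up to rounding; real families with partially shared evolutionary histories fall somewhere below that.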
|
144
|
Sauer MF, Sevy AM, Crowe JE, Meiler J. Multi-state design of flexible proteins predicts sequences optimal for conformational change. PLoS Comput Biol 2020; 16:e1007339. [PMID: 32032348 PMCID: PMC7032724 DOI: 10.1371/journal.pcbi.1007339] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/14/2019] [Revised: 02/20/2020] [Accepted: 12/23/2019] [Indexed: 12/11/2022] Open
Abstract
Computational protein design of an ensemble of conformations for one protein–i.e., multi-state design–determines the side chain identity by optimizing the energetic contributions of that side chain in each of the backbone conformations. Sampling the resulting large sequence-structure search space limits the number of conformations and the size of proteins in multi-state design algorithms. Here, we demonstrated that the REstrained CONvergence (RECON) algorithm can simultaneously evaluate the sequence of large proteins that undergo substantial conformational changes. Simultaneous optimization of side chain conformations across all conformations increased sequence conservation when compared to single-state designs in all cases. More importantly, the sequence space sampled by RECON MSD resembled the evolutionary sequence space of flexible proteins, particularly when confined to predicting the mutational preferences of limited common ancestral descent, such as in the case of influenza type A hemagglutinin. Additionally, we found that sequence positions which require substantial changes in their local environment across an ensemble of conformations are more likely to be conserved. These increased conservation rates are better captured by RECON MSD over multiple conformations and thus multiple local residue environments during design. To quantify this rewiring of contacts at a certain position in sequence and structure, we introduced a new metric designated ‘contact proximity deviation’ that enumerates contact map changes. This measure allows mapping of global conformational changes into local side chain proximity adjustments, a property not captured by traditional global similarity metrics such as RMSD or local similarity metrics such as changes in φ and ψ angles. Multi-state design can be used to engineer proteins that need to exist in multiple conformations or that bind to multiple partner molecules. 
In essence, multi-state design selects a compromise of protein sequences that allow for an ensemble of protein conformations, or states, associated with a particular biological function. In this paper, we used the REstrained CONvergence (RECON) algorithm with Rosetta to show that multi-state design of flexible proteins predicts sequences optimal for conformational change, mimicking mutation preferences sampled in evolution. Modeling optimal local side chain physicochemical environments within an ensemble selected significantly more native-like sequences than selections performed when all conformational states are designed independently. This outcome was particularly true for amino acids whose local side chain environment changes between conformations. To quantify such contact map changes, we introduced a novel metric to show that sequence conservation is dependent on protein flexibility, i.e., changes in local side chain environments between states limit the space of tolerated mutations. Additionally, such positions in sequence and structure are more likely to be energetically frustrated, at least in some states. Importantly, we showed that multi-state design over an ensemble of conformations (space) can explore evolutionarily tolerated sequence space (time), thus enabling RECON to not only design proteins that require multiple states for function but also predict mutations that might be tolerated in native proteins but have not yet been explored by evolution. The latter aspect can be important to anticipate escape mutations, for example in pathogens or oncoproteins.
Affiliation(s)
- Marion F Sauer
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Alexander M Sevy
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- James E Crowe
- Vanderbilt Vaccine Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America; Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America; Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jens Meiler
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America; Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
|
145
|
Sheik Amamuddy O, Veldman W, Manyumwa C, Khairallah A, Agajanian S, Oluyemi O, Verkhivker GM, Tastan Bishop Ö. Integrated Computational Approaches and Tools for Allosteric Drug Discovery. Int J Mol Sci 2020; 21:E847. [PMID: 32013012 PMCID: PMC7036869 DOI: 10.3390/ijms21030847] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 12/25/2019] [Revised: 01/20/2020] [Accepted: 01/21/2020] [Indexed: 12/16/2022] Open
Abstract
Understanding molecular mechanisms underlying the complexity of allosteric regulation in proteins has attracted considerable attention in drug discovery due to the benefits and versatility of allosteric modulators in providing desirable selectivity against protein targets while minimizing toxicity and other side effects. The proliferation of novel computational approaches for predicting ligand-protein interactions and binding using dynamic and network-centric perspectives has led to new insights into allosteric mechanisms and facilitated computer-based discovery of allosteric drugs. Although no absolute method of experimental and in silico allosteric drug/site discovery exists, current methods are still being improved. As such, the critical analysis and integration of established approaches into robust, reproducible, and customizable computational pipelines with experimental feedback could make allosteric drug discovery more efficient and reliable. In this article, we review computational approaches for allosteric drug discovery and discuss how these tools can be utilized to develop consensus workflows for in silico identification of allosteric sites and modulators with some applications to pathogen resistance and precision medicine. The emerging realization that allosteric modulators can exploit distinct regulatory mechanisms and can provide access to targeted modulation of protein activities could open opportunities for probing biological processes and in silico design of drug combinations with improved therapeutic indices and a broad range of activities.
Affiliation(s)
- Olivier Sheik Amamuddy
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown 6140, South Africa
- Wayde Veldman
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown 6140, South Africa
- Colleen Manyumwa
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown 6140, South Africa
- Afrah Khairallah
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown 6140, South Africa
- Steve Agajanian
- Graduate Program in Computational and Data Sciences, Keck Center for Science and Engineering, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, CA 92866, USA
- Odeyemi Oluyemi
- Graduate Program in Computational and Data Sciences, Keck Center for Science and Engineering, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, CA 92866, USA
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Keck Center for Science and Engineering, Schmid College of Science and Technology, Chapman University, One University Drive, Orange, CA 92866, USA
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, USA
- Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown 6140, South Africa
|
146
|
Shang Y, Huang S. Engineering Plant Cytochrome P450s for Enhanced Synthesis of Natural Products: Past Achievements and Future Perspectives. Plant Communications 2020; 1:100012. [PMID: 33404545 PMCID: PMC7747987 DOI: 10.1016/j.xplc.2019.100012] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 06/01/2023]
Abstract
Cytochrome P450s (P450s) are the most versatile catalysts and are widely used by plants to synthesize a vast array of structurally diverse specialized metabolites that not only play essential ecological roles but also constitute a valuable resource for the development of new drugs. To accelerate the metabolic engineering of these high-value metabolites, it is imperative to identify and characterize pathway P450s, and to further improve their activities through protein engineering. In this review, we focus on P450 engineering and summarize the major strategies for enhancing the stability and activity of P450s and successful cases of P450 engineering. Studies in which the functions of P450s were altered to create de novo metabolic pathways or novel compounds are discussed as well. We also overview emerging tools, specifically DNA synthesis, machine learning, and de novo protein design, as well as the evolutionary patterns of P450s unveiled from a massive number of DNA sequences that could be integrated to accelerate the engineering of these enzymes. These approaches would greatly aid in the exploitation of plant-specialized metabolites or derivatives for various uses including medical applications.
Affiliation(s)
- Yi Shang
- Key Laboratory for Potato Biology of Yunnan Province, The CAAS-YNNU-YINMORE Joint Academy of Potato Science, Yunnan Normal University, Kunming, China
- Sanwen Huang
- Lingnan Guangdong Laboratory of Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
|
147
|
Wu Z, Liu H, Xu L, Chen HF, Feng Y. Algorithm-based coevolution network identification reveals key functional residues of the α/β hydrolase subfamilies. FASEB J 2020; 34:1983-1995. [PMID: 31907985 DOI: 10.1096/fj.201900948rr] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/12/2019] [Revised: 10/02/2019] [Accepted: 10/21/2019] [Indexed: 11/11/2022]
Abstract
Covariant residues identified by computational algorithms have provided new insights into enzyme evolutionary routes. However, the reliability and accuracy of routine statistical coupling analysis (SCA) are unable to satisfy the needs of protein engineering because SCA depends only on sequence information. Here, we set up a new SCA algorithm, SCA.SIM, by integrating structure information and MD simulation data. The more reliable covariant residues with high-quality scores are obtained from sequence alignment weighted by residual movement for eight related subfamilies, belonging to the α/β hydrolase family, with Candida antarctica lipase B (CALB). The 38 predicted covariant residues are tested for function by high-throughput quantitative evaluation in combination with activity and thermostability assays of a mutant library and deep sequencing. Based on the landscapes of both activity and thermostability, most mutants play key roles in catalysis, and some mutants gain a 2.4- to 6-fold increase in half-life at 50°C and a 9- to 12-fold improvement in catalytic efficiency. The activity of the A225F/T103A double mutant is higher than those of A225F and T103A, which means that the SCA.SIM method might be useful for identifying allosteric coupling. The SCA.SIM algorithm can be used for protein coevolution and enzyme engineering research.
Affiliation(s)
- Zhiyun Wu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Hao Liu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Lishi Xu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Hai-Feng Chen
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Shanghai Center for Bioinformation Technology, Shanghai, China
- Yan Feng
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, National Experimental Teaching Center for Life Sciences and Biotechnology, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
|
148
|
Rizzato F, Coucke A, de Leonardis E, Barton JP, Tubiana J, Monasson R, Cocco S. Inference of compressed Potts graphical models. Phys Rev E 2020; 101:012309. [PMID: 32069678 DOI: 10.1103/physreve.101.012309] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/31/2019] [Indexed: 06/10/2023]
Abstract
We consider the problem of inferring a graphical Potts model on a population of variables. This inverse Potts problem generally involves the inference of a large number of parameters, often larger than the number of available data, and, hence, requires the introduction of regularization. We study here a double regularization scheme, in which the number of Potts states (colors) available to each variable is reduced and interaction networks are made sparse. To achieve the color compression, only Potts states with large empirical frequency (exceeding some threshold) are explicitly modeled on each site, while the others are grouped into a single state. We benchmark the performances of this mixed regularization approach, with two inference algorithms, adaptive cluster expansion (ACE) and pseudolikelihood maximization (PLM), on synthetic data obtained by sampling disordered Potts models on Erdős-Rényi random graphs. We show in particular that color compression does not affect the quality of reconstruction of the parameters corresponding to high-frequency symbols, while drastically reducing the number of the other parameters and thus the computational time. Our procedure is also applied to multisequence alignments of protein families, with similar results.
Affiliation(s)
- Francesca Rizzato
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Alice Coucke
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Eleonora de Leonardis
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- John P Barton
- Department of Physics and Astronomy, University of California, Riverside, 900 University Avenue, Riverside, California 92521, USA
- Jérôme Tubiana
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Rémi Monasson
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Simona Cocco
- Laboratoire de Physique de l'Ecole normale supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
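The color-compression rule described in the abstract of this entry (symbols whose empirical frequency exceeds a threshold keep their own Potts state, all others share a single state) can be sketched in Python for one alignment column. This is only an illustration under stated assumptions: the function name `compress_colors`, the threshold value, and the toy column are hypothetical and are not taken from the authors' code.

```python
from collections import Counter

def compress_colors(column, threshold):
    """Map the symbols of one alignment column to compressed Potts states.

    Symbols with empirical frequency above `threshold` keep an explicit
    state (0, 1, ...); every other symbol is mapped to one shared state.
    """
    counts = Counter(column)
    total = len(column)
    # Symbols modeled explicitly, in a fixed (alphabetical) order.
    kept = sorted(s for s, c in counts.items() if c / total > threshold)
    state_of = {symbol: index for index, symbol in enumerate(kept)}
    grouped_state = len(kept)  # shared index for all low-frequency symbols
    states = [state_of.get(symbol, grouped_state) for symbol in column]
    return states, kept

# Toy column: A and L are frequent, G and V are rare (hypothetical data).
column = list("AAAAAALLLLGV")
states, kept = compress_colors(column, threshold=0.2)
print(kept)
print(states)
```

With this toy column the site needs three states instead of four; on a real alignment with 21 symbols per site, the reduction in coupling parameters is correspondingly larger, which is the source of the computational savings the abstract mentions.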
|
149
|
Teppa E, Nadalin F, Combet C, Zea DJ, David L, Carbone A. Coevolution analysis of amino-acids reveals diversified drug-resistance solutions in viral sequences: a case study of hepatitis B virus. Virus Evol 2020; 6:veaa006. [PMID: 32158552 PMCID: PMC7050494 DOI: 10.1093/ve/veaa006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023] Open
Abstract
The study of mutational landscapes of viral proteins is fundamental for the understanding of the mechanisms of cross-resistance to drugs and the design of effective therapeutic strategies based on several drugs. Antiviral therapy with nucleos(t)ide analogues targeting the hepatitis B virus (HBV) polymerase protein (Pol) can inhibit disease progression by suppression of HBV replication, which makes it an important case study. In HBV, treatment may fail due to the emergence of drug-resistant mutants. Primary and compensatory mutations have been associated with lamivudine resistance, whereas more complex mutational patterns are responsible for resistance to other HBV antiviral drugs. So far, all known drug-resistance mutations are located in one of the four Pol domains, called reverse transcriptase. We demonstrate that sequence covariation identifies drug-resistance mutations in viral sequences. A new algorithmic strategy, BIS2TreeAnalyzer, is designed to apply the coevolution analysis method BIS2, successfully used in the past on small sets of conserved sequences, to large sets of evolutionarily related sequences. When applied to HBV, BIS2TreeAnalyzer highlights diversified viral solutions by discovering thirty-seven positions coevolving with residues known to be associated with drug resistance and located on the four Pol domains. These results suggest a sequential mechanism of emergence for some mutational patterns. They reveal complex combinations of positions involved in HBV drug resistance and contribute new information to the landscape of HBV evolutionary solutions. The computational approach is general and can be applied to other viral sequences when compensatory mutations are presumed.
Affiliation(s)
- Elin Teppa
- Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
- Sorbonne Université, Institut des Sciences du Calcul et des Données (ISCD), 4 Place Jussieu, 75005 Paris, France
- Francesca Nadalin
- Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
- Institute Curie, PSL Research University, INSERM U932, Immunity and Cancer Department, 26 rue d’Ulm, 75248 Paris, France
- Christophe Combet
- Univ Lyon, Université Claude Bernard Lyon 1, INSERM 1052, CNRS 5286, Centre Léon Bérard, Centre de recherche en cancérologie de Lyon, 151 Cours Albert Thomas, 69424 Lyon, France
- Diego Javier Zea
- Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
- Laurent David
- Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
- Alessandra Carbone
- Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
- Institut Universitaire de France, 1 rue Descartes, 75231 Paris, France
|
150
|
Rizzato F, Zamuner S, Pagnani A, Laio A. A common root for coevolution and substitution rate variability in protein sequence evolution. Sci Rep 2019; 9:18032. [PMID: 31792239 PMCID: PMC6888882 DOI: 10.1038/s41598-019-53958-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/30/2019] [Accepted: 10/25/2019] [Indexed: 11/09/2022] Open
Abstract
We introduce a simple model that describes the average occurrence of point variations in a generic protein sequence. This model is based on the idea that mutations are more likely to be fixed at sites in contact with others that have mutated in the recent past. Therefore, we extend the usual assumptions made in protein coevolution by introducing a time damping on the effect of a substitution on its surroundings, which makes correlated substitutions happen in avalanches localized in space and time. The model correctly predicts the average correlation of substitutions as a function of their distance along the sequence. At the same time, it predicts an among-site distribution of the number of substitutions per site highly compatible with a negative binomial, consistent with experimental data. The promising outcomes achieved with this model encourage the application of the same ideas in the field of pairwise and multiple sequence alignment.
Affiliation(s)
- Francesca Rizzato
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
- Stefano Zamuner
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy
- Andrea Pagnani
- DISAT, Politecnico di Torino, Torino, Italy; Italian Institute for Genomic Medicine (IIGM), Torino, Italy; Istituto Nazionale di Fisica Nucleare (INFN) Sezione di Torino, Torino, Italy
- Alessandro Laio
- Scuola Internazionale Superiore di Studi Avanzati (SISSA), Trieste, Italy; The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy
|