1
Lupo U, Sgarbossa D, Bitbol AF. Pairing interacting protein sequences using masked language modeling. Proc Natl Acad Sci U S A 2024; 121:e2311887121. [PMID: 38913900 PMCID: PMC11228504 DOI: 10.1073/pnas.2311887121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 12/18/2023] [Indexed: 06/26/2024] Open
Abstract
Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves competitive performance with using orthology-based pairing.
Affiliation(s)
- Umberto Lupo
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Damiano Sgarbossa
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
2
Guo L, Wang J. GSScore: a novel Graphormer-based shell-like scoring method for protein-ligand docking. Brief Bioinform 2024; 25:bbae201. [PMID: 38706316 PMCID: PMC11070652 DOI: 10.1093/bib/bbae201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Revised: 02/05/2024] [Accepted: 04/16/2024] [Indexed: 05/07/2024] Open
Abstract
Protein-ligand interactions (PLIs) are essential for cellular activities and drug discovery. But due to the complexity and high cost of experimental methods, there is a great demand for computational approaches to recognize PLI patterns, such as protein-ligand docking. In recent years, more and more models based on machine learning have been developed to directly predict the root mean square deviation (RMSD) of a ligand docking pose with reference to its native binding pose. However, new scoring methods are pressingly needed in methodology for more accurate RMSD prediction. We present a new deep learning-based scoring method for RMSD prediction of protein-ligand docking poses based on a Graphormer method and Shell-like graph architecture, named GSScore. To recognize near-native conformations from a set of poses, GSScore takes atoms as nodes and then establishes the docking interface of protein-ligand into multiple bipartite graphs within different shell ranges. Benefiting from the Graphormer and Shell-like graph architecture, GSScore can effectively capture the subtle differences between energetically favorable near-native conformations and unfavorable non-native poses without extra information. GSScore was extensively evaluated on diverse test sets including a subset of PDBBind version 2019, CASF2016 as well as DUD-E, and obtained significant improvements over existing methods in terms of RMSE, $R$ (Pearson correlation coefficient), Spearman correlation coefficient and Docking power.
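The quantity that such scoring methods predict, the RMSD of a docking pose relative to the native binding pose, can be illustrated with a minimal sketch. This is not GSScore code: the function name `pose_rmsd` and the input layout (equal-length lists of `(x, y, z)` ligand atom coordinates in matching atom order, already in the receptor frame, with no superposition applied) are assumptions made for illustration.

```python
import math


def pose_rmsd(pose, native):
    """Root mean square deviation between two ligand conformations.

    Both arguments are lists of (x, y, z) tuples with atoms in the same
    order (an assumption of this sketch). No superposition is performed.
    """
    if len(pose) != len(native) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    # Sum of squared coordinate differences over all atoms and all axes.
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(pose, native)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose))


native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]  # every atom moved by (3, 4, 0)
print(pose_rmsd(shifted, native))
```

A pose is usually called near-native when this value falls below a small threshold; the threshold itself is a benchmark choice and is not stated in the abstract.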
Affiliation(s)
- Linyuan Guo
  - School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Rd. Lu Shan Nan, 410083, Changsha, P.R. China
3
Fang T, Szklarczyk D, Hachilif R, von Mering C. Enhancing coevolutionary signals in protein-protein interaction prediction through clade-wise alignment integration. Sci Rep 2024; 14:6009. [PMID: 38472223 PMCID: PMC10933411 DOI: 10.1038/s41598-024-55655-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
Protein-protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable identification of orthologs, and how to optimally balance the need for large alignments versus sufficient alignment quality. Here, we propose a divide-and-conquer strategy for MSA generation: instead of building a single, large alignment for each protein, multiple distinct alignments are constructed under distinct clades in the tree of life. Coevolutionary signals are searched separately within these clades, and are only subsequently integrated using machine learning techniques. We find that this strategy markedly improves overall prediction performance, concomitant with better alignment quality. Using the popular DCA algorithm to systematically search pairs of such alignments, a genome-wide all-against-all interaction scan in a bacterial genome is demonstrated. Given the recent successes of AlphaFold in predicting direct PPIs at atomic detail, a discover-and-refine approach is proposed: our method could provide a fast and accurate strategy for pre-screening the entire genome, submitting to AlphaFold only promising interaction candidates, thus reducing false positives as well as computation time.
Affiliation(s)
- Tao Fang
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Damian Szklarczyk
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Radja Hachilif
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Christian von Mering
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
4
Fam BSDO, Vargas-Pinilla P, Paré P, Landau L, Viscardi LH, Pissinatti A, Falótico T, Maestri R, Bortolini MC. Exploring the diversity of AVPR2 in Primates and its evolutionary implications. Genet Mol Biol 2023; 46:e20230045. [PMID: 37930141 PMCID: PMC10626583 DOI: 10.1590/1678-4685-gmb-2023-0045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 09/09/2023] [Indexed: 11/07/2023] Open
Abstract
The current study focuses on the investigation of AVPR2 (VTR2C) protein-coupled receptor variants specific to different primate taxa. AVPR2 is activated by the neurohormone AVP, which modulates physiological processes, including water homeostasis. Our findings reveal positive selection at three AVPR2 sites at positions 190, 250, and 346. Variation at position 250 is associated with human Congenital Nephrogenic Diabetes Insipidus (cNDI), a condition characterized by excessive water loss. Thirteen other functional sites with potential adaptive relevance include positions 185, 202, 204, and 252, which are associated with cNDI. We identified SH3-binding motifs in AVPR2's ICL3 and N-terminus domains, with some losses observed in clades of Cercopithecidae, Callitrichinae, and Atelidae. SH3-binding motifs are crucial in regulating cellular physiology, indicating that the differences may be adaptive. Co-evolution was found between AVPR2 residues and those in the AVP signal peptide/Neurophysin-2 and AQP2, other molecules in the same signaling cascade. No significant correlation was found between these Primates' taxon-specific variants and the bioclimatic variables of the areas where they live. Distinct co-evolving amino acid sequences in functional sites were found in Platyrrhini and Catarrhini, which may have adaptive implications involving glucocorticoid hormones, suggesting varied selective pressures. Further studies are required to confirm these results.
Affiliation(s)
- Bibiana Sampaio de Oliveira Fam
  - Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
- Pedro Vargas-Pinilla
  - Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
  - Universidade de São Paulo, Faculdade de Medicina, Departamento de Bioquímica e Imunologia, Ribeirão Preto, SP, Brazil
- Pâmela Paré
  - Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
- Luane Landau
  - Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
- Lucas H. Viscardi
  - Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
- Tiago Falótico
  - Universidade de São Paulo, Escola de Artes, Ciências e Humanidades, São Paulo, SP, Brazil
- Renan Maestri
  - Universidade Federal do Rio Grande do Sul, Departamento de Ecologia, Laboratório de Ecomorfologia e Macroevolução, Porto Alegre, RS, Brazil
- Maria Cátira Bortolini
  - Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
5
Liu Z, Zhu YH, Shen LC, Xiao X, Qiu WR, Yu DJ. Integrating unsupervised language model with multi-view multiple sequence alignments for high-accuracy inter-chain contact prediction. Comput Biol Med 2023; 166:107529. [PMID: 37748220 DOI: 10.1016/j.compbiomed.2023.107529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 08/30/2023] [Accepted: 09/19/2023] [Indexed: 09/27/2023]
Abstract
Accurate identification of inter-chain contacts in the protein complex is critical to determine the corresponding 3D structures and understand the biological functions. We proposed a new deep learning method, ICCPred, to deduce the inter-chain contacts from the amino acid sequences of the protein complex. This pipeline was built on the designed deep residual network architecture, integrating the pre-trained language model with three multiple sequence alignments (MSAs) from different biological views. Experimental results on 709 non-redundant benchmarking protein complexes showed that the proposed ICCPred significantly increased inter-chain contact prediction accuracy compared to the state-of-the-art approaches. Detailed data analyses showed that the significant advantage of ICCPred lies in the utilization of pre-trained transformer language models which can effectively extract the complementary co-evolution diversity from three MSAs. Meanwhile, the designed deep residual network enhances the correlation between the co-evolution diversity and the patterns of inter-chain contacts. These results demonstrated a new avenue for high-accuracy deep-learning inter-chain contact prediction that is applicable to large-scale protein-protein interaction annotations from sequence alone.
Affiliation(s)
- Zi Liu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China; Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Yi-Heng Zhu
  - College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, 210095, China
- Long-Chen Shen
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Wang-Ren Qiu
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China
6
van Keulen SC, Bonvin AMJJ. Improving the quality of co-evolution intermolecular contact prediction with DisVis. Proteins 2023; 91:1407-1416. [PMID: 37237441 DOI: 10.1002/prot.26514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 03/29/2023] [Accepted: 04/19/2023] [Indexed: 05/28/2023]
Abstract
The steep rise in protein sequences and structures has paved the way for bioinformatics approaches to predict residue-residue interactions in protein complexes. Multiple sequence alignments are commonly used in contact predictions to identify co-evolving residues. These contacts, however, often include false positives (FPs), which may impair their use to predict three-dimensional structures of biomolecular complexes and affect the accuracy of the generated models. Previously, we have developed DisVis to identify FPs in mass spectrometry cross-linking data. DisVis makes it possible to assess the accessible interaction space between two proteins consistent with a set of distance restraints. Here, we investigate if a similar approach could be applied to co-evolution predicted contacts in order to improve their precision prior to using them for modeling. We analyze co-evolution contact predictions with DisVis for a set of 26 protein-protein complexes. The DisVis-reranked and the original co-evolution contacts are then used to model the complexes with our integrative docking software HADDOCK using different filtering scenarios. Our results show that HADDOCK is robust with respect to the precision of the predicted contacts due to the 50% random contact removal during docking and can enhance the quality of docking predictions when combined with DisVis filtering for low-precision contact data. DisVis can thus have a beneficial effect on low-quality data, but overall HADDOCK can accommodate FP restraints without negatively impacting the quality of the resulting models. Other more precision-sensitive docking protocols might, however, benefit from the increased precision of the predicted contacts after DisVis filtering.
Affiliation(s)
- Siri C van Keulen
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, the Netherlands
- Alexandre M J J Bonvin
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, the Netherlands
7
Gandarilla-Pérez CA, Pinilla S, Bitbol AF, Weigt M. Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins. PLoS Comput Biol 2023; 19:e1011010. [PMID: 36996234 PMCID: PMC10089317 DOI: 10.1371/journal.pcbi.1011010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 04/11/2023] [Accepted: 03/08/2023] [Indexed: 04/01/2023] Open
Abstract
Predicting protein-protein interactions from sequences is an important goal of computational biology. Various sources of information can be used to this end. Starting from the sequences of two interacting protein families, one can use phylogeny or residue coevolution to infer which paralogs are specific interaction partners within each species. We show that these two signals can be combined to improve the performance of the inference of interaction partners among paralogs. For this, we first align the sequence-similarity graphs of the two families through simulated annealing, yielding a robust partial pairing. We next use this partial pairing to seed a coevolution-based iterative pairing algorithm. This combined method improves performance over either separate method. The improvement obtained is striking in the difficult cases where the average number of paralogs per species is large or where the total number of sequences is modest.
Affiliation(s)
- Carlos A Gandarilla-Pérez
  - Facultad de Física, Universidad de la Habana, San Lázaro y L, Vedado, Habana, Cuba
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
- Sergio Pinilla
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (UMR 8237), Paris, France
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
8
Karamanos TK. Chasing long-range evolutionary couplings in the AlphaFold era. Biopolymers 2023; 114:e23530. [PMID: 36752285 PMCID: PMC10909459 DOI: 10.1002/bip.23530] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/24/2022] [Revised: 01/26/2023] [Accepted: 01/27/2023] [Indexed: 02/09/2023]
Abstract
Coevolution between protein residues is normally interpreted as direct contact. However, the evolutionary record of a protein sequence contains rich information that may include long-range functional couplings, couplings that report on homo-oligomeric states or even conformational changes. Due to the complexity of the sequence space and the lack of structural information on various members of a protein family, it has been difficult to effectively mine the additional information encoded in a multiple sequence alignment (MSA). Here, taking advantage of the recent release of the AlphaFold (AF) database we attempt to identify coevolutionary couplings that cannot be explained simply by spatial proximity. We propose a simple computational method that performs direct coupling analysis on a MSA and searches for couplings that are not satisfied in any of the AF models of members of the identified protein family. Application of this method on 2012 protein families suggests that ~12% of the total identified coevolving residue pairs are spatially distant and more likely to be disordered than their contacting counterparts. We expect that this analysis will help improve the quality of coevolutionary distance restraints used for structure determination and will be useful in identifying potentially functional/allosteric cross-talk between distant residues.
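The filtering step described in this abstract, keeping only couplings that are not satisfied in any structural model of the family, can be sketched as follows. This is an illustration under assumptions rather than the author's implementation: the function name `unsatisfied_couplings`, the representation of each model as a dictionary from alignment position to a 3D coordinate, and the 8 Å contact cutoff are all hypothetical choices that the abstract does not specify.

```python
import math


def unsatisfied_couplings(couplings, models, cutoff=8.0):
    """Return the couplings that are not in contact in ANY model.

    couplings: list of (i, j) alignment positions reported as coevolving.
    models:    list of dicts mapping alignment position -> (x, y, z);
               positions missing from a model are treated as unresolved there.
    cutoff:    contact distance in angstroms (assumed value, not from the paper).
    """
    distant = []
    for i, j in couplings:
        # A coupling counts as satisfied if at least one model places
        # the two residues within the cutoff distance.
        satisfied = any(
            i in model and j in model and math.dist(model[i], model[j]) <= cutoff
            for model in models
        )
        if not satisfied:
            distant.append((i, j))
    return distant


models = [
    {1: (0.0, 0.0, 0.0), 2: (5.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (30.0, 0.0, 0.0), 3: (25.0, 0.0, 0.0)},
]
print(unsatisfied_couplings([(1, 2), (1, 3)], models))
```

Checking every model before calling a pair distant matters because a coupling may reflect a contact that exists in only some members of the family.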
9
Lin P, Yan Y, Huang SY. DeepHomo2.0: improved protein-protein contact prediction of homodimers by transformer-enhanced deep learning. Brief Bioinform 2023; 24:6849483. [PMID: 36440949 DOI: 10.1093/bib/bbac499] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 10/21/2022] [Indexed: 11/30/2022] Open
Abstract
Protein-protein interactions play an important role in many biological processes. However, although structure prediction for monomer proteins has achieved great progress with the advent of advanced deep learning algorithms like AlphaFold, the structure prediction for protein-protein complexes remains an open question. Taking advantage of the Transformer model of ESM-MSA, we have developed a deep learning-based model, named DeepHomo2.0, to predict protein-protein interactions of homodimeric complexes by leveraging the direct-coupling analysis (DCA) and Transformer features of sequences and the structure features of monomers. DeepHomo2.0 was extensively evaluated on diverse test sets and compared with eight state-of-the-art methods including protein language model-based, DCA-based and machine learning-based methods. It was shown that DeepHomo2.0 achieved a high precision of >70% with experimental monomer structures and >60% with predicted monomer structures for the top 10 predicted contacts on the test sets and outperformed the other eight methods. Moreover, even the version without using structure information, named DeepHomoSeq, still achieved a good precision of >55% for the top 10 predicted contacts. Integrating the predicted contacts into protein docking significantly improved the structure prediction of realistic Critical Assessment of Protein Structure Prediction homodimeric complexes. DeepHomo2.0 and DeepHomoSeq are available at http://huanglab.phys.hust.edu.cn/DeepHomo2/.
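The "precision for the top 10 predicted contacts" metric quoted above can be made concrete with a short sketch. The function name `top_k_precision` and the input layout (a list of `((i, j), score)` pairs plus a set of true contact pairs) are assumptions for illustration and do not come from the DeepHomo2.0 code.

```python
def top_k_precision(scored_pairs, true_contacts, k=10):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    scored_pairs:  list of ((i, j), score) tuples, higher score = more confident.
    true_contacts: set of (i, j) pairs observed in the reference structure.
    """
    # Rank predictions by descending score and keep at most k of them.
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _score in ranked if pair in true_contacts)
    return hits / len(ranked)


predictions = [((1, 5), 0.9), ((2, 7), 0.8), ((3, 9), 0.1)]
reference = {(1, 5), (3, 9)}
print(top_k_precision(predictions, reference, k=2))
```

When fewer than `k` predictions are available, this sketch divides by the number actually ranked; benchmarks may instead divide by `k`, so the convention should be checked before comparing numbers.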
Affiliation(s)
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Yumeng Yan
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
10
Marszalek J, Craig EA. Interaction of client—the scaffold on which FeS clusters are build—with J-domain protein Hsc20 and its evolving Hsp70 partners. Front Mol Biosci 2022; 9:1034453. [PMID: 36310602 PMCID: PMC9596805 DOI: 10.3389/fmolb.2022.1034453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 09/26/2022] [Indexed: 12/04/2022] Open
Abstract
In cells, molecular chaperone systems consisting of Hsp70 and its obligatory J-domain protein (JDP) co-chaperones transiently interact with a myriad of client proteins, with JDPs typically recruiting their partner Hsp70 to interact with particular clients. The fundamentals of these cyclical interactions between JDP/Hsp70 systems and clients are well established. Much less is known about other aspects of JDP/Hsp70 system function, including how such systems evolved over time. Here we discuss the JDP/Hsp70 system involved in the biogenesis of iron-sulfur (FeS) clusters. Interaction between the client protein, the scaffold on which clusters are built, and its specialized JDP Hsc20 has stayed constant. However, the system’s Hsp70 has changed at least twice. In some species Hsc20’s Hsp70 partner interacts only with the scaffold, in others it has many JDP partners in addition to Hsc20 and interacts with many client proteins. Analysis of this switching of Hsp70 partners has provided insight into the insulation of JDP/Hsp70 systems from one another that can occur when more than one Hsp70 is present in a cellular compartment, as well as how competition among JDPs is balanced when an Hsp70 partner is shared amongst a number of JDPs. Of particularly broad relevance, even though the scaffold’s interactions with Hsc20 and Hsp70 are functionally critical for the biogenesis of FeS cluster-containing proteins, it is the modulation of the Hsc20-Hsp70 interaction per se that allows Hsc20 to function with such different Hsp70 partners.
Affiliation(s)
- Jaroslaw Marszalek
  - Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
  - *Correspondence: Jaroslaw Marszalek; Elizabeth A. Craig
- Elizabeth A. Craig
  - Department of Biochemistry, University of Wisconsin–Madison, Madison, WI, United States
  - *Correspondence: Jaroslaw Marszalek; Elizabeth A. Craig
11
Ahmed S, Chattopadhyay G, Manjunath K, Bhasin M, Singh N, Rasool M, Das S, Rana V, Khan N, Mitra D, Asok A, Singh R, Varadarajan R. Combining cysteine scanning with chemical labeling to map protein-protein interactions and infer bound structure in an intrinsically disordered region. Front Mol Biosci 2022; 9:997653. [PMID: 36275627 PMCID: PMC9585320 DOI: 10.3389/fmolb.2022.997653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/12/2022] [Indexed: 11/13/2022] Open
Abstract
The Mycobacterium tuberculosis genome harbours nine toxin-antitoxin (TA) systems of the mazEF family. These consist of two proteins, a toxin and an antitoxin, encoded in an operon. While the toxin has a conserved fold, the antitoxins are structurally diverse and the toxin binding region is typically intrinsically disordered before binding. We describe high throughput methodology for accurate mapping of interfacial residues and apply it to three MazEF complexes. The method involves screening one partner protein against a panel of chemically masked single cysteine mutants of its interacting partner, displayed on the surface of yeast cells. Such libraries have much lower diversity than those generated by saturation mutagenesis, simplifying library generation and data analysis. Further, because of the steric bulk of the masking reagent, labeling of virtually all exposed epitope residues should result in loss of binding, and buried residues are inaccessible to the labeling reagent. The binding residues are deciphered by probing the loss of binding to the labeled cognate partner by flow cytometry. Using this methodology, we have identified the interfacial residues for MazEF3, MazEF6 and MazEF9 TA systems of M. tuberculosis. In the case of MazEF9, where a crystal structure was available, there was excellent agreement between our predictions and the crystal structure, superior to those with AlphaFold2. We also report detailed biophysical characterization of the MazEF3 and MazEF9 TA systems and measured the relative affinities between cognate and non-cognate toxin–antitoxin partners in order to probe possible cross-talk between these systems.
Affiliation(s)
- Shahbaz Ahmed
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Munmun Bhasin
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Neelam Singh
  - Tuberculosis Research Laboratory, Translational Health Science and Technology Institute, Faridabad, India
- Mubashir Rasool
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Sayan Das
  - Tuberculosis Research Laboratory, Translational Health Science and Technology Institute, Faridabad, India
- Varsha Rana
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Neha Khan
  - Tuberculosis Research Laboratory, Translational Health Science and Technology Institute, Faridabad, India
- Debarghya Mitra
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Aparna Asok
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Ramandeep Singh
  - Tuberculosis Research Laboratory, Translational Health Science and Technology Institute, Faridabad, India
- Raghavan Varadarajan
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
  - *Correspondence: Raghavan Varadarajan
12
Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes. Nat Commun 2022; 13:4030. [PMID: 35821377 PMCID: PMC9276797 DOI: 10.1038/s41467-022-31643-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/19/2021] [Accepted: 06/27/2022] [Indexed: 12/05/2022] Open
Abstract
Characterizing the effect of mutations is key to understand the evolution of protein sequences and to separate neutral amino-acid changes from deleterious ones. Epistatic interactions between residues can lead to a context dependence of mutation effects. Context dependence constrains the amino-acid changes that can contribute to polymorphism in the short term, and the ones that can accumulate between species in the long term. We use computational approaches to accurately predict the polymorphisms segregating in a panel of 61,157 Escherichia coli genomes from the analysis of distant homologues. By comparing a context-aware Direct-Coupling Analysis modelling to a non-epistatic approach, we show that the genetic context strongly constrains the tolerable amino acids in 30% to 50% of amino-acid sites. The study of more distant species suggests the gradual build-up of genetic context over long evolutionary timescales by the accumulation of small epistatic contributions. Predicting the effects of mutations in a species is a major challenge in genetics. Here, the authors investigate protein sequence landscapes using diverged E. coli sequences, to predict tolerated mutations and capture interactions between mutations.
13
Zhang W, Meng Q, Wang J, Guo F. HDIContact: a novel predictor of residue-residue contacts on hetero-dimer interfaces via sequential information and transfer learning strategy. Brief Bioinform 2022; 23:6599074. [PMID: 35653713 DOI: 10.1093/bib/bbac169] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/17/2022] [Revised: 03/07/2022] [Accepted: 04/16/2022] [Indexed: 11/12/2022] Open
Abstract
Proteins maintain the functional order of cell in life by interacting with other proteins. Determination of protein complex structural information gives biological insights for the research of diseases and drugs. Recently, a breakthrough has been made in protein monomer structure prediction. However, due to the limited number of the known protein structure and homologous sequences of complexes, the prediction of residue-residue contacts on hetero-dimer interfaces is still a challenge. In this study, we have developed a deep learning framework for inferring inter-protein residue contacts from sequential information, called HDIContact. We utilized transfer learning strategy to produce Multiple Sequence Alignment (MSA) two-dimensional (2D) embedding based on patterns of concatenated MSA, which could reduce the influence of noise on MSA caused by mismatched sequences or less homology. For MSA 2D embedding, HDIContact took advantage of Bi-directional Long Short-Term Memory (BiLSTM) with two-channel to capture 2D context of residue pairs. Our comprehensive assessment on the Escherichia coli (E. coli) test dataset showed that HDIContact outperformed other state-of-the-art methods, with top precision of 65.96%, the Area Under the Receiver Operating Characteristic curve (AUROC) of 83.08% and the Area Under the Precision Recall curve (AUPR) of 25.02%. In addition, we analyzed the potential of HDIContact for human-virus protein-protein complexes, by achieving top five precision of 80% on O75475-P04584 related to Human Immunodeficiency Virus. All experiments indicated that our method was a valuable technical tool for predicting inter-protein residue contacts, which would be helpful for understanding protein-protein interaction mechanisms.
Affiliation(s)
- Wei Zhang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Qiaozhen Meng
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
14
|
Si Y, Yan C. Protein complex structure prediction powered by multiple sequence alignments of interologs from multiple taxonomic ranks and AlphaFold2. Brief Bioinform 2022; 23:6596987. [PMID: 35649388; DOI: 10.1093/bib/bbac208] [Received: 01/25/2022] [Revised: 04/17/2022] [Accepted: 05/05/2022] [Indexed: 12/19/2022]
Abstract
AlphaFold2 can predict protein complex structures as long as a multiple sequence alignment (MSA) of the interologs of the target protein-protein interaction (PPI) can be provided. In this study, a simplified phylogeny-based approach was applied to generate the MSA of interologs, which was then used as the input to AlphaFold2 for protein complex structure prediction. Benchmarking this protocol extensively on a nonredundant PPI dataset, including 107 bacterial PPIs and 442 eukaryotic PPIs, we show that complex structures of 79.5% of the bacterial PPIs and 49.8% of the eukaryotic PPIs can be successfully predicted, a significantly better performance than that obtained with MSAs of interologs prepared by two existing approaches. Considering that PPIs may not be conserved in species separated by long evolutionary distances, we further restricted the interologs in the MSA to different taxonomic ranks of the species of the target PPI in protein complex structure prediction. We found that the success rates can be increased to 87.9% for the bacterial PPIs and 56.3% for the eukaryotic PPIs if interologs in the MSA are restricted to a specific taxonomic rank of the species of each target PPI. Finally, we show that the optimal taxonomic ranks for protein complex structure prediction can be selected using the predicted template modeling (TM) scores of the output models.
Affiliation(s)
- Yunda Si
  - School of Physics, Huazhong University of Science and Technology, China
- Chengfei Yan
  - School of Physics, Huazhong University of Science and Technology, China
|
15
|
Labes S, Stupp D, Wagner N, Bloch I, Lotem M, L Lahad E, Polak P, Pupko T, Tabach Y. Machine-learning of complex evolutionary signals improves classification of SNVs. NAR Genom Bioinform 2022; 4:lqac025. [PMID: 35402908; PMCID: PMC8988715; DOI: 10.1093/nargab/lqac025] [Received: 10/24/2021] [Revised: 02/08/2022] [Accepted: 03/28/2022] [Indexed: 12/12/2022]
Abstract
Conservation is a strong predictor for the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease-genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution since its accuracy highly depends on the analyzed set of species and genes. For example, pairwise comparisons between the human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable for conservation-based variant prediction, while others demonstrated species that optimize prediction. These insights led to developing EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning based methods and most ensemble tools in every prediction-task, highlighting the strength of optimizing conservation analysis per-species and per-gene. Overall, we suggest a new and a more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.
Affiliation(s)
- Sapir Labes
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Doron Stupp
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Naama Wagner
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Idit Bloch
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Michal Lotem
  - Sharett Institute of Oncology, Hadassah University Medical Center, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
- Ephrat L Lahad
  - Medical Genetics Institute, Shaare Zedek Medical Center, Jerusalem 9103102, Israel
- Paz Polak
  - Oncological Sciences, Icahn School of Medicine at Mount Sinai, NY 10029, USA
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Yuval Tabach
  - Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Faculty of Medicine, and Hadassah University Medical School, The Hebrew University of Jerusalem, Jerusalem 9112001, Israel
|
16
|
Gao M, Nakajima An D, Parks JM, Skolnick J. AF2Complex predicts direct physical interactions in multimeric proteins with deep learning. Nat Commun 2022; 13:1744. [PMID: 35365655; PMCID: PMC8975832; DOI: 10.1038/s41467-022-29394-2] [Received: 11/08/2021] [Accepted: 03/15/2022] [Indexed: 12/20/2022]
Abstract
Accurate descriptions of protein-protein interactions are essential for understanding biological systems. Remarkably accurate atomic structures have been recently computed for individual proteins by AlphaFold2 (AF2). Here, we demonstrate that the same neural network models from AF2 developed for single protein sequences can be adapted to predict the structures of multimeric protein complexes without retraining. In contrast to common approaches, our method, AF2Complex, does not require paired multiple sequence alignments. It achieves higher accuracy than some complex protein-protein docking strategies and provides a significant improvement over AF-Multimer, a development of AlphaFold for multimeric proteins. Moreover, we introduce metrics for predicting direct protein-protein interactions between arbitrary protein pairs and validate AF2Complex on some challenging benchmark sets and the E. coli proteome. Lastly, using the cytochrome c biogenesis system I as an example, we present high-confidence models of three sought-after assemblies formed by eight members of this system. Here the authors present AF2Complex and show that application to the E. coli cytochrome biogenesis system I yields confident computational models for three sought-after assemblies.
Affiliation(s)
- Mu Gao
  - Center for the Study of Systems Biology, School of Biological Sciences, Atlanta, GA, USA
- Davi Nakajima An
  - School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
- Jerry M Parks
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biological Sciences, Atlanta, GA, USA
|
17
|
Karaca E, Prévost C, Sacquin-Mora S. Modeling the Dynamics of Protein–Protein Interfaces, How and Why? Molecules 2022; 27:molecules27061841. [PMID: 35335203; PMCID: PMC8950966; DOI: 10.3390/molecules27061841] [Received: 01/29/2022] [Revised: 03/06/2022] [Accepted: 03/08/2022] [Indexed: 12/07/2022]
Abstract
Protein–protein assemblies act as a key component in numerous cellular processes. Their accurate modeling at the atomic level remains a challenge for structural biology. To address this challenge, several docking and a handful of deep learning methodologies focus on modeling protein–protein interfaces. Although the outcome of these methods has been assessed using static reference structures, more and more data point to the fact that the interaction stability and specificity is encoded in the dynamics of these interfaces. Therefore, this dynamics information must be taken into account when modeling and assessing protein interactions at the atomistic scale. Expanding on this, our review initially focuses on the recent computational strategies aiming at investigating protein–protein interfaces in a dynamic fashion using enhanced sampling, multi-scale modeling, and experimental data integration. Then, we discuss how interface dynamics report on the function of protein assemblies in globular complexes, in fuzzy complexes containing intrinsically disordered proteins, as well as in active complexes, where chemical reactions take place across the protein–protein interface.
Affiliation(s)
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir 35340, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir 35340, Turkey
- Chantal Prévost
  - CNRS, Laboratoire de Biochimie Théorique, UPR9080, Université de Paris, 13 rue Pierre et Marie Curie, 75005 Paris, France
  - Institut de Biologie Physico-Chimique, Fondation Edmond de Rothschild, PSL Research University, 75006 Paris, France
- Sophie Sacquin-Mora
  - CNRS, Laboratoire de Biochimie Théorique, UPR9080, Université de Paris, 13 rue Pierre et Marie Curie, 75005 Paris, France
  - Institut de Biologie Physico-Chimique, Fondation Edmond de Rothschild, PSL Research University, 75006 Paris, France
|
18
|
Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 2022; 13:1265. [PMID: 35273146; PMCID: PMC8913741; DOI: 10.1038/s41467-022-28865-w] [Received: 09/30/2021] [Accepted: 02/11/2022] [Indexed: 01/02/2023]
Abstract
Predicting the structure of interacting protein chains is a fundamental step towards understanding protein function. Unfortunately, no computational method can produce accurate structures of protein complexes. AlphaFold2 has shown unprecedented levels of accuracy in modelling single-chain protein structures. Here, we apply AlphaFold2 for the prediction of heterodimeric protein complexes. We find that the AlphaFold2 protocol, together with optimised multiple sequence alignments, generates models with acceptable quality (DockQ ≥ 0.23) for 63% of the dimers. From the predicted interfaces we create a simple function to predict the DockQ score, which distinguishes acceptable from incorrect models as well as interacting from non-interacting proteins with state-of-the-art accuracy. We find that, using the predicted DockQ scores, we can identify 51% of all interacting pairs at 1% FPR. Predicting the structure of protein complexes is extremely difficult. Here, the authors apply AlphaFold2 with optimized multiple sequence alignments to model complexes of interacting proteins, enabling prediction of both whether and how proteins interact with state-of-the-art accuracy.
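The figure quoted at the end of this abstract (a fraction of interacting pairs identified at 1% FPR) is a true-positive rate at a fixed false-positive rate. The following is a minimal sketch of how such a number can be computed from any scalar score; the function name `tpr_at_fpr` and the toy score lists are illustrative assumptions, not the authors' code or data.

```python
def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.01):
    """Fraction of positives scoring above a cutoff chosen so that at most
    `max_fpr` of the negatives score above it (hypothetical helper)."""
    neg_sorted = sorted(neg_scores, reverse=True)
    allowed = int(max_fpr * len(neg_sorted))  # false positives tolerated
    if allowed >= len(neg_sorted):
        return 1.0
    # Calling "positive" only what is strictly above this negative score
    # guarantees that no more than `allowed` negatives pass the cutoff.
    cutoff = neg_sorted[allowed]
    hits = sum(1 for s in pos_scores if s > cutoff)
    return hits / len(pos_scores)


# Toy data (assumed): scores for non-interacting and interacting pairs.
toy_negatives = [i / 1000 for i in range(100)]   # 0.000 ... 0.099
toy_positives = [0.5, 0.2, 0.0985, 0.05]
toy_rate = tpr_at_fpr(toy_positives, toy_negatives, max_fpr=0.01)
```

With real data, `pos_scores` would hold the predicted score for known interacting pairs and `neg_scores` the score for pairs assumed not to interact.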
|
19
|
Guo L, He J, Lin P, Huang SY, Wang J. TRScore: a three-dimensional RepVGG-based scoring method for ranking protein docking models. Bioinformatics 2022; 38:2444-2451. [PMID: 35199137; DOI: 10.1093/bioinformatics/btac120] [Received: 08/22/2021] [Revised: 01/19/2022] [Accepted: 02/21/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Protein-protein interactions (PPI) play important roles in cellular activities. Due to the technical difficulty and high cost of experimental methods, there is considerable interest in developing computational approaches, such as protein docking, to decipher PPI patterns. One of the important and difficult aspects of protein docking is recognizing near-native conformations from a set of decoys, but traditional scoring functions still suffer from limited accuracy. Therefore, new scoring methods are urgently needed for both methodological and practical purposes. RESULTS: We present a new deep learning-based scoring method for ranking protein-protein docking models based on a three-dimensional (3D) RepVGG network, named TRScore. To recognize near-native conformations from a set of decoys, TRScore voxelizes the protein-protein interface into a 3D grid labeled by the number of atoms in different physicochemical classes. Benefiting from the deep convolutional RepVGG architecture, TRScore can effectively capture the subtle differences between energetically favorable near-native models and unfavorable non-native decoys without needing extra information. TRScore was extensively evaluated on diverse test sets, including the protein-protein docking benchmark 5.0 update set, the DockGround decoy set, and the realistic CAPRI decoy set, and overall obtained a significant improvement over existing methods in cross-validation and independent evaluations. AVAILABILITY: Code is available at https://github.com/BioinformaticsCSU/TRScore.
Affiliation(s)
- Linyuan Guo
  - School of Computer Science, Central South University, Changsha, Hunan 410083, China
- Jiahua He
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Jianxin Wang
  - School of Computer Science, Central South University, Changsha, Hunan 410083, China
|
20
|
Mehrabiani KM, Cheng RR, Onuchic JN. Expanding Direct Coupling Analysis to Identify Heterodimeric Interfaces from Limited Protein Sequence Data. J Phys Chem B 2021; 125:11408-11417. [PMID: 34618469; DOI: 10.1021/acs.jpcb.1c07145] [Indexed: 11/28/2022]
Abstract
Direct coupling analysis (DCA) is a global statistical approach that uses information encoded in protein sequence data to predict spatial contacts in a three-dimensional structure of a folded protein. DCA has been widely used to predict the monomeric fold at amino acid resolution and to identify biologically relevant interaction sites within a folded protein. Going beyond single proteins, DCA has also been used to identify spatial contacts that stabilize the interaction in protein complex formation. However, extracting this higher order information necessary to predict dimer contacts presents a significant challenge. A DCA evolutionary signal is much stronger at the single protein level (intraprotein contacts) than at the protein-protein interface (interprotein contacts). Therefore, if DCA-derived information is to be used to predict the structure of these complexes, there is a need to identify statistically significant DCA predictions. We propose a simple Z-score measure that can filter good predictions despite noisy, limited data. This new methodology not only improves our prediction ability but also provides a quantitative measure for the validity of the prediction.
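As an illustration of the idea of filtering coupling predictions by statistical significance, the sketch below standardises a set of inter-protein coupling scores and keeps only the pairs that stand well above the background. It is a generic Z-score under stated assumptions (population mean and standard deviation over all inter-protein pairs, a hypothetical `threshold` parameter, toy scores); the abstract does not give the authors' exact definition, so this should not be read as their method.

```python
from statistics import mean, pstdev


def zscore_filter(scores, threshold=3.0):
    """Return (pair, z) tuples, sorted by decreasing z, for residue pairs whose
    coupling score lies at least `threshold` standard deviations above the mean.

    `scores` maps (i, j) residue-index pairs to a coupling strength.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # no spread, nothing can be called significant
    scored = [(pair, (s - mu) / sigma) for pair, s in scores.items()]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: -item[1])


# Toy background of weak couplings plus one strong inter-protein pair (assumed data).
toy_scores = {(i, j): 0.01 * ((7 * i + 3 * j) % 5) for i in range(10) for j in range(10)}
toy_scores[(3, 8)] = 0.9
significant = zscore_filter(toy_scores, threshold=3.0)
```

On shallow alignments the background distribution is itself noisy, so the threshold would in practice need to be calibrated rather than fixed in advance.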
Affiliation(s)
- Kareem M Mehrabiani
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States
- Ryan R Cheng
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- José N Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States
  - Department of Physics & Astronomy, Rice University, Houston, Texas 77005, United States
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Department of Biosciences, Rice University, Houston, Texas 77005, United States
|
21
|
Barrat-Charlaix P, Muntoni AP, Shimagaki K, Weigt M, Zamponi F. Sparse generative modeling via parameter reduction of Boltzmann machines: Application to protein-sequence families. Phys Rev E 2021; 104:024407. [PMID: 34525554; DOI: 10.1103/physreve.104.024407] [Received: 02/17/2021] [Accepted: 07/19/2021] [Indexed: 11/07/2022]
Abstract
Boltzmann machines (BMs) are widely used as generative models. For example, pairwise Potts models (PMs), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites. This coevolution reflects structural and functional constraints acting on protein sequences during evolution. The most conservative choice to describe the coevolution signal is to include all possible two-site couplings into the PM. This choice, typical of what is known as Direct Coupling Analysis, has been successful for predicting residue contacts in the three-dimensional structure, mutational effects, and generating new functional sequences. However, the resulting PM suffers from important overfitting effects: many couplings are small, noisy, and hardly interpretable; the PM is close to a critical point, meaning that it is highly sensitive to small parameter perturbations. In this work, we introduce a general parameter-reduction procedure for BMs, via a controlled iterative decimation of the less statistically significant couplings, identified by an information-based criterion that selects either weak or statistically unsupported couplings. For several protein families, our procedure allows one to remove more than 90% of the PM couplings, while preserving the predictive and generative properties of the original dense PM, and the resulting model is far away from criticality, hence more robust to noise.
Affiliation(s)
- Pierre Barrat-Charlaix
  - Biozentrum, Universität Basel, Switzerland, Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Anna Paola Muntoni
  - Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, Torino 10129, Italy
  - Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative-LCQB, F-75005 Paris, France
  - Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
- Kai Shimagaki
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative-LCQB, F-75005 Paris, France
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative-LCQB, F-75005 Paris, France
- Francesco Zamponi
  - Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
|
22
|
Yan Y, Huang SY. Accurate prediction of inter-protein residue-residue contacts for homo-oligomeric protein complexes. Brief Bioinform 2021; 22:bbab038. [PMID: 33693482; PMCID: PMC8425427; DOI: 10.1093/bib/bbab038] [Received: 11/14/2020] [Revised: 01/09/2021] [Indexed: 12/14/2022]
Abstract
Protein-protein interactions play a fundamental role in all cellular processes. Therefore, determining the structure of protein-protein complexes is crucial for understanding their molecular mechanisms and developing drugs that target protein-protein interactions. Recently, deep learning has led to a breakthrough in intra-protein contact prediction, achieving unusually high accuracy in recent Critical Assessment of protein Structure Prediction (CASP) challenges. However, due to the limited number of known homologous protein-protein interactions and the difficulty of generating joint multiple sequence alignments of two interacting proteins, advances in inter-protein contact prediction remain limited. Here, we propose a deep learning model, named DeepHomo, to predict inter-protein residue-residue contacts across homo-oligomeric protein interfaces. Unlike previous deep learning approaches, we integrated the intra-protein distance map and inter-protein docking pattern, in addition to evolutionary coupling, sequence conservation, and physico-chemical information of monomers. DeepHomo was extensively tested on both experimentally determined structures and realistic CASP-Critical Assessment of Predicted Interaction (CAPRI) targets. DeepHomo achieved a high precision of >60% for the top predicted contact and outperformed state-of-the-art direct-coupling analysis and machine learning-based approaches. Integrating predicted inter-chain contacts into protein-protein docking significantly improved the docking accuracy on the benchmark dataset of realistic homo-dimeric targets from CASP-CAPRI experiments. DeepHomo is available at http://huanglab.phys.hust.edu.cn/DeepHomo/.
Affiliation(s)
- Yumeng Yan
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
|
23
|
Abstract
Enzymes are widely used in the food industry. Their use as a supplement to the raw material for animal feed is a current research topic. Although there are several studies on the application of enzyme additives in the animal feed industry, it is necessary to search for new enzymes, as well as to utilize bioinformatics tools for the design of specific enzymes that work in certain environmental conditions and substrates. This will allow the improvement of the productive parameters in animals, reducing costs and making the processes more efficient. Technological needs have considered these catalysts as essential in many industrial sectors and research is constantly being carried out to optimize their use in those processes. This review describes the enzymes used in animal nutrition, their mode of action, their production and new sources of production as well as studies on different animal models to evaluate their effect on the productive performance intended for the production of animal feed.
|
24
|
Dapkūnas J, Olechnovič K, Venclovas Č. Modeling of protein complexes in CASP14 with emphasis on the interaction interface prediction. Proteins 2021; 89:1834-1843. [PMID: 34176161; PMCID: PMC9292421; DOI: 10.1002/prot.26167] [Received: 05/01/2021] [Revised: 06/21/2021] [Accepted: 06/23/2021] [Indexed: 01/08/2023]
Abstract
The goal of CASP experiments is to monitor the progress in the protein structure prediction field. During the 14th CASP edition we aimed to test our capabilities of predicting structures of protein complexes. Our protocol for modeling protein assemblies included both template‐based modeling and free docking. Structural templates were identified using sensitive sequence‐based searches. If sequence‐based searches failed, we performed structure‐based template searches using selected CASP server models. In the absence of reliable templates we applied free docking starting from monomers generated by CASP servers. We evaluated and ranked models of protein complexes using an improved version of our protein structure quality assessment method, VoroMQA, taking into account both interaction interface and global structure scores. If reliable templates could be identified, generally accurate models of protein assemblies were generated with the exception of an antibody‐antigen interaction. The success of free docking mainly depended on the accuracy of initial subunit models and on the scoring of docking solutions. To put our overall results in perspective, we analyzed our performance in the context of other CASP groups. Although the subunits in our assembly models often were not of the top quality, these models had, overall, the best‐predicted intersubunit interfaces according to several accuracy measures. We attribute our relative success primarily to the emphasis on the interaction interface when modeling and scoring.
Affiliation(s)
- Justas Dapkūnas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
|
25
|
Foster CA, Silversmith RE, Immormino RM, Vass LR, Kennedy EN, Pazy Y, Collins EJ, Bourret RB. Role of Position K+4 in the Phosphorylation and Dephosphorylation Reaction Kinetics of the CheY Response Regulator. Biochemistry 2021; 60:2130-2151. [PMID: 34167303; DOI: 10.1021/acs.biochem.1c00246] [Indexed: 11/28/2022]
Abstract
Two-component signaling is a primary method by which microorganisms interact with their environments. A kinase detects stimuli and modulates autophosphorylation activity. The signal propagates by phosphotransfer from the kinase to a response regulator, eliciting a response. Response regulators operate over a range of time scales, corresponding to their related biological processes. Response regulator active site chemistry is highly conserved, but certain variable residues can influence phosphorylation kinetics. An Ala-to-Pro substitution (K+4, residue 113) in the Escherichia coli response regulator CheY triggers a constitutively active phenotype; however, the A113P substitution is too far from the active site to directly affect phosphochemistry. To better understand the activating mechanism(s) of the substitution, we analyzed receiver domain sequences to characterize the evolutionary role of the K+4 position. Although most featured Pro, Leu, Ile, and Val residues, chemotaxis-related proteins exhibited atypical Ala, Gly, Asp, and Glu residues at K+4. Structural and in silico analyses revealed that CheY A113P adopted a partially active configuration. Biochemical data showed that A113P shifted CheY toward a more activated state, enhancing autophosphorylation. By characterizing CheY variants, we determined that this functionality was transmitted through a hydrophobic network bounded by the β5α5 loop and the α1 helix of CheY. This region also interacts with the phosphodonor CheAP1, suggesting that binding generates an activating perturbation similar to the A113P substitution. Atypical residues like Ala at the K+4 position likely serve two purposes. First, restricting autophosphorylation may minimize background noise generated by intracellular phosphodonors such as acetyl phosphate. Second, optimizing interactions with upstream partners may help prime the receiver domain for phosphorylation.
Affiliation(s)
- Clay A Foster
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Ruth E Silversmith
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Robert M Immormino
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Luke R Vass
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Emily N Kennedy
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Yael Pazy
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Edward J Collins
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Robert B Bourret
  - Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
|
26
|
ELIHKSIR Web Server: Evolutionary Links Inferred for Histidine Kinase Sensors Interacting with Response Regulators. Entropy 2021; 23:e23020170. [PMID: 33573110; PMCID: PMC7911359; DOI: 10.3390/e23020170] [Received: 12/17/2020] [Revised: 01/21/2021] [Accepted: 01/26/2021] [Indexed: 12/03/2022]
Abstract
Two-component systems (TCS) are signaling machinery consisting of a histidine kinase (HK) and a response regulator (RR). When an environmental change is detected, the HK phosphorylates its cognate RR. While cognate interactions were considered orthogonal, experimental evidence shows the prevalence of crosstalk interactions between non-cognate HK–RR pairs. Currently, crosstalk interactions have been demonstrated for TCS proteins in a limited number of organisms. By providing specificity predictions across entire TCS networks for a large variety of organisms, the ELIHKSIR web server assists users in identifying interactions for TCS proteins and their mutants. To generate specificity scores, a global probabilistic model was used to identify interfacial couplings and local fields from sequence information. These couplings and local fields were then used to construct Hamiltonian scores for positions with encoded specificity, resulting in the specificity score. These methods were applied to 6676 organisms available on the ELIHKSIR web server. Due to the ability to mutate proteins and display the resulting network changes, there are nearly endless combinations of TCS networks to analyze using ELIHKSIR. The functionality of ELIHKSIR allows users to perform a variety of TCS network analyses and visualizations to support TCS research efforts.
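A Hamiltonian score built from couplings and local fields can be sketched as a Potts-style energy summed over interface positions. The code below is a minimal illustration under assumptions: the sign convention (lower energy means a more favourable pairing), the dictionary layout of `couplings` and the fields, and the toy parameters are all hypothetical, and the server's actual specificity score may be defined differently.

```python
def interface_energy(hk_seq, rr_seq, couplings, fields_hk=None, fields_rr=None):
    """Potts-style energy H = -sum_i h_i(a_i) - sum_j h_j(b_j) - sum_ij J_ij(a_i, b_j).

    couplings: {(i, j): {(aa_hk, aa_rr): J}} for selected HK position i, RR position j.
    fields_*:  {(position, aa): h}; missing entries count as zero.
    """
    fields_hk = fields_hk or {}
    fields_rr = fields_rr or {}
    energy = 0.0
    for i, a in enumerate(hk_seq):
        energy -= fields_hk.get((i, a), 0.0)
    for j, b in enumerate(rr_seq):
        energy -= fields_rr.get((j, b), 0.0)
    for (i, j), table in couplings.items():
        energy -= table.get((hk_seq[i], rr_seq[j]), 0.0)
    return energy


# Toy parameters (assumed): one coupled position pair favouring A-K over A-E.
toy_couplings = {(0, 0): {("A", "K"): 1.5, ("A", "E"): -0.5}}
cognate = interface_energy("AG", "KD", toy_couplings)
mutant = interface_energy("AG", "ED", toy_couplings)
```

Comparing `cognate` with `mutant` mirrors, in miniature, how mutating a partner changes the predicted pairing preference.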

27
María Hernández-Domínguez E, Sofía Castillo-Ortega L, García-Esquivel Y, Mandujano-González V, Díaz-Godínez G, Álvarez-Cervantes J. Bioinformatics as a Tool for the Structural and Evolutionary Analysis of Proteins. Comput Biol Chem 2020. [DOI: 10.5772/intechopen.89594] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/02/2023]
Abstract
This chapter deals with bioinformatics: the computational, mathematical, and statistical tools applied to biology that are essential for the analysis and characterization of biological molecules, in particular proteins, which play an important role in all cellular and evolutionary processes of organisms. In recent decades, next-generation sequencing technologies and bioinformatics have facilitated the collection and analysis of large amounts of genomic, transcriptomic, proteomic, and metabolomic data from different organisms. These data have allowed predictions on the regulation of expression, transcription, translation, structure, and mechanisms of action of proteins, as well as on homology, mutations, and evolutionary processes that generate structural and functional changes over time. Although the information in the databases grows every day, bioinformatics tools continue to be modified to improve performance and yield more accurate predictions of protein functionality, which is why bioinformatics research remains a great challenge.

28
Salmanian S, Pezeshk H, Sadeghi M. Inter-protein residue covariation information unravels physically interacting protein dimers. BMC Bioinformatics 2020; 21:584. [PMID: 33334319 PMCID: PMC7745481 DOI: 10.1186/s12859-020-03930-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/12/2020] [Accepted: 12/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Predicting physical interaction between proteins is one of the greatest challenges in computational biology. There is a considerable variety of protein interactions and a huge number of protein sequences and synthetic peptides with unknown interacting counterparts. Most co-evolutionary methods discover a combination of physical interplays and functional associations. However, there are only a handful of approaches which specifically infer physical interactions. Hybrid co-evolutionary methods exploit inter-protein residue coevolution to unravel specific physical interacting proteins. In this study, we introduce a hybrid co-evolutionary-based approach to predict physical interplays between pairs of protein families, starting from protein sequences only. RESULTS In the present analysis, pairs of multiple sequence alignments are constructed for each dimer and the covariation between residues in those pairs is calculated by CCMpred (Contacts from Correlated Mutations predicted) and three mutual information based approaches for ten accessible surface area threshold groups. Then, all residue couplings between the proteins of each dimer are unified into a single Frobenius norm value. Norms of residue contact matrices of all dimers in different accessible surface area thresholds are fed into a support vector machine as single or multiple feature models. Training the classifiers on single features shows no apparent difference in accuracy between methods across accessible surface area thresholds. Nevertheless, the mutual information product and context likelihood of relatedness procedures tend to have overall higher and lower performance, respectively, than the other two methods across accessible surface area cut-offs. The results also demonstrate that training the support vector machine with multiple norm features for several accessible surface area thresholds leads to a considerable improvement of prediction performance. In this context, CCMpred roughly achieves an overall better performance than mutual information based approaches. The best accuracy, sensitivity, specificity, precision and negative predictive value for that method are 0.98, 1, 0.962, 0.96, and 0.962, respectively. CONCLUSIONS In this paper, by feeding norm values of protein dimers into support vector machines in different accessible surface area thresholds, we demonstrate that even a small number of proteins in pairs of multiple alignments could allow one to accurately discriminate between positive and negative dimers.
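The step of collapsing inter-protein residue couplings into a single Frobenius norm can be sketched as follows. This is only an illustration of the norm computation under stated assumptions: the helper names `interprotein_block` and `frobenius_norm` and the toy score matrix are hypothetical, and the covariation scoring (CCMpred or mutual information), the accessible surface area filtering, and the support vector machine are not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def interprotein_block(scores: Matrix, len_a: int) -> Matrix:
    """Extract the block coupling positions of protein A (rows) with protein B (columns).

    Assumes `scores` is a square matrix over the concatenated alignment,
    with the first `len_a` positions belonging to protein A.
    """
    return [row[len_a:] for row in scores[:len_a]]


def frobenius_norm(matrix: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


# Toy 4x4 covariation matrix: positions 0-1 belong to protein A, 2-3 to protein B.
toy_scores: Matrix = [
    [0.0, 0.9, 0.3, 0.4],
    [0.9, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.8],
    [0.4, 0.0, 0.8, 0.0],
]

block = interprotein_block(toy_scores, len_a=2)
feature = frobenius_norm(block)
print(block, feature)
```

Only the off-diagonal block enters the norm, so strong intra-protein couplings (0.9 and 0.8 in the toy matrix) do not inflate the per-dimer feature.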
Affiliation(s)
- Sara Salmanian
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Hamid Pezeshk
  - School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
  - Present Address: Department of Mathematics and Statistics, Concordia University, Montreal, Canada
  - School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran
- Mehdi Sadeghi
  - National Institute of Genetic Engineering and Biotechnology, Tehran, Iran

29
D'Amico RN, Murray AM, Boehr DD. Driving Protein Conformational Cycles in Physiology and Disease: "Frustrated" Amino Acid Interaction Networks Define Dynamic Energy Landscapes: Amino Acid Interaction Networks Change Progressively Along Alpha Tryptophan Synthase's Catalytic Cycle. Bioessays 2020; 42:e2000092. [PMID: 32720327 DOI: 10.1002/bies.202000092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/22/2020] [Revised: 06/09/2020] [Indexed: 12/22/2022]
Abstract
A general framework by which dynamic interactions within a protein will promote the necessary series of structural changes, or "conformational cycle," required for function is proposed. It is suggested that the free-energy landscape of a protein is biased toward this conformational cycle. Fluctuations into higher energy, although thermally accessible, conformations drive the conformational cycle forward. The amino acid interaction network is defined as those intraprotein interactions that contribute most to the free-energy landscape. Some network connections are consistent in every structural state, while others periodically change their interaction strength according to the conformational cycle. It is reviewed here that structural transitions change these periodic network connections, which then predisposes the protein toward the next set of network changes, and hence the next structural change. These concepts are illustrated by recent work on tryptophan synthase. Disruption of these dynamic connections may lead to aberrant protein function and disease states.
Affiliation(s)
- Rebecca N D'Amico
  - Department of Chemistry, The Pennsylvania State University, 107 Chemistry Building, University Park, PA, 16802, USA
- Alec M Murray
  - Department of Chemistry, The Pennsylvania State University, 107 Chemistry Building, University Park, PA, 16802, USA
- David D Boehr
  - Department of Chemistry, The Pennsylvania State University, 107 Chemistry Building, University Park, PA, 16802, USA

30
Chanda P, Costa E, Hu J, Sukumar S, Van Hemert J, Walia R. Information Theory in Computational Biology: Where We Stand Today. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E627. [PMID: 33286399 PMCID: PMC7517167 DOI: 10.3390/e22060627] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/30/2020] [Revised: 05/31/2020] [Accepted: 06/03/2020] [Indexed: 12/30/2022]
Abstract
"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address problems in data compression and communication over (noisy) communication channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and have played a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences and biological sciences. In this article we review the basic information-theoretic concepts and describe their key applications in multiple major areas of research in computational biology: gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis.
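As a small illustration of the basic information-theoretic quantities such a review covers, the sketch below computes Shannon entropy and mutual information for two alignment columns. The function names and the toy columns are assumptions chosen for this example; the estimates are plain plug-in frequencies with no finite-sample correction.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(symbols: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy in bits of the empirical symbol distribution."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    if len(x) != len(y):
        raise ValueError("columns must have the same number of sequences")
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# Toy alignment columns: one residue per sequence, same sequence order in each.
col_a = "AAKK"
col_b = "DDEE"  # covaries perfectly with col_a
col_c = "DEDE"  # varies independently of col_a

mi_coupled = mutual_information(col_a, col_b)
mi_independent = mutual_information(col_a, col_c)
print(mi_coupled, mi_independent)
```

The perfectly covarying pair yields one bit of mutual information and the independent pair yields zero, which is the signal that covariation-based analyses of protein sequences build on.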
Affiliation(s)
- Pritam Chanda
  - Corteva Agriscience™, Indianapolis, IN 46268, USA
  - Computer and Information Science, Indiana University-Purdue University, Indianapolis, IN 46202, USA
- Eduardo Costa
  - Corteva Agriscience™, Mogi Mirim, Sao Paulo 13801-540, Brazil
- Jie Hu
  - Corteva Agriscience™, Indianapolis, IN 46268, USA
- Rasna Walia
  - Corteva Agriscience™, Johnston, IA 50131, USA

31
Gandarilla-Pérez CA, Mergny P, Weigt M, Bitbol AF. Statistical physics of interacting proteins: Impact of dataset size and quality assessed in synthetic sequences. Phys Rev E 2020; 101:032413. [PMID: 32290011 DOI: 10.1103/physreve.101.032413] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/23/2019] [Accepted: 03/04/2020] [Indexed: 11/07/2022]
Abstract
Identifying protein-protein interactions is crucial for a systems-level understanding of the cell. Recently, algorithms based on inverse statistical physics, e.g., direct coupling analysis (DCA), have made it possible to use evolutionarily related sequences to address two conceptually related inference tasks: finding pairs of interacting proteins and identifying pairs of residues which form contacts between interacting proteins. Here we address two underlying questions: How are the performances of both inference tasks related? How does performance depend on dataset size and quality? To this end, we formalize both tasks using Ising models defined over stochastic block models, with individual blocks representing single proteins and interblock couplings representing protein-protein interactions; controlled synthetic sequence data are generated by Monte Carlo simulations. We show that DCA is able to address both inference tasks accurately when sufficiently large training sets of known interaction partners are available and that an iterative pairing algorithm makes predictions possible even without a training set. Noise in the training data deteriorates performance. In both tasks we find a quadratic scaling relating dataset quality and size that is consistent with noise adding in square-root fashion and signal adding linearly when increasing the dataset. This implies that it is generally good to incorporate more data even if their quality is imperfect, thereby shedding light on the empirically observed performance of DCA applied to natural protein sequences.
Affiliation(s)
- Carlos A Gandarilla-Pérez
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France; Facultad de Física, Universidad de la Habana, San Lázaro y L, Vedado, Habana 4, CP-10400, Cuba
- Pierre Mergny
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France; Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (LJP, UMR 8237), F-75005 Paris, France
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France
- Anne-Florence Bitbol
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (LJP, UMR 8237), F-75005 Paris, France; Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

32
Koukos P, Bonvin A. Integrative Modelling of Biomolecular Complexes. J Mol Biol 2020; 432:2861-2881. [DOI: 10.1016/j.jmb.2019.11.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 09/14/2019] [Revised: 11/12/2019] [Accepted: 11/13/2019] [Indexed: 12/31/2022]

33
Tomáška Ľ, Nosek J. Co-evolution in the Jungle: From Leafcutter Ant Colonies to Chromosomal Ends. J Mol Evol 2020; 88:293-318. [PMID: 32157325 DOI: 10.1007/s00239-020-09935-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/16/2019] [Accepted: 02/25/2020] [Indexed: 02/06/2023]
Abstract
Biological entities are multicomponent systems where each part is directly or indirectly dependent on the others. In effect, a change in a single component might have a consequence on the functioning of its partners, thus affecting the fitness of the entire system. In this article, we provide a few examples of such complex biological systems, ranging from ant colonies to a population of amino acids within a single-polypeptide chain. Based on these examples, we discuss one of the central and still challenging questions in biology: how do such multicomponent consortia co-evolve? More specifically, we ask how telomeres, nucleo-protein complexes protecting the integrity of linear DNA chromosomes, originated from the ancestral organisms having circular genomes and thus not dealing with end-replication and end-protection problems. Using the examples of rapidly evolving topologies of mitochondrial genomes in eukaryotic microorganisms, we show what means of co-evolution were employed to accommodate various types of telomere-maintenance mechanisms in mitochondria. We also describe an unprecedented runaway evolution of telomeric repeats in nuclei of ascomycetous yeasts accompanied by co-evolution of telomere-associated proteins. We propose several scenarios derived from research on telomeres and supported by other studies from various fields of biology, while emphasizing that the relevant answers are still not in sight. It is this uncertainty and a lack of a detailed roadmap that makes the journey through the jungle of biological systems still exciting and worth undertaking.
Affiliation(s)
- Ľubomír Tomáška
  - Department of Genetics, Faculty of Natural Sciences, Comenius University in Bratislava, Ilkovičova 6, 842 15, Bratislava, Slovakia
- Jozef Nosek
  - Department of Biochemistry, Faculty of Natural Sciences, Comenius University in Bratislava, Ilkovičova 6, 842 15, Bratislava, Slovakia

34
Machine learning for protein folding and dynamics. Curr Opin Struct Biol 2020; 60:77-84. [DOI: 10.1016/j.sbi.2019.12.005] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 09/13/2019] [Revised: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022]

35
Role of protein-protein interactions in allosteric drug design for DNA methyltransferases. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2020; 121:49-84. [PMID: 32312426 DOI: 10.1016/bs.apcsb.2019.12.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Abstract
DNA methyltransferases (DNMTs) not only play key roles in epigenetic gene regulation, but also serve as emerging targets for several diseases, especially for cancers. Due to the multi-domains of DNMT structures, targeting allosteric sites of protein-protein interactions (PPIs) is becoming an attractive strategy in epigenetic drug discovery. This chapter aims to review the major contemporary approaches utilized for the drug discovery based on PPIs in different dimensions, from the enumeration of allosteric mechanism to the identification of allosteric pockets. These include the construction of protein structure networks (PSNs) based on molecular dynamics (MD) simulations, performing elastic network models (ENMs) and perturbation response scanning (PRS) calculation, the sequence-based conservation and coupling analysis, and the allosteric pockets identification. Furthermore, we complement this methodology by highlighting the role of computational approaches in promising practical applications for the computer-aided drug design, with special focus on two DNMTs, namely, DNMT1 and DNMT3A.

36
Barreto CAV, Baptista SJ, Preto AJ, Matos-Filipe P, Mourão J, Melo R, Moreira I. Prediction and targeting of GPCR oligomer interfaces. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2020; 169:105-149. [PMID: 31952684 DOI: 10.1016/bs.pmbts.2019.11.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022]
Abstract
GPCR oligomerization has emerged as a hot topic in the GPCR field in recent years. Receptors that are part of these oligomers can influence each other's function, although it is not yet entirely understood how these interactions work. The existence of such a highly complex network of interactions between GPCRs generates the possibility of alternative targets for new therapeutic approaches. However, challenges still exist in the characterization of these complexes, especially at the interface level. Different experimental approaches, such as FRET or BRET, are usually combined to study GPCR oligomer interactions. Computational methods have been applied as a useful tool for retrieving information from GPCR sequences and the few X-ray-resolved oligomeric structures that are accessible, as well as for predicting new and trustworthy GPCR oligomeric interfaces. Machine-learning (ML) approaches have recently helped with some hindrances of other methods. By joining and evaluating multiple structure-, sequence- and co-evolution-based features in the same algorithm, it is possible to dilute the issues of particular structures and residues that arise from the experimental methodology into all-encompassing algorithms capable of accurately predicting GPCR-GPCR interfaces. All these methods, used singly or in combination, provide useful information about GPCR oligomerization and its role in GPCR function and dynamics. Altogether, we present experimental, computational and machine-learning methods used to study oligomer interfaces, as well as strategies that have been used to target these dynamic complexes.
Affiliation(s)
- Carlos A V Barreto
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Salete J Baptista
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, LRS, Portugal
- António José Preto
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Pedro Matos-Filipe
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Joana Mourão
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Institute for Interdisciplinary Research, University of Coimbra, Coimbra, Portugal
- Rita Melo
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, LRS, Portugal
- Irina Moreira
  - Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Science and Technology Faculty, University of Coimbra, Coimbra, Portugal

37
Gueudré T, Baldassi C, Pagnani A, Weigt M. Predicting Interacting Protein Pairs by Coevolutionary Paralog Matching. Methods Mol Biol 2020; 2074:57-65. [PMID: 31583630 DOI: 10.1007/978-1-4939-9873-9_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
Even if we know that two families of homologous proteins interact, we do not necessarily know which specific proteins interact inside each species. The reason is that most families contain paralogs, i.e., more than one homologous sequence per species. We have developed a tool to predict interacting paralogs between the two protein families, which is based on the idea of inter-protein coevolution: our algorithm matches those members of the two protein families which belong to the same species and collectively maximize the detectable coevolutionary signal. It is applicable even in cases where simpler methods, based e.g. on genomic co-localization of genes coding for interacting proteins or on orthology, fail. In this methods paper, we present an efficient implementation of this idea based on freely available software.
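The matching idea can be illustrated with a deliberately simplified sketch: within one species, try every one-to-one pairing of paralogs and keep the one with the highest total score. This is not the published algorithm, which maximizes a coevolutionary signal jointly across species; the function name `best_matching` and the toy score matrix are hypothetical, and exhaustive enumeration is a stand-in that only works for a handful of paralogs.

```python
from itertools import permutations
from typing import List, Tuple


def best_matching(score: List[List[float]]) -> Tuple[Tuple[int, ...], float]:
    """Return the one-to-one pairing with the highest total score.

    score[i][j] is an assumed pairing score between paralog i of family A and
    paralog j of family B in the same species. The matrix must be square.
    Brute force over permutations: factorial cost, fine only for small inputs.
    """
    n = len(score)
    best_perm: Tuple[int, ...] = ()
    best_total = float("-inf")
    for perm in permutations(range(n)):
        total = sum(score[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Toy scores for three paralogs per family in one species (invented values).
toy_score = [
    [0.9, 0.1, 0.2],
    [0.2, 0.3, 0.8],
    [0.1, 0.7, 0.4],
]

pairing, total = best_matching(toy_score)
print(pairing, total)
```

Here `pairing[i]` is the index of the family B paralog assigned to family A paralog `i`. For larger families, a polynomial-time assignment solver would replace the enumeration.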
Affiliation(s)
- Carlo Baldassi
  - Bocconi Institute for Data Science and Analytics, Bocconi University, Milan, Italy
  - INFN, Sezione di Torino, Torino, Italy
- Andrea Pagnani
  - Italian Institute for Genomic Medicine, Turin, Italy
  - INFN, Sezione di Torino, Torino, Italy
  - Dipartimento di Scienza Applicata e Tecnologia, Politecnico di Torino, Torino, Italy
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative-LCQB, Paris, France

38
Sala D, Cerofolini L, Fragai M, Giachetti A, Luchinat C, Rosato A. A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and NMR ambiguous contacts. Comput Struct Biotechnol J 2019; 18:114-124. [PMID: 31969972 PMCID: PMC6961069 DOI: 10.1016/j.csbj.2019.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/22/2019] [Revised: 11/20/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022] Open
Abstract
Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.
Affiliation(s)
- Davide Sala
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Linda Cerofolini
  - Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Marco Fragai
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Andrea Giachetti
  - Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Claudio Luchinat
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
  - Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
  - Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy

39
Croce G, Gueudré T, Ruiz Cuevas MV, Keidel V, Figliuzzi M, Szurmant H, Weigt M. A multi-scale coevolutionary approach to predict interactions between protein domains. PLoS Comput Biol 2019; 15:e1006891. [PMID: 31634362 PMCID: PMC6822775 DOI: 10.1371/journal.pcbi.1006891] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/19/2019] [Revised: 10/31/2019] [Accepted: 09/27/2019] [Indexed: 11/18/2022] Open
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species, to correlations in amino-acid usage. Genomic databases provide rapidly growing data for variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interaction between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
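For orientation, the simpler correlation-based baseline that the abstract contrasts with direct phyletic couplings can be sketched as a Pearson correlation between presence/absence profiles across species. This is the baseline only, not the global statistical model introduced in the paper; the function name and the toy profiles are assumptions for illustration.

```python
import math
from typing import Sequence


def profile_correlation(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson correlation between two binary phylogenetic profiles.

    x[k] and y[k] are 1 if the domain is present in species k, else 0.
    Raises ValueError if a profile is constant, where correlation is undefined.
    """
    if len(x) != len(y):
        raise ValueError("profiles must cover the same species")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile: correlation undefined")
    return cov / math.sqrt(var_x * var_y)


# Toy profiles over six species (invented values).
domain_a = [1, 1, 0, 0, 1, 0]
domain_b = [1, 1, 0, 0, 1, 0]  # always co-occurs with domain_a
domain_c = [0, 0, 1, 1, 0, 1]  # never co-occurs with domain_a

r_ab = profile_correlation(domain_a, domain_b)
r_ac = profile_correlation(domain_a, domain_c)
print(r_ab, r_ac)
```

A pairwise correlation like this cannot distinguish direct from indirect associations, which is the limitation that motivates global models of the full set of profiles.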
Affiliation(s)
- Giancarlo Croce
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Maria Virginia Ruiz Cuevas
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Victoria Keidel
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Matteo Figliuzzi
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Hendrik Szurmant
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France

40
Shimagaki K, Weigt M. Selection of sequence motifs and generative Hopfield-Potts models for protein families. Phys Rev E 2019; 100:032128. [PMID: 31639992 DOI: 10.1103/physreve.100.032128] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/28/2019] [Indexed: 06/10/2023]
Abstract
Statistical models for families of evolutionarily related proteins have recently gained interest. In particular, pairwise Potts models such as those inferred by direct-coupling analysis have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of the amino acid usage in a protein family, i.e., to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. Although successful, these models depend on huge numbers of ad hoc introduced parameters, which have to be estimated from finite amounts of data and whose biophysical interpretation remains unclear. Here, we propose an approach to parameter reduction, which is based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino acid sequences into functional subfamilies. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the necessity of developing models going beyond the Hopfield-Potts models discussed here.
Affiliation(s)
- Kai Shimagaki
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative-LCQB, Paris, France
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative-LCQB, Paris, France

41
Marchi J, Galpern EA, Espada R, Ferreiro DU, Walczak AM, Mora T. Size and structure of the sequence space of repeat proteins. PLoS Comput Biol 2019; 15:e1007282. [PMID: 31415557 PMCID: PMC6733475 DOI: 10.1371/journal.pcbi.1007282] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/11/2019] [Revised: 09/09/2019] [Accepted: 07/24/2019] [Indexed: 11/18/2022] Open
Abstract
The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family (the total number of sequences in that family) can be estimated using models of maximum entropy trained on multiple sequence alignments of naturally occurring amino acid sequences. We analyzed and calculated the size of three abundant repeat protein families, whose members are large proteins made of many repetitions of conserved portions of ∼30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes with a hierarchical structure, reminiscent of the frustrated energy landscapes of spin glasses in physics. This clustered structure indicates a multiplicity of subtypes within each family, and suggests new strategies for protein design.
Natural protein molecules are only a small subset of the possible strings of amino acids. This naturally raises the question of how many protein sequences theoretically exist that are functional, and how many have already been explored by nature. To help answer this question, we developed a statistical method to calculate the total potential number of protein sequences of a given family, focusing on three families of repeat proteins, which play important roles in a variety of cellular processes. The number of sequences that we compute is limited by functional interactions between the residues of the protein, as well as its evolutionary history. Applying techniques from the physics of disordered systems, we show that the space of sequences has a rugged structure, which could hinder their evolution. Individual proteins can be organised into distinct clusters corresponding to basins of attraction of the landscape, suggesting the existence of subfamilies within each family.
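As a rough illustration of the idea that per-position conservation alone already shrinks the sequence space, the sketch below computes the Shannon entropy of each alignment column and exponentiates the sum to get an effective number of sequences. This is not the authors' method: it is the independent-site approximation only, so it ignores the correlations between positions that the abstract says further reduce diversity, and it uses no pseudocounts or sequence reweighting. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Shannon entropy (in nats) of each column of equal-length aligned sequences."""
    length = len(alignment[0])
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        entropies.append(
            -sum((c / total) * math.log(c / total) for c in counts.values())
        )
    return entropies


def independent_site_log_size(alignment):
    """Log of the effective number of sequences if columns were independent."""
    return sum(site_entropies(alignment))


# Hypothetical toy alignment: columns 2 and 4 are fully conserved,
# columns 1 and 3 each use two residues equally often.
toy_alignment = ["ACDA", "ACEA", "GCDA", "GCEA"]

log_size = independent_site_log_size(toy_alignment)
effective_size = math.exp(log_size)         # sequences allowed by conservation alone
random_size = 20 ** len(toy_alignment[0])   # all strings over 20 amino acids

print(f"effective size: {effective_size:.1f} of {random_size} possible sequences")
```

For a model that reproduces the same single-site frequencies and also fits pairwise correlations, the entropy can only be lower, so the independent-site figure is an upper bound on the estimated family size.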
Affiliation(s)
- Jacopo Marchi
  - Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
- Ezequiel A. Galpern
  - Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina
  - CONICET - Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Buenos Aires, Argentina
- Rocio Espada
  - Laboratoire Gulliver, Ecole supérieure de physique et chimie industrielles (PSL University) and CNRS, 75005, Paris, France
- Diego U. Ferreiro
  - Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina
  - CONICET - Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Buenos Aires, Argentina
- Aleksandra M. Walczak
  - Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
- Thierry Mora
  - Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
|
42
|
Szurmant H. Evolutionary couplings of amino acid residues reveal structure and function of bacterial signaling proteins. Mol Microbiol 2019; 112:432-437. [PMID: 31102561 DOI: 10.1111/mmi.14282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 05/15/2019] [Indexed: 12/12/2022]
Abstract
The genomic era along with major advances in high-throughput sequencing technology has led to a rapid expansion of the genomic and consequently the protein sequence space. Bacterial extracytoplasmic function sigma factors have emerged as an important group of signaling proteins in bacteria involved in many regulatory decisions, most notably the adaptation to cell envelope stress. Their wide prevalence and amplification among bacterial genomes has led to sub-group classification and the realization of diverse signaling mechanisms. Mathematical frameworks have been developed to utilize extensive protein sequence alignments to extract co-evolutionary signals of interaction. This has proven useful in a number of different biological fields, including de novo structure prediction, protein-protein partner identification and the elucidation of alternative protein conformations for signal proteins, to name a few. The mathematical tools, commonly referred to under the name 'Direct Coupling Analysis' have now been applied to deduce molecular mechanisms of activation for sub-groups of extracytoplasmic sigma factors adding to previous successes on bacterial two-component signaling proteins. The amplification of signal transduction protein genes in bacterial genomes made them the first to be amenable to this approach but the sequences are available now to aid the molecular microbiologist, no matter their protein pathway of interest.
Affiliation(s)
- Hendrik Szurmant
  - Basic Medical Science, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA, USA
|
43
|
Buschiazzo A, Trajtenberg F. Two-Component Sensing and Regulation: How Do Histidine Kinases Talk with Response Regulators at the Molecular Level? Annu Rev Microbiol 2019; 73:507-528. [PMID: 31226026 DOI: 10.1146/annurev-micro-091018-054627] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Indexed: 11/09/2022]
Abstract
Perceiving environmental and internal information and reacting in adaptive ways are essential attributes of living organisms. Two-component systems are relevant protein machineries from prokaryotes and lower eukaryotes that enable cells to sense and process signals. Implicating sensory histidine kinases and response regulator proteins, both components take advantage of protein phosphorylation and flexibility to switch conformations in a signal-dependent way. Dozens of two-component systems act simultaneously in any given cell, challenging our understanding about the means that ensure proper connectivity. This review dives into the molecular level, attempting to summarize an emerging picture of how histidine kinases and cognate response regulators achieve required efficiency, specificity, and directionality of signaling pathways, properties that rely on protein:protein interactions. α helices that carry information through long distances, the fine combination of loose and specific kinase/regulator interactions, and malleable reaction centers built when the two components meet emerge as relevant universal principles.
Affiliation(s)
- Alejandro Buschiazzo
  - Laboratory of Molecular and Structural Microbiology, Institut Pasteur de Montevideo, Montevideo 11400, Uruguay
  - Integrative Microbiology of Zoonotic Agents, Department of Microbiology, Institut Pasteur, Paris 75015, France
- Felipe Trajtenberg
  - Laboratory of Molecular and Structural Microbiology, Institut Pasteur de Montevideo, Montevideo 11400, Uruguay
|
44
|
Pucci F, Schug A. Shedding light on the dark matter of the biomolecular structural universe: Progress in RNA 3D structure prediction. Methods 2019; 162-163:68-73. [DOI: 10.1016/j.ymeth.2019.04.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/29/2018] [Revised: 04/12/2019] [Accepted: 04/22/2019] [Indexed: 11/25/2022] Open
|
45
|
Horovitz A, Fleisher RC, Mondal T. Double-mutant cycles: new directions and applications. Curr Opin Struct Biol 2019; 58:10-17. [PMID: 31029859 DOI: 10.1016/j.sbi.2019.03.025] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 02/17/2019] [Accepted: 03/20/2019] [Indexed: 11/17/2022]
Abstract
Double-mutant cycle (DMC) analysis is a powerful approach for detecting and quantifying the energetics of both direct and long-range interactions in proteins and other chemical systems. It can also be used to unravel higher-order interactions (e.g. three-body effects) that lead to cooperativity in protein folding and function. In this review, we describe new applications of DMC analysis based on advances in native mass spectrometry and high-throughput methods such as next generation sequencing and protein complementation assays. These developments have facilitated carrying out high-throughput DMC analysis, which can be used to characterize increasingly higher-order interactions and very large interaction networks in proteins. Such studies have provided insights into the extent of cooperativity (epistasis) in protein structures. High-throughput DMC studies have also been used to validate correlated mutation analysis and can provide restraints for protein docking.
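The quantity at the heart of a double-mutant cycle can be sketched in a few lines. Given a free energy measured for the wild type, for each single mutant, and for the double mutant, the coupling energy is the deviation of the double mutant from additivity of the two single-mutant effects. The function name, the sign convention and the numbers below are illustrative assumptions; published studies differ in sign convention and in which free energy (folding, binding) is measured.

```python
def coupling_energy(g_wt, g_mut_x, g_mut_y, g_double):
    """Double-mutant cycle coupling energy.

    Zero when the effects of the two single mutations are additive,
    non-zero when the two positions interact directly or indirectly.
    All inputs are free energies in the same units (e.g. kcal/mol).
    """
    return g_wt - g_mut_x - g_mut_y + g_double


# Hypothetical free energies in kcal/mol, chosen only for illustration.
g_wt = -10.0
g_mut_x = -8.5   # mutation X alone costs 1.5
g_mut_y = -9.0   # mutation Y alone costs 1.0

# Additive case: the double mutant costs 1.5 + 1.0 = 2.5.
additive = coupling_energy(g_wt, g_mut_x, g_mut_y, g_double=-7.5)

# Coupled case: the double mutant costs only 1.5, less than the sum.
coupled = coupling_energy(g_wt, g_mut_x, g_mut_y, g_double=-8.5)

print(additive, coupled)
```

With these inputs the additive cycle gives a coupling of zero and the coupled cycle gives a non-zero value, which is the signal that DMC analysis reads as an interaction between the two mutated positions.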
Affiliation(s)
- Amnon Horovitz
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Rachel C Fleisher
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Tridib Mondal
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
|
46
|
The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol 2019; 56:179-186. [PMID: 31029927 DOI: 10.1016/j.sbi.2019.03.024] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/10/2019] [Revised: 03/18/2019] [Accepted: 03/19/2019] [Indexed: 11/22/2022]
Abstract
Evolution imposes constraints at the interface of interacting biomolecules in order to preserve function or maintain fitness. This pressure may have a direct effect on the sequence composition of interacting biomolecules. As a result, statistical patterns of amino acid or nucleotide covariance that encode physical and functional interactions are observed in sequences of extant organisms. In recent years, global pairwise models of amino acid and nucleotide coevolution from multiple sequence alignments have been developed and utilized to study molecular interactions in structural biology. In proteins, for which the energy landscape is funneled and minimally frustrated, a direct connection between the physical and sequence space landscapes can be established. Estimating coevolutionary information from sequences of interacting molecules has a broad impact in molecular biology. Applications include the accurate determination of 3D structures of molecular complexes, inference of protein interaction partners, models of protein-protein interaction specificity, the elucidation and design of protein-nucleic acid recognition, as well as the discovery of genome-wide epistatic effects. The current state of the art of coevolutionary analysis includes biomedical applications ranging from mutational landscapes and drug design to vaccine development.
|
47
|
Zhong Z, Liu CC. Probing pathways of adaptation with continuous evolution. Curr Opin Syst Biol 2019; 14:18-24. [PMID: 31608311 PMCID: PMC6788780 DOI: 10.1016/j.coisb.2019.02.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/11/2023]
Affiliation(s)
- Ziwei Zhong
  - Department of Biomedical Engineering, University of California, Irvine, Irvine, CA 92697, USA
- Chang C. Liu
  - Department of Biomedical Engineering, University of California, Irvine, Irvine, CA 92697, USA
  - Department of Chemistry, University of California, Irvine, Irvine, CA 92697, USA
  - Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, CA 92697, USA
  - Lead Contact
|
48
|
Liang Z, Verkhivker GM, Hu G. Integration of network models and evolutionary analysis into high-throughput modeling of protein dynamics and allosteric regulation: theory, tools and applications. Brief Bioinform 2019; 21:815-835. [DOI: 10.1093/bib/bbz029] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 11/01/2018] [Revised: 02/04/2019] [Accepted: 02/21/2019] [Indexed: 12/24/2022] Open
Abstract
Proteins are dynamical entities that undergo a plethora of conformational changes in accomplishing their biological functions. Molecular dynamics simulation and normal mode analysis methods have become the gold standard for studying protein dynamics and analyzing the molecular mechanisms and allosteric regulation of biological systems. The enormous amount of ensemble-based experimental and computational data on protein structure and dynamics has presented a major challenge for the high-throughput modeling of protein regulation and molecular mechanisms. In parallel, bioinformatics and systems biology approaches including genomic analysis, coevolution and network-based modeling have provided an array of powerful tools that complemented and enriched biophysical insights by enabling high-throughput analysis of biological data and dissection of global molecular signatures underlying mechanisms of protein function and interactions in the cellular environment. These developments have provided a powerful interdisciplinary framework for quantifying the relationships between protein dynamics and allosteric regulation, allowing for high-throughput modeling and engineering of molecular mechanisms. Here, we review fundamental advances in protein dynamics, network theory and coevolutionary analysis that have provided a foundation for rapidly growing computational tools for modeling of allosteric regulation. We discuss recent developments in these interdisciplinary areas bridging computational biophysics and network biology, focusing on promising applications in allosteric regulation, including the investigation of allosteric communication pathways, protein–DNA/RNA interactions and disease mutations in genomic medicine. We conclude by formulating and discussing future directions and potential challenges facing quantitative computational investigations of allosteric regulatory mechanisms in protein systems.
Affiliation(s)
- Zhongjie Liang
  - School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Gennady M Verkhivker
  - Department of Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA, USA
  - Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA, USA
- Guang Hu
  - School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
|
49
|
Simultaneous Bayesian inference of phylogeny and molecular coevolution. Proc Natl Acad Sci U S A 2019; 116:5027-5036. [PMID: 30808804 DOI: 10.1073/pnas.1813836116] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open
Abstract
Patterns of molecular coevolution can reveal structural and functional constraints within or among organic molecules. These patterns are better understood when considering the underlying evolutionary process, which enables us to disentangle the signal of the dependent evolution of sites (coevolution) from the effects of shared ancestry of genes. Conversely, disregarding the dependent evolution of sites when studying the history of genes negatively impacts the accuracy of the inferred phylogenetic trees. Although molecular coevolution and phylogenetic history are interdependent, analyses of the two processes are conducted separately, a choice dictated by computational convenience, but at the expense of accuracy. We present a Bayesian method and associated software to infer how many and which sites of an alignment evolve according to an independent or a pairwise dependent evolutionary process, and to simultaneously estimate the phylogenetic relationships among sequences. We validate our method on synthetic datasets and challenge our predictions of coevolution on the 16S rRNA molecule by comparing them with its known molecular structure. Finally, we assess the accuracy of phylogenetic trees inferred under the assumption of independence among sites using synthetic datasets, the 16S rRNA molecule and 10 additional alignments of protein-coding genes of eukaryotes. Our results demonstrate that inferring phylogenetic trees while accounting for dependent site evolution significantly impacts the estimates of the phylogeny and the evolutionary process.
|
50
|
Figliuzzi M, Barrat-Charlaix P, Weigt M. How Pairwise Coevolutionary Models Capture the Collective Residue Variability in Proteins? Mol Biol Evol 2019; 35:1018-1027. [PMID: 29351669 DOI: 10.1093/molbev/msy007] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/15/2022] Open
Abstract
Global coevolutionary models of homologous protein families, as constructed by direct coupling analysis (DCA), have recently gained popularity, in particular due to their capacity to accurately predict residue-residue contacts from sequence information alone, and thereby to facilitate tertiary and quaternary protein structure prediction. More recently, they have also been used to predict fitness effects of amino-acid substitutions in proteins, and to predict evolutionarily conserved protein-protein interactions. These models are based on two currently unjustified hypotheses: 1) correlations in the amino-acid usage of different positions result collectively from networks of direct couplings; and 2) pairwise couplings are sufficient to capture the amino-acid variability. Here, we propose a highly precise inference scheme based on Boltzmann-machine learning, which allows us to systematically address these hypotheses. We show how correlations are built up in a highly collective way by a large number of coupling paths, which are based on the protein's three-dimensional structure. We further find that pairwise coevolutionary models capture the collective residue variability across homologous proteins even for quantities which are not imposed by the inference procedure, like three-residue correlations, the clustered structure of protein families in sequence space or the sequence distances between homologs. These findings strongly suggest that pairwise coevolutionary models are actually sufficient to accurately capture the residue variability in homologous protein families.
Affiliation(s)
- Matteo Figliuzzi
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
- Pierre Barrat-Charlaix
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
|