1
Nerín-Fonz F, Caprai C, Morales-Pastor A, Lopez-Balastegui M, Aranda-García D, Giorgino T, Selent J. AlloViz: A tool for the calculation and visualisation of protein allosteric communication networks. Comput Struct Biotechnol J 2024; 23:1938-1944. [PMID: 38736696 PMCID: PMC11087696 DOI: 10.1016/j.csbj.2024.04.047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 04/18/2024] [Accepted: 04/18/2024] [Indexed: 05/14/2024] Open
Abstract
Allostery, the presence of functional interactions between distant parts of proteins, is a critical concept in the field of biochemistry and molecular biology, particularly in the context of protein function and regulation. Understanding the principles of allosteric regulation is essential for advancing our knowledge of biology and developing new therapeutic strategies. This paper presents AlloViz, an open-source Python package designed to quantitatively determine, analyse, and visually represent allosteric communication networks on the basis of molecular dynamics (MD) simulation data. The software integrates well-known techniques for understanding allosteric properties simplifying the process of accessing, rationalising, and representing protein allostery and communication routes. It overcomes the inefficiency of having multiple methods with heterogeneous implementations and showcases the advantages of using MD simulations and multiple replicas to obtain statistically sound information on protein dynamics; it also enables the calculation of "consensus-like" scores aggregating methods that consider multiple structural aspects of allosteric networks. We demonstrate the features of AlloViz on two proteins: β-arrestin 1, a key player for regulating G protein-coupled receptor (GPCR) signalling, and the protein tyrosine phosphatase 1B, an important pharmaceutical target for allosteric inhibitors. The software includes comprehensive documentation and examples, tutorials, and a user-friendly graphical interface.
Affiliation(s)
- Francho Nerín-Fonz
  - Hospital del Mar Research Institute & Universitat Pompeu Fabra, C/ Dr. Aiguader 88, Barcelona, 08003, Spain
- Camilla Caprai
  - Department of Biosciences, Università degli Studi di Milano, Via Celoria 26, Milan, 20133, Italy
  - National Research Council of Italy, Biophysics Institute (CNR-IBF), Via Celoria 26, Milan, 20133, Italy
- Adrián Morales-Pastor
  - Hospital del Mar Research Institute & Universitat Pompeu Fabra, C/ Dr. Aiguader 88, Barcelona, 08003, Spain
- Marta Lopez-Balastegui
  - Hospital del Mar Research Institute & Universitat Pompeu Fabra, C/ Dr. Aiguader 88, Barcelona, 08003, Spain
- David Aranda-García
  - Hospital del Mar Research Institute & Universitat Pompeu Fabra, C/ Dr. Aiguader 88, Barcelona, 08003, Spain
- Toni Giorgino
  - National Research Council of Italy, Biophysics Institute (CNR-IBF), Via Celoria 26, Milan, 20133, Italy
- Jana Selent
  - Hospital del Mar Research Institute & Universitat Pompeu Fabra, C/ Dr. Aiguader 88, Barcelona, 08003, Spain
2
Levine H, Tu Y. Machine learning meets physics: A two-way street. Proc Natl Acad Sci U S A 2024; 121:e2403580121. [PMID: 38913898 DOI: 10.1073/pnas.2403580121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024] Open
Affiliation(s)
- Herbert Levine
  - Center for Theoretical Biological Physics, Northeastern University, Boston, MA 02115
- Yuhai Tu
  - IBM T. J. Watson Research Center, Yorktown Heights, New York, NY 10598
3
Basu S, Subedi U, Tonelli M, Afshinpour M, Tiwari N, Fuentes EJ, Chakravarty S. Assessing the functional roles of coevolving PHD finger residues. Protein Sci 2024; 33:e5065. [PMID: 38923615 PMCID: PMC11201814 DOI: 10.1002/pro.5065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 04/21/2024] [Accepted: 05/16/2024] [Indexed: 06/28/2024]
Abstract
Although in silico folding based on coevolving residue constraints in the deep-learning era has transformed protein structure prediction, the contributions of coevolving residues to protein folding, stability, and other functions in physical contexts remain to be clarified and experimentally validated. Herein, the PHD finger module, a well-known histone reader with distinct subtypes containing subtype-specific coevolving residues, was used as a model to experimentally assess the contributions of coevolving residues and to clarify their specific roles. The results of the assessment, including proteolysis and thermal unfolding of wildtype and mutant proteins, suggested that coevolving residues have varying contributions, despite their large in silico constraints. Residue positions with large constraints were found to contribute to stability in one subtype but not others. Computational sequence design and generative model-based energy estimates of individual structures were also implemented to complement the experimental assessment. Sequence design and energy estimates distinguish coevolving residues that contribute to folding from those that do not. The results of proteolytic analysis of mutations at positions contributing to folding were consistent with those suggested by sequence design and energy estimation. Thus, we report a comprehensive assessment of the contributions of coevolving residues, as well as a strategy based on a combination of approaches that should enable detailed understanding of the residue contributions in other large protein families.
Affiliation(s)
- Shraddha Basu
  - Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Ujwal Subedi
  - Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Marco Tonelli
  - National Magnetic Resonance Facility at Madison (NMRFAM), University of Wisconsin‐Madison, Madison, Wisconsin, USA
- Maral Afshinpour
  - Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
- Nitija Tiwari
  - Department of Biochemistry & Molecular Biology, University of Iowa, Iowa City, Iowa, USA
- Ernesto J. Fuentes
  - Department of Biochemistry & Molecular Biology, University of Iowa, Iowa City, Iowa, USA
- Suvobrata Chakravarty
  - Department of Chemistry & Biochemistry, South Dakota State University, Brookings, South Dakota, USA
4
Cocco S, Posani L, Monasson R. Functional effects of mutations in proteins can be predicted and interpreted by guided selection of sequence covariation information. Proc Natl Acad Sci U S A 2024; 121:e2312335121. [PMID: 38889151 PMCID: PMC11214004 DOI: 10.1073/pnas.2312335121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/21/2024] [Indexed: 06/20/2024] Open
Abstract
Predicting the effects of one or more mutations to the in vivo or in vitro properties of a wild-type protein is a major computational challenge, due to the presence of epistasis, that is, of interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models to predict mutational effects by combining evolutionary (homologous sequence) and few mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and the edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline exhibits performances comparable to state-of-the-art deep networks trained on many more data, while requiring much less parameters and being hence more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property experimentally measured, mostly focus on key functional sites, and are not necessarily related to structural contacts. Therefore, our method is able to extract information relevant for one mutational experiment from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution.
Affiliation(s)
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Lorenzo Posani
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
5
Calvanese F, Lambert CN, Nghe P, Zamponi F, Weigt M. Towards parsimonious generative modeling of RNA families. Nucleic Acids Res 2024; 52:5465-5477. [PMID: 38661206 PMCID: PMC11162787 DOI: 10.1093/nar/gkae289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Revised: 03/05/2024] [Accepted: 04/05/2024] [Indexed: 04/26/2024] Open
Abstract
Generative probabilistic models emerge as a new paradigm in data-driven, evolution-informed design of biomolecular sequences. This paper introduces a novel approach, called Edge Activation Direct Coupling Analysis (eaDCA), tailored to the characteristics of RNA sequences, with a strong emphasis on simplicity, efficiency, and interpretability. eaDCA explicitly constructs sparse coevolutionary models for RNA families, achieving performance levels comparable to more complex methods while utilizing a significantly lower number of parameters. Our approach demonstrates efficiency in generating artificial RNA sequences that closely resemble their natural counterparts in both statistical analyses and SHAPE-MaP experiments, and in predicting the effect of mutations. Notably, eaDCA provides a unique feature: estimating the number of potential functional sequences within a given RNA family. For example, in the case of cyclic di-AMP riboswitches (RF00379), our analysis suggests the existence of approximately 10^39 functional nucleotide sequences. While huge compared to the known <4000 natural sequences, this number represents only a tiny fraction of the vast pool of nearly 10^82 possible nucleotide sequences of the same length (136 nucleotides). These results underscore the promise of sparse and interpretable generative models, such as eaDCA, in enhancing our understanding of the expansive RNA sequence space.
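The sequence-space figures quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below assumes a four-letter nucleotide alphabet with no gap symbol and takes the eaDCA estimate of the number of functional sequences as given; the variable names are illustrative and are not taken from the paper.

```python
import math

ALPHABET_SIZE = 4        # A, C, G, U (assumption: gaps are not counted)
SEQUENCE_LENGTH = 136    # sequence length quoted in the abstract
LOG10_FUNCTIONAL = 39    # exponent of the eaDCA estimate, taken as given

# log10 of the number of possible sequences: L * log10(4)
log10_total = SEQUENCE_LENGTH * math.log10(ALPHABET_SIZE)

# log10 of the fraction of sequence space estimated to be functional
log10_fraction = LOG10_FUNCTIONAL - log10_total

print(f"possible sequences  ~ 10^{log10_total:.1f}")
print(f"functional fraction ~ 10^{log10_fraction:.1f}")
```

Working in log10 keeps the numbers readable and shows directly that the exponent of the total rounds to 82, consistent with the abstract.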
Affiliation(s)
- Francesco Calvanese
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
  - Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Camille N Lambert
  - Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Philippe Nghe
  - Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Francesco Zamponi
  - Dipartimento di Fisica, Sapienza Università di Roma, Rome, Italy
  - Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
6
Bibik P, Alibai S, Pandini A, Dantu SC. PyCoM: a python library for large-scale analysis of residue-residue coevolution data. Bioinformatics 2024; 40:btae166. [PMID: 38532297 PMCID: PMC11009027 DOI: 10.1093/bioinformatics/btae166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 02/02/2024] [Accepted: 03/25/2024] [Indexed: 03/28/2024] Open
Abstract
MOTIVATION: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra- and inter-residue protein contacts, protein structures, and effects of mutation on protein stability and function. While there are many tools and webservers to compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection leveraging on biological and structural annotations already available in UniProt.
RESULTS: We present a Python library, PyCoM, which enables users to query and analyze coevolution matrices and sequence alignments of 457 622 proteins, selected from UniProtKB/Swiss-Prot database (length ≤ 500 residues), from a precompiled coevolution matrix database (PyCoMdb). PyCoM facilitates the development of statistical analyses of residue coevolution patterns using filters on biological and structural annotations from UniProtKB/Swiss-Prot, with simple access to PyCoMdb for both novice and advanced users, supporting Jupyter Notebooks, Python scripts, and a web API access. The resource is open source and will help in generating data-driven computational models and methods to study and understand protein structures, stability, function, and design.
AVAILABILITY AND IMPLEMENTATION: PyCoM code is freely available from https://github.com/scdantu/pycom and PyCoMdb and the Jupyter Notebook tutorials are freely available from https://pycom.brunel.ac.uk.
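To make the idea of a coevolution score between alignment positions concrete, here is a minimal sketch using mutual information between two columns of a toy alignment. The abstract does not say which scoring method PyCoMdb uses, so mutual information is swapped in purely as one of the simplest such scores; the function name and the toy alignment are hypothetical, and no PyCoM API is used.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)              # residue counts in column i
    count_j = Counter(col_j)              # residue counts in column j
    count_ij = Counter(zip(col_i, col_j)) # joint residue-pair counts
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Toy alignment, for illustration only: columns 0 and 1 always change
# together, while column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
mi_covarying = column_mutual_information(toy_msa, 0, 1)
mi_independent = column_mutual_information(toy_msa, 0, 3)
print(mi_covarying, mi_independent)
```

Evaluating such a score for every pair of columns gives a symmetric matrix, which is the general kind of object the abstract calls a coevolution matrix.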
Affiliation(s)
- Philipp Bibik
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Sabriyeh Alibai
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Alessandro Pandini
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Sarath Chandra Dantu
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
7
Judge A, Sankaran B, Hu L, Palaniappan M, Birgy A, Prasad BVV, Palzkill T. Network of epistatic interactions in an enzyme active site revealed by large-scale deep mutational scanning. Proc Natl Acad Sci U S A 2024; 121:e2313513121. [PMID: 38483989 PMCID: PMC10962969 DOI: 10.1073/pnas.2313513121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 02/14/2024] [Indexed: 03/19/2024] Open
Abstract
Cooperative interactions between amino acids are critical for protein function. A genetic reflection of cooperativity is epistasis, which is when a change in the amino acid at one position changes the sequence requirements at another position. To assess epistasis within an enzyme active site, we utilized CTX-M β-lactamase as a model system. CTX-M hydrolyzes β-lactam antibiotics to provide antibiotic resistance, allowing a simple functional selection for rapid sorting of modified enzymes. We created all pairwise mutations across 17 active site positions in the β-lactamase enzyme and quantitated the function of variants against two β-lactam antibiotics using next-generation sequencing. Context-dependent sequence requirements were determined by comparing the antibiotic resistance function of double mutations across the CTX-M active site to their predicted function based on the constituent single mutations, revealing both positive epistasis (synergistic interactions) and negative epistasis (antagonistic interactions) between amino acid substitutions. The resulting trends demonstrate that positive epistasis is present throughout the active site, that epistasis between residues is mediated through substrate interactions, and that residues more tolerant to substitutions serve as generic compensators which are responsible for many cases of positive epistasis. Additionally, we show that a key catalytic residue (Glu166) is amenable to compensatory mutations, and we characterize one such double mutant (E166Y/N170G) that acts by an altered catalytic mechanism. These findings shed light on the unique biochemical factors that drive epistasis within an enzyme active site and will inform enzyme engineering efforts by bridging the gap between amino acid sequence and catalytic function.
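The comparison of a double mutant with the prediction from its constituent single mutants, as described in the abstract above, can be sketched as follows. This is a minimal illustration that assumes a log-additive (multiplicative) null model; the abstract does not state which null model the authors used, and the function name and fitness values are hypothetical.

```python
import math

def epistasis(f_wt, f_a, f_b, f_ab):
    """Observed minus expected log relative fitness of a double mutant.

    Assumes a log-additive null model: the expected relative fitness of
    the double mutant is the product of the relative fitnesses of the two
    single mutants. Positive values indicate positive epistasis, negative
    values indicate negative epistasis.
    """
    expected = math.log(f_a / f_wt) + math.log(f_b / f_wt)
    observed = math.log(f_ab / f_wt)
    return observed - expected

# Hypothetical fitness values, for illustration only.
wild_type = 1.0
single_a = 0.5
single_b = 0.4
double_ab = 0.6  # fitter than the 0.2 expected from the single mutants

eps = epistasis(wild_type, single_a, single_b, double_ab)
print(f"epistasis = {eps:.3f}")
```

With these numbers the double mutant is three times fitter than the null model predicts, so the score is positive; a double mutant with fitness exactly 0.2 would score zero.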
Affiliation(s)
- Allison Judge
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Banumathi Sankaran
  - Department of Molecular Biophysics and Integrated Bioimaging, Berkeley Center for Structural Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Liya Hu
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Murugesan Palaniappan
  - Department of Pathology and Immunology, Center for Drug Discovery, Baylor College of Medicine, Houston, TX 77030
- André Birgy
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
  - Infections, Antimicrobials, Modelling, Evolution, UMR 1137, French Institute for Medical Research (INSERM), Faculty of Health, Université Paris Cité, Paris 75006, France
- B. V. Venkataram Prasad
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Timothy Palzkill
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
8
Wang X, Li A, Li X, Cui H. Empowering Protein Engineering through Recombination of Beneficial Substitutions. Chemistry 2024; 30:e202303889. [PMID: 38288640 DOI: 10.1002/chem.202303889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Indexed: 02/24/2024]
Abstract
Directed evolution stands as a seminal technology for generating novel protein functionalities, a cornerstone in biocatalysis, metabolic engineering, and synthetic biology. Today, with the development of various mutagenesis methods and advanced analytical machines, the challenge of diversity generation and high-throughput screening platforms is largely solved, and one of the remaining challenges is: how to empower the potential of single beneficial substitutions with recombination to achieve the epistatic effect. This review overviews experimental and computer-assisted recombination methods in protein engineering campaigns. In addition, integrated and machine learning-guided strategies were highlighted to discuss how these recombination approaches contribute to generating the screening library with better diversity, coverage, and size. A decision tree was finally summarized to guide the further selection of proper recombination strategies in practice, which was beneficial for accelerating protein engineering.
Affiliation(s)
- Xinyue Wang
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Anni Li
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Xiujuan Li
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
- Haiyang Cui
  - School of Life Sciences, Nanjing Normal University, No. 2 Xuelin Road, Nanjing, 210097, China
9
Fang T, Szklarczyk D, Hachilif R, von Mering C. Enhancing coevolutionary signals in protein-protein interaction prediction through clade-wise alignment integration. Sci Rep 2024; 14:6009. [PMID: 38472223 PMCID: PMC10933411 DOI: 10.1038/s41598-024-55655-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Accepted: 02/26/2024] [Indexed: 03/14/2024] Open
Abstract
Protein-protein interactions (PPIs) play essential roles in most biological processes. The binding interfaces between interacting proteins impose evolutionary constraints that have successfully been employed to predict PPIs from multiple sequence alignments (MSAs). To construct MSAs, critical choices have to be made: how to ensure the reliable identification of orthologs, and how to optimally balance the need for large alignments versus sufficient alignment quality. Here, we propose a divide-and-conquer strategy for MSA generation: instead of building a single, large alignment for each protein, multiple distinct alignments are constructed under distinct clades in the tree of life. Coevolutionary signals are searched separately within these clades, and are only subsequently integrated using machine learning techniques. We find that this strategy markedly improves overall prediction performance, concomitant with better alignment quality. Using the popular DCA algorithm to systematically search pairs of such alignments, a genome-wide all-against-all interaction scan in a bacterial genome is demonstrated. Given the recent successes of AlphaFold in predicting direct PPIs at atomic detail, a discover-and-refine approach is proposed: our method could provide a fast and accurate strategy for pre-screening the entire genome, submitting to AlphaFold only promising interaction candidates, thus reducing false positives as well as computation time.
Affiliation(s)
- Tao Fang
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Damian Szklarczyk
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Radja Hachilif
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Christian von Mering
  - Department of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
10
Bhadola P, Deo N. Exploring complexity of class-A Beta-lactamase family using physiochemical-based multiplex networks. Sci Rep 2023; 13:20626. [PMID: 37996629 PMCID: PMC10667273 DOI: 10.1038/s41598-023-48128-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023] Open
Abstract
The Beta-lactamase protein family is vital in countering Beta-lactam antibiotics, a widely used antimicrobial. To enhance our understanding of this family, we adopted a novel approach employing a multiplex network representation of its multiple sequence alignment. Each network layer, derived from the physiochemical properties of amino acids, unveils distinct insights into the intricate interactions among nodes, thereby enabling the identification of key motifs. Nodes with identical property signs tend to aggregate, providing evidence of the presence of consequential functional and evolutionary constraints shaping the Beta-lactamase family. We further investigate the distribution of evolutionary links across various layers. We observe that polarity manifests the highest number of unique links at lower thresholds, followed by hydrophobicity and polarizability, wherein hydrophobicity exerts dominance at higher thresholds. Further, the combinations of polarizability and volume, exhibit multiple simultaneous connections at all thresholds. The combination of hydrophobicity, polarizability, and volume uncovers shared links exclusive to these layers, implying substantial evolutionary impacts that may have functional or structural implications. By assessing the multi-degree of nodes, we unveil the hierarchical influence of properties at each position, identifying crucial properties responsible for the protein's functionality and providing valuable insights into potential targets for modulating enzymatic activity.
Affiliation(s)
- Pradeep Bhadola
  - Centre for Theoretical Physics & Natural Philosophy, Mahidol University, Nakhonsawan Campus, Phayuha Khiri, NakhonSawan, 60130, Thailand
- Nivedita Deo
  - Department of Physics and Astrophysics, University of Delhi, Delhi, 110007, India
11
Li X, Chen B, Chen W, Pu Z, Qi X, Yang L, Wu J, Yu H. Customized multiple sequence alignment as an effective strategy to improve performance of Taq DNA polymerase. Appl Microbiol Biotechnol 2023; 107:6507-6525. [PMID: 37658164 DOI: 10.1007/s00253-023-12744-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 08/06/2023] [Accepted: 08/24/2023] [Indexed: 09/03/2023]
Abstract
Engineering Taq DNA polymerase (TaqPol) for improved activity, stability and sensitivity was critical for its wide applications. Multiple sequence alignment (MSA) has been widely used in engineering enzymes for improved properties. Here, we first designed TaqPol mutations based on MSA of 2756 sequences from both thermophilic and non-thermophilic organisms. Two double mutations were generated including a variant H676F/R677G showing a decrease in both activity and stability, and a variant Y686R/E687K showing an improved activity, but a decreased stability. Mutations targeted on coevolutionary residues of Arg677 and Tyr686 were then applied to rescue stability or activity loss of the double mutants, which achieved a partial success. Sequence analysis revealed that the two mutations are abundant in non-thermophilic sequences but not in thermophilic homologues. Then, a small-scale MSA containing sequences from only thermophilic organisms was applied to predict 13 single variants and two of them, E507Q and E734N showed a simultaneous increase in both stability and activity, even in sensitivity. A customized MSA was hence more effective in engineering a thermophilic enzyme and could be used in engineering other enzymes. Molecular dynamics simulations revealed the impact of mutations on the protein dynamics and interactions between TaqPol and substrates.
KEY POINTS:
- The pool of sequence for alignment is critical to engineering Taq DNA polymerase.
- The variants with low properties can be rescued by mutations in coevolving network.
- Improving binding with DNA can improve DNA polymerase stability and activity.
Affiliation(s)
- Xinjia Li
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, 311200, Zhejiang, China
- Binbin Chen
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, 311200, Zhejiang, China
- Wanyi Chen
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, 311200, Zhejiang, China
- Zhongji Pu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, 311200, Zhejiang, China
- Xin Qi
  - Building No.4, Zhongguancun Dongsheng International Science Park, No. 1 North Yongtaizhuang Road, Haidian District, Beijing, 100192, China
- Lirong Yang
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, 311200, Zhejiang, China
- Jianping Wu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, 311200, Zhejiang, China
- Haoran Yu
  - Institute of Bioengineering, College of Chemical and Biological Engineering, Zhejiang University, Hangzhou, 310027, Zhejiang, China
  - ZJU-Hangzhou Global Scientific and Technological Innovation Centre, Hangzhou, 311200, Zhejiang, China
12
Bastolla U, Abia D, Piette O. PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score. Bioinformatics 2023; 39:btad630. [PMID: 37847775 PMCID: PMC10628387 DOI: 10.1093/bioinformatics/btad630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 08/01/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
MOTIVATION: Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSA), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships.
RESULTS: Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose the new hybrid protein sequence and structure similarity score PC_sim based on their main principal component. The corresponding divergence measure PC_div shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali that constructs protein MSAs either de novo or modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of these PAs and it refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than structure aligners although lower than sequence aligners, highest score PC_sim, and highest similarity with the MSAs produced by other tools and with the reference MSA Balibase.
AVAILABILITY AND IMPLEMENTATION: https://github.com/ugobas/PC_ali.
Affiliation(s)
- Ugo Bastolla
  - Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- David Abia
  - Bioinformatics Facility CBMSO, CSIC-UAM Cantoblanco, 28049 Madrid, Spain
- Oscar Piette
  - Centro de Biologia Molecular “Severo Ochoa” (CBMSO), CSIC-UAM Cantoblanco, 28049 Madrid, Spain
13
|
Alderson TR, Pritišanac I, Kolarić Đ, Moses AM, Forman-Kay JD. Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2. Proc Natl Acad Sci U S A 2023; 120:e2304302120. [PMID: 37878721 PMCID: PMC10622901 DOI: 10.1073/pnas.2304302120] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 03/22/2023] [Accepted: 08/30/2023] [Indexed: 10/27/2023] Open
Abstract
The AlphaFold Protein Structure Database contains predicted structures for millions of proteins. For the majority of human proteins that contain intrinsically disordered regions (IDRs), which do not adopt a stable structure, it is generally assumed that these regions have low AlphaFold2 confidence scores that reflect low-confidence structural predictions. Here, we show that AlphaFold2 assigns confident structures to nearly 15% of human IDRs. By comparison to experimental NMR data for a subset of IDRs that are known to conditionally fold (i.e., upon binding or under other specific conditions), we find that AlphaFold2 often predicts the structure of the conditionally folded state. Based on databases of IDRs that are known to conditionally fold, we estimate that AlphaFold2 can identify conditionally folding IDRs at a precision as high as 88% at a 10% false positive rate, which is remarkable considering that conditionally folded IDR structures were minimally represented in its training data. We find that human disease mutations are nearly fivefold enriched in conditionally folded IDRs over IDRs in general and that up to 80% of IDRs in prokaryotes are predicted to conditionally fold, compared to less than 20% of eukaryotic IDRs. These results indicate that a large majority of IDRs in the proteomes of human and other eukaryotes function in the absence of conditional folding, but the regions that do acquire folds are more sensitive to mutations. We emphasize that the AlphaFold2 predictions do not reveal functionally relevant structural plasticity within IDRs and cannot offer realistic ensemble representations of conditionally folded IDRs.
Affiliation(s)
- T. Reid Alderson
- Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Iva Pritišanac
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 35G, Canada
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Department of Molecular Biology and Biochemistry, Gottfried Schatz Research Center for Cell Signaling, Metabolism and Aging, Medical University of Graz, Graz 8010, Austria
- Đesika Kolarić
- Department of Molecular Biology and Biochemistry, Gottfried Schatz Research Center for Cell Signaling, Metabolism and Aging, Medical University of Graz, Graz 8010, Austria
- Alan M. Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON M5S 35G, Canada
- Julie D. Forman-Kay
- Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
- Molecular Medicine Program, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
14
Gu X, Li L, Li S, Shi W, Zhong X, Su Y, Wang T. Adaptive evolution and co-evolution of chloroplast genomes in Pteridaceae species occupying different habitats: overlapping residues are always highly mutated. BMC Plant Biol 2023; 23:511. [PMID: 37880608 PMCID: PMC10598918 DOI: 10.1186/s12870-023-04523-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 10/13/2023] [Indexed: 10/27/2023]
Abstract
BACKGROUND The evolution of protein residues depends on the mutation rates of their encoding nucleotides, but it may also be affected by co-evolution with other residues. Chloroplasts function as environmental sensors, transforming fluctuating environmental signals into different physiological responses. We reasoned that habitat diversity may affect their rate and mode of evolution, which might be evidenced in the chloroplast genome. The Pteridaceae family of ferns occupies an unusually broad range of ecological niches, which makes it an ideal system for analysis. RESULTS We conducted adaptive evolution and intra-molecular co-evolution analyses of Pteridaceae chloroplast DNAs (cpDNAs). The results indicate that the residues undergoing adaptive evolution and co-evolution were mostly independent, with only a few residues being simultaneously involved in both processes, and these overlapping residues tend to be highly mutated. Additionally, our data showed that Pteridaceae chloroplast genes are under purifying selection. Regardless of whether we grouped species by lineage (which corresponded with ecological niches), we determined that positively selected residues mainly target photosynthetic genes. CONCLUSIONS Our work provides evidence for the adaptive evolution of Pteridaceae cpDNAs, especially photosynthetic genes, to different habitats and sheds light on the adaptive evolution and co-evolution of proteins.
Affiliation(s)
- Xiaolin Gu
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Lingling Li
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Sicong Li
- College of Natural Resources and Environment, South China Agricultural University, Guangzhou, 510642, China
- Wanxin Shi
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Xiaona Zhong
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Yingjuan Su
- School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Research Institute of Sun Yat-sen University in Shenzhen, Shenzhen, 518057, China
- Ting Wang
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
15
Marshall LR, Bhattacharya S, Korendovych IV. Fishing for Catalysis: Experimental Approaches to Narrowing Search Space in Directed Evolution of Enzymes. JACS Au 2023; 3:2402-2412. [PMID: 37772192 PMCID: PMC10523367 DOI: 10.1021/jacsau.3c00315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 09/30/2023]
Abstract
Directed evolution has transformed protein engineering, offering a path to rapid improvement of protein properties. Yet, in practice, it is limited by the hyper-astronomic protein sequence search space, and approaches to identify mutagenic hot spots, i.e., locations where mutations are most likely to have a productive impact, are needed. In this perspective, we categorize and discuss recent progress in the experimental approaches (broadly defined as structural, bioinformatic, and dynamic) to hot spot identification. Recent successes in harnessing protein dynamics and machine learning approaches provide new opportunities for the field and will undoubtedly help directed evolution reach its full potential.
Affiliation(s)
- Liam R. Marshall
- Department of Chemistry, Syracuse University, 111 College Place, Syracuse, New York 13224, United States
- Sagar Bhattacharya
- Department of Chemistry, Syracuse University, 111 College Place, Syracuse, New York 13224, United States
- Ivan V. Korendovych
- Department of Chemistry, Syracuse University, 111 College Place, Syracuse, New York 13224, United States
16
Yang A, Jude KM, Lai B, Minot M, Kocyla AM, Glassman CR, Nishimiya D, Kim YS, Reddy ST, Khan AA, Garcia KC. Deploying synthetic coevolution and machine learning to engineer protein-protein interactions. Science 2023; 381:eadh1720. [PMID: 37499032 PMCID: PMC10403280 DOI: 10.1126/science.adh1720] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/14/2023] [Accepted: 06/16/2023] [Indexed: 07/29/2023]
Abstract
Fine-tuning of protein-protein interactions occurs naturally through coevolution, but this process is difficult to recapitulate in the laboratory. We describe a platform for synthetic protein-protein coevolution that can isolate matched pairs of interacting muteins from complex libraries. This large dataset of coevolved complexes drove a systems-level analysis of molecular recognition between Z domain-affibody pairs spanning a wide range of structures, affinities, cross-reactivities, and orthogonalities, and captured a broad spectrum of coevolutionary networks. Furthermore, we harnessed pretrained protein language models to expand, in silico, the amino acid diversity of our coevolution screen, predicting remodeled interfaces beyond the reach of the experimental library. The integration of these approaches provides a means of simulating protein coevolution and generating protein complexes with diverse molecular recognition properties for biotechnology and synthetic biology.
Affiliation(s)
- Aerin Yang
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Kevin M. Jude
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ben Lai
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Mason Minot
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Anna M. Kocyla
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Caleb R. Glassman
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Daisuke Nishimiya
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Yoon Seok Kim
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Sai T. Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Aly A. Khan
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Departments of Pathology and Family Medicine, University of Chicago, Chicago, IL 60637, USA
- K. Christopher Garcia
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
17
Wolf K, Kosinski J, Gibson TJ, Wesch N, Dötsch V, Genuardi M, Cordisco EL, Zeuzem S, Brieger A, Plotz G. A conserved motif in the disordered linker of human MLH1 is vital for DNA mismatch repair and its function is diminished by a cancer family mutation. Nucleic Acids Res 2023; 51:6307-6320. [PMID: 37224528 PMCID: PMC10325900 DOI: 10.1093/nar/gkad418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 04/26/2023] [Accepted: 05/17/2023] [Indexed: 05/26/2023] Open
Abstract
DNA mismatch repair (MMR) is essential for correction of DNA replication errors. Germline mutations of the human MMR gene MLH1 are the major cause of Lynch syndrome, a heritable cancer predisposition. In the MLH1 protein, a non-conserved, intrinsically disordered region connects two conserved, catalytically active structured domains of MLH1. This region has as yet been regarded as a flexible spacer, and missense alterations in this region have been considered non-pathogenic. However, we have identified and investigated a small motif (ConMot) in this linker which is conserved in eukaryotes. Deletion of the ConMot or scrambling of the motif abolished mismatch repair activity. A mutation from a cancer family within the motif (p.Arg385Pro) also inactivated MMR, suggesting that ConMot alterations can be causative for Lynch syndrome. Intriguingly, the mismatch repair defect of the ConMot variants could be restored by addition of a ConMot peptide containing the deleted sequence. This is the first instance of a DNA mismatch repair defect conferred by a mutation that can be overcome by addition of a small molecule. Based on the experimental data and AlphaFold2 predictions, we suggest that the ConMot may bind close to the C-terminal MLH1-PMS2 endonuclease and modulate its activation during the MMR process.
Affiliation(s)
- Karla Wolf
- Department of Internal Medicine 1, University Hospital, Goethe University, Frankfurt am Main, 60590, Germany
- Jan Kosinski
- European Molecular Biology Laboratory (EMBL), Centre for Structural Systems Biology (CSSB), Hamburg, 22607, Germany
- Toby J Gibson
- European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Heidelberg, 69117, Germany
- Nicole Wesch
- Institute of Biophysical Chemistry and Center for Biomolecular Magnetic Resonance, Goethe University, Frankfurt am Main, 60438, Germany
- Volker Dötsch
- Institute of Biophysical Chemistry and Center for Biomolecular Magnetic Resonance, Goethe University, Frankfurt am Main, 60438, Germany
- Maurizio Genuardi
- UOC Genetica Medica, Fondazione Policlinico Universitario A. Gemelli IRCCS, Rome 00168, Italy
- Emanuela Lucci Cordisco
- Dipartimento di Scienze della Vita e di Sanità Pubblica, Università Cattolica del Sacro Cuore, Rome 00168, Italy
- Stefan Zeuzem
- Department of Internal Medicine 1, University Hospital, Goethe University, Frankfurt am Main, 60590, Germany
- Angela Brieger
- Department of Internal Medicine 1, University Hospital, Goethe University, Frankfurt am Main, 60590, Germany
- Guido Plotz
- Department of Internal Medicine 1, University Hospital, Goethe University, Frankfurt am Main, 60590, Germany
18
Ramakrishnan G, Baakman C, Heijl S, Vroling B, van Horck R, Hiraki J, Xue LC, Huynen MA. Understanding structure-guided variant effect predictions using 3D convolutional neural networks. Front Mol Biosci 2023; 10:1204157. [PMID: 37475887 PMCID: PMC10354367 DOI: 10.3389/fmolb.2023.1204157] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/11/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023] Open
Abstract
Predicting pathogenicity of missense variants in molecular diagnostics remains a challenge despite the available wealth of data, such as evolutionary information, and the wealth of tools to integrate that data. We describe DeepRank-Mut, a configurable framework designed to extract and learn from physicochemically relevant features of amino acids surrounding missense variants in 3D space. For each variant, various atomic and residue-level features are extracted from its structural environment, including sequence conservation scores of the surrounding amino acids, and stored in multi-channel 3D voxel grids which are then used to train a 3D convolutional neural network (3D-CNN). The resultant model gives a probabilistic estimate of whether a given input variant is disease-causing or benign. We find that the performance of our 3D-CNN model, on independent test datasets, is comparable to other widely used resources which also combine sequence and structural features. Based on the 10-fold cross-validation experiments, we achieve an average accuracy of 0.77 on the independent test datasets. We discuss the contribution of the variant neighborhood in the model's predictive power, in addition to the impact of individual features on the model's performance. Two key features: evolutionary information of residues in the variant neighborhood and their solvent accessibilities were observed to influence the predictions. We also highlight how predictions are impacted by the underlying disease mechanisms of missense mutations and offer insights into understanding these to improve pathogenicity predictions. Our study presents aspects to take into consideration when adopting deep learning approaches for protein structure-guided pathogenicity predictions.
Affiliation(s)
- Gayatri Ramakrishnan
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Coos Baakman
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Li C. Xue
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Martijn A. Huynen
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
19
Lynch M. Mutation pressure, drift, and the pace of molecular coevolution. Proc Natl Acad Sci U S A 2023; 120:e2306741120. [PMID: 37364099 PMCID: PMC10319038 DOI: 10.1073/pnas.2306741120] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/26/2023] [Accepted: 05/09/2023] [Indexed: 06/28/2023] Open
Abstract
Most aspects of the molecular biology of cells involve tightly coordinated intermolecular interactions requiring specific recognition at the nucleotide and/or amino acid levels. This has led to long-standing interest in the degree to which constraints on interacting molecules result in conserved vs. accelerated rates of sequence evolution, with arguments commonly being made that molecular coevolution can proceed at rates exceeding the neutral expectation. Here, a fairly general model is introduced to evaluate the degree to which the rate of evolution at functionally interacting sites is influenced by effective population sizes (Ne), mutation rates, strength of selection, and the magnitude of recombination between sites. This theory is of particular relevance to matters associated with interactions between organelle- and nuclear-encoded proteins, as the two genomic environments often exhibit dramatic differences in the power of mutation and drift. Although genes within low Ne environments can drive the rate of evolution of partner genes experiencing higher Ne, rates exceeding the neutral expectation require that the former also have an elevated mutation rate. Testable predictions, some counterintuitive, are presented on how patterns of coevolutionary rates should depend on the relative intensities of drift, selection, and mutation.
Affiliation(s)
- Michael Lynch
- Center for Mechanisms of Evolution, Biodesign Institute, Arizona State University, Tempe, AZ 85287
20
Tamborski J, Seong K, Liu F, Staskawicz BJ, Krasileva KV. Altering Specificity and Autoactivity of Plant Immune Receptors Sr33 and Sr50 Via a Rational Engineering Approach. Mol Plant Microbe Interact 2023; 36:434-446. [PMID: 36867580 PMCID: PMC10561695 DOI: 10.1094/mpmi-07-22-0154-r] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 08/15/2023]
Abstract
Many resistance genes deployed against pathogens in crops are intracellular nucleotide-binding (NB) leucine-rich repeat (LRR) receptors (NLRs). The ability to rationally engineer the specificity of NLRs will be crucial in the response to newly emerging crop diseases. Successful attempts to modify NLR recognition have been limited to untargeted approaches or depended on previously available structural information or knowledge of pathogen-effector targets. However, this information is not available for most NLR-effector pairs. Here, we demonstrate the precise prediction and subsequent transfer of residues involved in effector recognition between two closely related NLRs without their experimentally determined structure or detailed knowledge about their pathogen effector targets. By combining phylogenetics, allele diversity analysis, and structural modeling, we successfully predicted residues mediating interaction of Sr50 with its cognate effector AvrSr50 and transferred recognition specificity of Sr50 to the closely related NLR Sr33. We created synthetic versions of Sr33 that contain amino acids from Sr50, including Sr33syn, which gained the ability to recognize AvrSr50 with 12 amino-acid substitutions. Furthermore, we discovered that sites in the LRR domain needed to transfer recognition specificity to Sr33 also influence autoactivity in Sr50. Structural modeling suggests these residues interact with a part of the NB-ARC domain, which we named the NB-ARC latch, to possibly maintain the inactive state of the receptor. Our approach demonstrates rational modifications of NLRs, which could be useful to enhance existing elite crop germplasm. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Janina Tamborski
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, U.S.A
- Kyungyong Seong
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, U.S.A
- Furong Liu
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, U.S.A
- Innovative Genomics Institute, University of California Berkeley, 2151 Berkeley Way, Berkeley, CA 94720, U.S.A
- Brian J. Staskawicz
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, U.S.A
- Innovative Genomics Institute, University of California Berkeley, 2151 Berkeley Way, Berkeley, CA 94720, U.S.A
- Ksenia V. Krasileva
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA 94720, U.S.A
- Innovative Genomics Institute, University of California Berkeley, 2151 Berkeley Way, Berkeley, CA 94720, U.S.A
21
La Sala G, Pfleger C, Käck H, Wissler L, Nevin P, Böhm K, Janet JP, Schimpl M, Stubbs CJ, De Vivo M, Tyrchan C, Hogner A, Gohlke H, Frolov AI. Combining structural and coevolution information to unveil allosteric sites. Chem Sci 2023; 14:7057-7067. [PMID: 37389247 PMCID: PMC10306073 DOI: 10.1039/d2sc06272k] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/14/2022] [Accepted: 06/02/2023] [Indexed: 07/01/2023] Open
Abstract
Understanding allosteric regulation in biomolecules is of great interest to pharmaceutical research, and computational methods have emerged over the last decades to characterize allosteric coupling. However, the prediction of allosteric sites in a protein structure remains a challenging task. Here, we integrate local binding site information, coevolutionary information, and information on dynamic allostery into a structure-based three-parameter model to identify potentially hidden allosteric sites in ensembles of protein structures with orthosteric ligands. When tested on five allosteric proteins (LFA-1, p38-α, GR, MAT2A, and BCKDK), the model successfully ranked all known allosteric pockets in the top three positions. Finally, we identified a novel druggable site in MAT2A confirmed by X-ray crystallography and SPR and a hitherto unknown druggable allosteric site in BCKDK validated by biochemical and X-ray crystallography analyses. Our model can be applied in drug discovery to identify allosteric pockets.
Affiliation(s)
- Giuseppina La Sala
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Christopher Pfleger
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Helena Käck
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Lisa Wissler
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Philip Nevin
- Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Kerstin Böhm
- Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Jon Paul Janet
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Marianne Schimpl
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Christopher J Stubbs
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
- Marco De Vivo
- Laboratory of Molecular Modeling and Drug Design, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genoa, Italy
- Christian Tyrchan
- Medicinal Chemistry, Research and Early Development, Respiratory & Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Anders Hogner
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
- Holger Gohlke
- Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC), Institute of Biological Information Processing (IBI-7: Structural Biochemistry), Institute of Bio- and Geosciences (IBG-4: Bioinformatics), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Andrey I Frolov
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden
22
He R, Zhang J, Shao Y, Gu S, Song C, Qian L, Yin WB, Li Z. Knowledge-guided data mining on the standardized architecture of NRPS: Subtypes, novel motifs, and sequence entanglements. PLoS Comput Biol 2023; 19:e1011100. [PMID: 37186644 DOI: 10.1371/journal.pcbi.1011100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 05/25/2023] [Accepted: 04/12/2023] [Indexed: 05/17/2023] Open
Abstract
Non-ribosomal peptide synthetase (NRPS) is a diverse family of biosynthetic enzymes for the assembly of bioactive peptides. Despite advances in microbial sequencing, the lack of a consistent standard for annotating NRPS domains and modules has made data-driven discoveries challenging. To address this, we introduced a standardized architecture for NRPS, by using known conserved motifs to partition typical domains. This motif-and-intermotif standardization allowed for systematic evaluations of sequence properties from a large number of NRPS pathways, resulting in the most comprehensive cross-kingdom C domain subtype classifications to date, as well as the discovery and experimental validation of novel conserved motifs with functional significance. Furthermore, our coevolution analysis revealed important barriers associated with re-engineering NRPSs and uncovered the entanglement between phylogeny and substrate specificity in NRPS sequences. Our findings provide a comprehensive and statistically insightful analysis of NRPS sequences, opening avenues for future data-driven discoveries.
Affiliation(s)
- Ruolin He
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Jinyu Zhang
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, PR China
- Savaid Medical School, University of Chinese Academy of Sciences, Beijing, PR China
- Yuanzhe Shao
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Shaohua Gu
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Chen Song
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Long Qian
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Wen-Bing Yin
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Sciences, Beijing, PR China
- Savaid Medical School, University of Chinese Academy of Sciences, Beijing, PR China
- Zhiyuan Li
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
23
Durham J, Zhang J, Humphreys IR, Pei J, Cong Q. Recent advances in predicting and modeling protein-protein interactions. Trends Biochem Sci 2023; 48:527-538. [PMID: 37061423 DOI: 10.1016/j.tibs.2023.03.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/01/2022] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/17/2023]
Abstract
Protein-protein interactions (PPIs) drive biological processes, and disruption of PPIs can cause disease. With recent breakthroughs in structure prediction and a deluge of genomic sequence data, computational methods to predict PPIs and model spatial structures of protein complexes are now approaching the accuracy of experimental approaches for permanent interactions and show promise for elucidating transient interactions. As we describe here, the key to this success is rich evolutionary information deciphered from thousands of homologous sequences that coevolve in interacting partners. This covariation signal, revealed by sophisticated statistical and machine learning (ML) algorithms, predicts physiological interactions. Accurate artificial intelligence (AI)-based modeling of protein structures promises to provide accurate 3D models of PPIs at a proteome-wide scale.
Affiliation(s)
- Jesse Durham
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Ian R Humphreys
- Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA
- Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
24
Budzynski L, Pagnani A. Small-coupling expansion for multiple sequence alignment. Phys Rev E 2023; 107:044125. [PMID: 37198812 DOI: 10.1103/physreve.107.044125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 03/27/2023] [Indexed: 05/19/2023]
Abstract
The alignment of biological sequences such as DNA, RNA, and proteins, is one of the basic tools that allow to detect evolutionary patterns, as well as functional or structural characterizations between homologous sequences in different organisms. Typically, state-of-the-art bioinformatics tools are based on profile models that assume the statistical independence of the different sites of the sequences. Over the last years, it has become increasingly clear that homologous sequences show complex patterns of long-range correlations over the primary sequence as a consequence of the natural evolution process that selects genetic variants under the constraint of preserving the functional or structural determinants of the sequence. Here, we present an alignment algorithm based on message passing techniques that overcomes the limitations of profile models. Our method is based on a perturbative small-coupling expansion of the free energy of the model that assumes a linear chain approximation as the zeroth-order of the expansion. We test the potentiality of the algorithm against standard competing strategies on several biological sequences.
Affiliation(s)
- Louise Budzynski
- DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, 24, I-10129, Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060, Candiolo, Italy
- Andrea Pagnani
- DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, 24, I-10129, Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060, Candiolo, Italy
- INFN, Sezione di Torino, Torino, Via Pietro Giuria, 1 10125 Torino Italy
25
Gandarilla-Pérez CA, Pinilla S, Bitbol AF, Weigt M. Combining phylogeny and coevolution improves the inference of interaction partners among paralogous proteins. PLoS Comput Biol 2023; 19:e1011010. [PMID: 36996234 PMCID: PMC10089317 DOI: 10.1371/journal.pcbi.1011010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 04/11/2023] [Accepted: 03/08/2023] [Indexed: 04/01/2023] Open
Abstract
Predicting protein-protein interactions from sequences is an important goal of computational biology. Various sources of information can be used to this end. Starting from the sequences of two interacting protein families, one can use phylogeny or residue coevolution to infer which paralogs are specific interaction partners within each species. We show that these two signals can be combined to improve the performance of the inference of interaction partners among paralogs. For this, we first align the sequence-similarity graphs of the two families through simulated annealing, yielding a robust partial pairing. We next use this partial pairing to seed a coevolution-based iterative pairing algorithm. This combined method improves performance over either separate method. The improvement obtained is striking in the difficult cases where the average number of paralogs per species is large or where the total number of sequences is modest.
Affiliation(s)
- Carlos A Gandarilla-Pérez
- Facultad de Física, Universidad de la Habana, San Lázaro y L, Vedado, Habana, Cuba
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
- Sergio Pinilla
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (UMR 8237), Paris, France
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), Paris, France
26
Si Y, Yan C. Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models. Brief Bioinform 2023; 24:7033302. [PMID: 36759333 DOI: 10.1093/bib/bbad039] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/27/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 02/11/2023] Open
Abstract
The knowledge of contacting residue pairs between interacting proteins is very useful for the structural characterization of protein-protein interactions (PPIs). However, accurately identifying the tens of contacting ones from hundreds of thousands of inter-protein residue pairs is extremely challenging, and performances of the state-of-the-art inter-protein contact prediction methods are still quite limited. In this study, we developed a deep learning method for inter-protein contact prediction, which is referred to as DRN-1D2D_Inter. Specifically, we employed pretrained protein language models to generate structural information-enriched input features to residual networks formed by dimensional hybrid residual blocks to perform inter-protein contact prediction. Extensively benchmarking DRN-1D2D_Inter on multiple datasets, including both heteromeric PPIs and homomeric PPIs, we show DRN-1D2D_Inter consistently and significantly outperformed two state-of-the-art inter-protein contact prediction methods, including GLINTER and DeepHomo, although both the latter two methods leveraged the native structures of interacting proteins in the prediction, and DRN-1D2D_Inter made the prediction purely from sequences. We further show that applying the predicted contacts as constraints for protein-protein docking can significantly improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, China
- Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, China
27
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 60] [Impact Index Per Article: 60.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023] Open
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 presents an unprecedented progress in protein structure prediction and has attracted much attention. Subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design, prediction of protein function, etc. Though the time is not long since AF2 was developed, there are already quite a few application studies of AF2 in the fields of biology and medicine, with many of them having preliminarily proved the potential of AF2. To better understand AF2 and promote its applications, we will in this article summarize the principle and system architecture of AF2 as well as the recipe of its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction will also be discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China.
- Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China.
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China.
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China.
28
Xie J, Zhang W, Zhu X, Deng M, Lai L. Coevolution-based prediction of key allosteric residues for protein function regulation. eLife 2023; 12:81850. [PMID: 36799896 PMCID: PMC9981151 DOI: 10.7554/elife.81850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Accepted: 02/16/2023] [Indexed: 02/18/2023] Open
Abstract
Allostery is fundamental to many biological processes. Due to the distant regulation nature, how allosteric mutations, modifications, and effector binding impact protein function is difficult to forecast. In protein engineering, remote mutations cannot be rationally designed without large-scale experimental screening. Allosteric drugs have raised much attention due to their high specificity and possibility of overcoming existing drug-resistant mutations. However, optimization of allosteric compounds remains challenging. Here, we developed a novel computational method KeyAlloSite to predict allosteric site and to identify key allosteric residues (allo-residues) based on the evolutionary coupling model. We found that protein allosteric sites are strongly coupled to orthosteric site compared to non-functional sites. We further inferred key allo-residues by pairwise comparing the difference of evolutionary coupling scores of each residue in the allosteric pocket with the functional site. Our predicted key allo-residues are in accordance with previous experimental studies for typical allosteric proteins like BCR-ABL1, Tar, and PDZ3, as well as key cancer mutations. We also showed that KeyAlloSite can be used to predict key allosteric residues distant from the catalytic site that are important for enzyme catalysis. Our study demonstrates that weak coevolutionary couplings contain important information of protein allosteric regulation function. KeyAlloSite can be applied in studying the evolution of protein allosteric regulation, designing and optimizing allosteric drugs, and performing functional protein design and enzyme engineering.
Affiliation(s)
- Juan Xie
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Weilin Zhang
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, China
- Minghua Deng
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- School of Mathematical Sciences, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
29
Wei Q, Liu J, Guo F, Wang Z, Zhang X, Yuan L, Ali K, Qiang F, Wen Y, Li W, Zheng B, Bai Q, Li G, Ren H, Wu G. Kinase regulators evolved into two families by gain and loss of ability to bind plant steroid receptors. Plant Physiol 2023; 191:1167-1185. [PMID: 36494097 PMCID: PMC9922406 DOI: 10.1093/plphys/kiac568] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/16/2022] [Accepted: 11/29/2022] [Indexed: 06/17/2023]
Abstract
All biological functions evolve by fixing beneficial mutations and removing deleterious ones. Therefore, continuously fixing and removing the same essential function to separately diverge monophyletic gene families sounds improbable. Yet, here we report that brassinosteroid insensitive1 kinase inhibitor1 (BKI1)/membrane-associated kinase regulators (MAKRs) regulating a diverse function evolved into BKI1 and MAKR families from a common ancestor by respectively enhancing and losing ability to bind brassinosteroid receptor brassinosteroid insensitive1 (BRI1). The BKI1 family includes BKI1, MAKR1/BKI1-like (BKL) 1, and BKL2, while the MAKR family contains MAKR2-6. Seedless plants contain only BKL2. In seed plants, MAKR1/BKL1 and MAKR3, duplicates of BKL2, gained and lost the ability to bind BRI1, respectively. In angiosperms, BKL2 lost the ability to bind BRI1 to generate MAKR2, while BKI1 and MAKR6 were duplicates of MAKR1/BKL1 and MAKR3, respectively. In dicots, MAKR4 and MAKR5 were duplicates of MAKR3 and MAKR2, respectively. Importantly, BKI1 localized in the plasma membrane, but BKL2 localized to the nuclei while MAKR1/BKL1 localized throughout the whole cell. Importantly, BKI1 strongly and MAKR1/BKL1 weakly inhibited plant growth, but BKL2 and the MAKR family did not inhibit plant growth. Functional study of the chimeras of their N- and C-termini showed that only the BKI1 family was partially reconstructable, supporting stepwise evolution by a seesaw mechanism between their C- and N-termini to alternately gain an ability to bind and inhibit BRI1, respectively. Nevertheless, the C-terminal BRI1-interacting motif best defines the divergence of BKI1/MAKRs. Therefore, BKI1 and MAKR families evolved by gradually gaining and losing the same function, respectively, extremizing divergent evolution and adding insights into gene (BKI1/MAKR) duplication and divergence.
30
Nithya C, Kiran M, Nagarajaram HA. Dissection of hubs and bottlenecks in a protein-protein interaction network. Comput Biol Chem 2023; 102:107802. [PMID: 36603332 DOI: 10.1016/j.compbiolchem.2022.107802] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/05/2022] [Revised: 11/20/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Analysis of degree centrality in conjunction with betweenness centrality of proteins in a human protein-protein interaction network revealed three categories of centrally important proteins: a) proteins with high degree and betweenness (hub-bottlenecks denoted as MX), b) proteins with high betweenness and low degree (non-hub-bottlenecks/pure bottlenecks denoted as PB) and c) proteins with high degree and low betweenness (hub-non-bottlenecks/pure hubs denoted as PH). When subjected to a detailed statistical analysis of their molecular-level properties, the proteins belonging to each of these categories were found to be associated with distinct canonical molecular properties, i.e., "molecular markers". The MX proteins are a) conformationally versatile, mainly comprising of essential proteins, b) the targets for interactions by the proteins of viral and bacterial pathogens, c) evolutionally constrained, involved in multiple pathways, enriched with disease genes and d) involved in the functions such as protein stabilization, phosphorylation, and mRNA slicing processes. PB proteins are a) enriched with extracellular and cancer-related proteins, b) enriched with the approved drug targets and c) involved in cell-cell signaling processes. Finally, PH are a) structurally versatile, b) enriched with essential proteins primarily involved in housekeeping processes (transcription and replication). The fact that the proteins belonging to these three categories form three distinct sets in terms of their molecular properties reveals the existence of trichotomy among hubs and bottlenecks, and this knowledge is of paramount importance while prioritizing protein targets for further studies such as drug design and disease association studies based on their network centrality values.
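The three-way split described in this abstract (MX, PB, PH) can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy graph, the two cutoff values, and the function names `betweenness` and `classify` are assumptions made for illustration, and the abstract does not say how "high" and "low" were thresholded. Betweenness is computed here with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm, unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def classify(adj, degree_cutoff, betweenness_cutoff):
    """Label nodes MX, PB, PH, or 'other' using hypothetical cutoffs."""
    bc = betweenness(adj)
    labels = {}
    for v in adj:
        hub = len(adj[v]) >= degree_cutoff
        bottleneck = bc[v] >= betweenness_cutoff
        if hub and bottleneck:
            labels[v] = "MX"      # hub-bottleneck
        elif bottleneck:
            labels[v] = "PB"      # pure bottleneck
        elif hub:
            labels[v] = "PH"      # pure hub
        else:
            labels[v] = "other"
    return labels


# Toy network (made up): a 4-clique, a bridge node X, and a star centred on A.
edges = [("c1", "c2"), ("c1", "c3"), ("c1", "c4"), ("c2", "c3"), ("c2", "c4"),
         ("c3", "c4"), ("c1", "X"), ("X", "A"), ("A", "a1"), ("A", "a2"), ("A", "a3")]
toy = {}
for u, v in edges:
    toy.setdefault(u, set()).add(v)
    toy.setdefault(v, set()).add(u)

labels = classify(toy, degree_cutoff=3, betweenness_cutoff=10.0)
```

In this toy network the bridge node `X` has only two neighbours but lies on every path between the clique and the star, so it is labelled as a pure bottleneck, whereas the clique members `c2`, `c3`, and `c4` have many neighbours and no shortest paths running through them, so they are labelled as pure hubs.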
Affiliation(s)
- Chandramohan Nithya
- Department of Biotechnology and Bioinformatics, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana 500046, India
- Manjari Kiran
- Department of Systems and Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana 500046, India
31
Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. bioRxiv 2023:2023.01.18.524637. [PMID: 36789442 PMCID: PMC9928049 DOI: 10.1101/2023.01.18.524637] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/22/2023]
Abstract
Although most globular proteins fold into a single stable structure [1], an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli [2]. State-of-the-art algorithms [3-5] predict that these fold-switching proteins assume only one stable structure [6,7], missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that over-represented single-fold sequences may be masking these signatures, we developed an approach to search both highly diverse protein superfamilies, composed of single-fold and fold-switching variants, and protein subfamilies with more fold-switching variants. This approach successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/58 fold-switching proteins from distinct families. Then, using a set of coevolved amino acid pairs predicted by our approach, we successfully biased AlphaFold2 [5] to predict two experimentally consistent conformations of a candidate protein with unsolved structure. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Affiliation(s)
- Joseph W. Schafer
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Lauren L. Porter
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD 20892, USA
32
Sloan DB, Warren JM, Williams AM, Kuster SA, Forsythe ES. Incompatibility and Interchangeability in Molecular Evolution. Genome Biol Evol 2023; 15:evac184. [PMID: 36583227 PMCID: PMC9839398 DOI: 10.1093/gbe/evac184] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/01/2022] [Revised: 12/20/2022] [Accepted: 12/22/2022] [Indexed: 12/31/2022] Open
Abstract
There is remarkable variation in the rate at which genetic incompatibilities in molecular interactions accumulate. In some cases, minor changes, even single-nucleotide substitutions, create major incompatibilities when hybridization forces new variants to function in a novel genetic background from an isolated population. In other cases, genes or even entire functional pathways can be horizontally transferred between anciently divergent evolutionary lineages that span the tree of life with little evidence of incompatibilities. In this review, we explore whether there are general principles that can explain why certain genes are prone to incompatibilities while others maintain interchangeability. We summarize evidence pointing to four genetic features that may contribute to greater resistance to functional replacement: (1) function in multisubunit enzyme complexes and protein-protein interactions, (2) sensitivity to changes in gene dosage, (3) rapid rate of sequence evolution, and (4) overall importance to cell viability, which creates sensitivity to small perturbations in molecular function. We discuss the relative levels of support for these different hypotheses and lay out future directions that may help explain the striking contrasts in patterns of incompatibility and interchangeability throughout the history of molecular evolution.
Affiliation(s)
- Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, Colorado
- Jessica M Warren
- Center for Mechanisms of Evolution, Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, Arizona
- Alissa M Williams
- Department of Biological Sciences, Vanderbilt University, Nashville, Tennessee
- Shady A Kuster
- Department of Biology, Colorado State University, Fort Collins, Colorado
- Evan S Forsythe
- Department of Biology, Colorado State University, Fort Collins, Colorado
33
Shea A, Bartz J, Zhang L, Dong X. Predicting mutational function using machine learning. Mutat Res Rev Mutat Res 2023; 791:108457. [PMID: 36965820 PMCID: PMC10239318 DOI: 10.1016/j.mrrev.2023.108457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 03/11/2023] [Accepted: 03/20/2023] [Indexed: 03/27/2023]
Abstract
Genetic variations are one of the major causes of phenotypic variations between human individuals. Although beneficial as being the substrate of evolution, germline mutations may cause diseases, including Mendelian diseases and complex diseases such as diabetes and heart diseases. Mutations occurring in somatic cells are a main cause of cancer and likely cause age-related phenotypes and other age-related diseases. Because of the high abundance of genetic variations in the human genome, i.e., millions of germline variations per human subject and thousands of additional somatic mutations per cell, it is technically challenging to experimentally verify the function of every possible mutation and their interactions. Significant progress has been made to solve this problem using computational approaches, especially machine learning (ML). Here, we review the progress and achievements made in recent years in this field of research. We classify the computational models in two ways: one according to their prediction goals including protein structural alterations, gene expression changes, and disease risks, and the other according to their methodologies, including non-machine learning methods, classical machine learning methods, and deep neural network methods. For models in each category, we discuss their architecture, prediction accuracy, and potential limitations. This review provides new insights into the applications and future directions of computational approaches in understanding the role of mutations in aging and disease.
Affiliation(s)
- Anthony Shea
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Josh Bartz
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA; Bioinformatics and Computational Biology Program, University of Minnesota, Minneapolis, MN 55455, USA
- Lei Zhang
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA
- Xiao Dong
- Institute on the Biology of Aging and Metabolism, University of Minnesota, Minneapolis, MN 55455, USA; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN 55455, USA.
34
Ibrahim AH, Karabulut OC, Karpuzcu BA, Türk E, Süzek BE. A correlation coefficient-based feature selection approach for virus-host protein-protein interaction prediction. PLoS One 2023; 18:e0285168. [PMID: 37130110 PMCID: PMC10153705 DOI: 10.1371/journal.pone.0285168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 04/17/2023] [Indexed: 05/03/2023] Open
Abstract
Prediction of virus-host protein-protein interactions (PPI) is a broad research area where various machine-learning-based classifiers are developed. Transforming biological data into machine-usable features is a preliminary step in constructing these virus-host PPI prediction tools. In this study, we have adopted a virus-host PPI dataset and a reduced amino acids alphabet to create tripeptide features and introduced a correlation coefficient-based feature selection. We applied feature selection across several correlation coefficient metrics and statistically tested their relevance in a structural context. We compared the performance of feature-selection models against that of the baseline virus-host PPI prediction models created using different classification algorithms without the feature selection. We also tested the performance of these baseline models against the previously available tools to ensure their predictive power is acceptable. Here, the Pearson coefficient provides the best performance with respect to the baseline model as measured by AUPR; a drop of 0.003 in AUPR while achieving a 73.3% (from 686 to 183) reduction in the number of tripeptides features for random forest. The results suggest our correlation coefficient-based feature selection approach, while decreasing the computation time and space complexity, has a limited impact on the prediction performance of virus-host PPI prediction tools.
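As a rough illustration of correlation-based feature selection, the sketch below scores each feature by its Pearson correlation with a binary interaction label and keeps those whose absolute correlation reaches a cutoff. The abstract does not spell out the selection rule, so this rule, the cutoff, the toy tripeptide counts, and the names `pearson` and `select_features` are all assumptions rather than the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, min_abs_r):
    """Keep features whose |r| with the label is at least min_abs_r (assumed rule)."""
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= min_abs_r]
    return kept, scores


# Made-up example: 1 = interacting pair, 0 = non-interacting pair.
labels = [1, 1, 1, 0, 0, 0]
# Made-up tripeptide counts over a reduced alphabet {A, B, C}.
columns = {
    "AAB": [3, 2, 3, 0, 1, 0],   # tracks the label closely
    "ABC": [1, 2, 1, 2, 1, 2],   # weakly related to the label
    "CCC": [1, 1, 1, 1, 1, 1],   # constant, carries no information
}

kept, scores = select_features(columns, labels, min_abs_r=0.5)
```

With this toy data only `AAB` passes the cutoff. In a real setting the reduced feature set would then be passed to the classifier and compared against the baseline trained on all features, as the abstract describes.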
Affiliation(s)
- Ahmed Hassan Ibrahim
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Onur Can Karabulut
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Betül Asiye Karpuzcu
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Erdem Türk
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Barış Ethem Süzek
- Bioinformatics Graduate Program, Graduate School of Natural and Applied Sciences, Muğla Sıtkı Koçman University, Muğla, Turkey
- Department of Computer Engineering, Faculty of Engineering, Muğla Sıtkı Koçman University, Muğla, Turkey
- Georgetown University Medical Center, Biochemistry and Molecular & Cellular Biology, Washington DC, United States of America
35
Choudhuri I, Biswas A, Haldane A, Levy RM. Contingency and Entrenchment of Drug-Resistance Mutations in HIV Viral Proteins. J Phys Chem B 2022; 126:10622-10636. [PMID: 36493468 PMCID: PMC9841799 DOI: 10.1021/acs.jpcb.2c06123] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
The ability of HIV-1 to rapidly mutate leads to antiretroviral therapy (ART) failure among infected patients. Drug-resistance mutations (DRMs), which cause a fitness penalty to intrinsic viral fitness, are compensated by accessory mutations with favorable epistatic interactions which cause an evolutionary trapping effect, but the kinetics of this overall process has not been well characterized. Here, using a Potts Hamiltonian model describing epistasis combined with kinetic Monte Carlo simulations of evolutionary trajectories, we explore how epistasis modulates the evolutionary dynamics of HIV DRMs. We show how the occurrence of a drug-resistance mutation is contingent on favorable epistatic interactions with many other residues of the sequence background and that subsequent mutations entrench DRMs. We measure the time-autocorrelation of fluctuations in the likelihood of DRMs due to epistatic coupling with the sequence background, which reveals the presence of two evolutionary processes controlling DRM kinetics with two distinct time scales. Further analysis of waiting times for the evolutionary trapping effect to reverse reveals that the sequences which entrench (trap) a DRM are responsible for the slower time scale. We also quantify the overall strength of epistatic effects on the evolutionary kinetics for different mutations and show these are much larger for DRM positions than polymorphic positions, and we also show that trapping of a DRM is often caused by the collective effect of many accessory mutations, rather than a few strongly coupled ones, suggesting the importance of multiresidue sequence variations in HIV evolution. The analysis presented here provides a framework to explore the kinetic pathways through which viral proteins like HIV evolve under drug-selection pressure.
Affiliation(s)
- Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States; Department of Physics, Temple University, Philadelphia, Pennsylvania 19122-6008, United States
- Ronald M. Levy
- Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States; Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
36
Bioinformatic Analysis of Na+, K+-ATPase Regulation through Phosphorylation of the Alpha-Subunit N-Terminus. Int J Mol Sci 2022; 24:ijms24010067. [PMID: 36613508 PMCID: PMC9820343 DOI: 10.3390/ijms24010067] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/13/2022] [Revised: 12/01/2022] [Accepted: 12/17/2022] [Indexed: 12/24/2022] Open
Abstract
The Na+, K+-ATPase is an integral membrane protein which uses the energy of ATP hydrolysis to pump Na+ and K+ ions across the plasma membrane of all animal cells. It plays crucial roles in numerous physiological processes, such as cell volume regulation, nutrient reabsorption in the kidneys, nerve impulse transmission, and muscle contraction. Recent data suggest that it is regulated via an electrostatic switch mechanism involving the interaction of its lysine-rich N-terminus with the cytoplasmic surface of its surrounding lipid membrane, which can be modulated through the regulatory phosphorylation of the conserved serine and tyrosine residues on the protein's N-terminal tail. Prior data indicate that the kinases responsible for phosphorylation belong to the protein kinase C (PKC) and Src kinase families. To provide indications of which particular enzyme of these families might be responsible, we analysed them for evidence of coevolution via the mirror tree method, utilising coevolution as a marker for a functional interaction. The results obtained showed that the most likely kinase isoforms to interact with the Na+, K+-ATPase were the θ and η isoforms of PKC and the Src kinase itself. These theoretical results will guide the direction of future experimental studies.
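The mirror tree idea mentioned above is commonly described as correlating the pairwise evolutionary distances of two protein families across the same set of species. A minimal sketch under that assumption follows; the function names and the toy distance matrices are hypothetical and do not come from the paper.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mirror_tree_score(dist_a, dist_b):
    """Correlate two distance matrices whose rows follow the same species order."""
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical inter-species distances for two protein families (3 species).
dist_pump = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
dist_kinase = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(mirror_tree_score(dist_pump, dist_kinase))
```

A score near 1 would be read as similar evolutionary histories for the two families; on its own this does not prove a physical interaction, since shared phylogeny also raises the correlation.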
37
Weaver RJ, Rabinowitz S, Thueson K, Havird JC. Genomic Signatures of Mitonuclear Coevolution in Mammals. Mol Biol Evol 2022; 39:6775223. [PMID: 36288802 PMCID: PMC9641969 DOI: 10.1093/molbev/msac233] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
Mitochondrial (mt) and nuclear-encoded proteins are integrated in aerobic respiration, requiring co-functionality among gene products from fundamentally different genomes. Different evolutionary rates, inheritance mechanisms, and selection pressures set the stage for incompatibilities between interacting products of the two genomes. The mitonuclear coevolution hypothesis posits that incompatibilities may be avoided if evolution in one genome selects for complementary changes in interacting genes encoded by the other genome. Nuclear compensation, in which deleterious mtDNA changes are offset by compensatory nuclear changes, is often invoked as the primary mechanism for mitonuclear coevolution. Yet, direct evidence supporting nuclear compensation is rare. Here, we used data from 58 mammalian species representing eight orders to show strong correlations between evolutionary rates of mt and nuclear-encoded mt-targeted (N-mt) proteins, but not between mt and non-mt-targeted nuclear proteins, providing strong support for mitonuclear coevolution across mammals. N-mt genes with direct mt interactions also showed the strongest correlations. Although most N-mt genes had elevated dN/dS ratios compared to mt genes (as predicted under nuclear compensation), N-mt sites in close contact with mt proteins were not overrepresented for signs of positive selection compared to noncontact N-mt sites (contrary to predictions of nuclear compensation). Furthermore, temporal patterns of N-mt and mt amino acid substitutions did not support predictions of nuclear compensation, even in positively selected, functionally important residues with direct mitonuclear contacts. Overall, our results strongly support mitonuclear coevolution across ∼170 million years of mammalian evolution but fail to support nuclear compensation as the major mode of mitonuclear coevolution.
Affiliation(s)
- Ryan J Weaver
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA; Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA
- Kiley Thueson
- Department of Integrative Biology, University of Texas, Austin, TX
- Justin C Havird
- Department of Integrative Biology, University of Texas, Austin, TX
38
Zhang H, Xu MS, Fan X, Chung WK, Shen Y. Predicting functional effect of missense variants using graph attention neural networks. Nat Mach Intell 2022; 4:1017-1028. [PMID: 37484202 PMCID: PMC10361701 DOI: 10.1038/s42256-022-00561-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 07/09/2021] [Accepted: 10/07/2022] [Indexed: 11/16/2022]
Abstract
Accurate prediction of damaging missense variants is critically important for interpreting a genome sequence. Although many methods have been developed, their performance has been limited. Recent advances in machine learning and the availability of large-scale population genomic sequencing data provide new opportunities to considerably improve computational predictions. Here we describe the graphical missense variant pathogenicity predictor (gMVP), a new method based on graph attention neural networks. Its main component is a graph with nodes that capture predictive features of amino acids and edges weighted by co-evolution strength, enabling effective pooling of information from the local protein context and functionally correlated distal positions. Evaluation of deep mutational scan data shows that gMVP outperforms other published methods in identifying damaging variants in TP53, PTEN, BRCA1 and MSH2. Furthermore, it achieves the best separation of de novo missense variants in neurodevelopmental disorder cases from those in controls. Finally, the model supports transfer learning to optimize gain- and loss-of-function predictions in sodium and calcium channels. In summary, we demonstrate that gMVP can improve interpretation of missense variants in clinical testing and genetic studies.
Affiliation(s)
- Haicang Zhang
- Department of Systems Biology, Columbia University, New York, NY, USA
- Xiao Fan
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Pediatrics, Columbia University, New York, NY, USA
- Wendy K. Chung
- Department of Pediatrics, Columbia University, New York, NY, USA
- Department of Medicine, Columbia University, New York, NY, USA
- Yufeng Shen
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA
39
Lupo U, Sgarbossa D, Bitbol AF. Protein language models trained on multiple sequence alignments learn phylogenetic relationships. Nat Commun 2022; 13:6298. [PMID: 36273003 PMCID: PMC9588007 DOI: 10.1038/s41467-022-34032-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2022] [Accepted: 10/07/2022] [Indexed: 12/25/2022] Open
Abstract
Self-supervised neural language models with attention have recently been applied to biological sequence data, advancing structure, function and mutational effect prediction. Some protein language models, including MSA Transformer and AlphaFold's EvoFormer, take multiple sequence alignments (MSAs) of evolutionarily related proteins as inputs. Simple combinations of MSA Transformer's row attentions have led to state-of-the-art unsupervised structural contact prediction. We demonstrate that similarly simple, and universal, combinations of MSA Transformer's column attentions strongly correlate with Hamming distances between sequences in MSAs. Therefore, MSA-based language models encode detailed phylogenetic relationships. We further show that these models can separate coevolutionary signals encoding functional and structural constraints from phylogenetic correlations reflecting historical contingency. To assess this, we generate synthetic MSAs, either without or with phylogeny, from Potts models trained on natural MSAs. We find that unsupervised contact prediction is substantially more resilient to phylogenetic noise when using MSA Transformer versus inferred Potts models.
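For readers unfamiliar with the quantity the column attentions are compared against, the sketch below computes pairwise Hamming distances between the rows of an MSA. The toy alignment is hypothetical, and treating the gap character as an ordinary symbol is an assumption made here for simplicity, not necessarily the convention used in the paper.

```python
def hamming_distance(seq_a, seq_b):
    """Number of aligned columns at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def pairwise_hamming(msa):
    """Symmetric matrix of Hamming distances between all rows of an MSA."""
    n = len(msa)
    return [[hamming_distance(msa[i], msa[j]) for j in range(n)] for i in range(n)]


# Hypothetical three-sequence alignment; '-' marks a gap.
msa = ["MKV-LA", "MKI-LA", "MRIGLS"]
for row in pairwise_hamming(msa):
    print(row)
```

Sequences that diverged recently tend to have small Hamming distances, which is why such a matrix can serve as a simple proxy for phylogenetic relatedness.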
Affiliation(s)
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
40
Wu C, Guo D. Computational Docking Reveals Co-Evolution of C4 Carbon Delivery Enzymes in Diverse Plants. Int J Mol Sci 2022; 23:ijms232012688. [PMID: 36293547 PMCID: PMC9604239 DOI: 10.3390/ijms232012688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 10/14/2022] [Accepted: 10/19/2022] [Indexed: 11/16/2022] Open
Abstract
Proteins are modular functionalities regulating multiple cellular activities in prokaryotes and eukaryotes. As a consequence of higher plants adapting to arid and thermal conditions, C4 photosynthesis is the carbon fixation process involving multi-enzymes working in a coordinated fashion. However, how these enzymes interact with each other and whether they co-evolve in parallel to maintain interactions in different plants remain elusive to date. Here, we report our findings on the global protein co-evolution relationship and local dynamics of co-varying site shifts in key C4 photosynthetic enzymes. We found that in most of the selected key C4 photosynthetic enzymes, global pairwise co-evolution events exist to form functional couplings. In addition, protein-protein interactions between these enzymes may suggest their unknown functionalities in the carbon delivery process. For PEPC and PPCK regulation pairs, pocket formation at the interactive interface is not necessary for their function. This feature is distinct from another well-known regulation pair in C4 photosynthesis, namely, PPDK and PPDK-RP, where the pockets are necessary. Our findings facilitate the discovery of novel protein regulation types and contribute to expanding our knowledge about C4 photosynthesis.
41
Ravishankar K, Jiang X, Leddin EM, Morcos F, Cisneros GA. Computational compensatory mutation discovery approach: Predicting a PARP1 variant rescue mutation. Biophys J 2022; 121:3663-3673. [PMID: 35642254 PMCID: PMC9617126 DOI: 10.1016/j.bpj.2022.05.036] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/20/2021] [Revised: 05/20/2022] [Accepted: 05/23/2022] [Indexed: 11/02/2022] Open
Abstract
The prediction of protein mutations that affect function may be exploited for multiple uses. In the context of disease variants, the prediction of compensatory mutations that reestablish functional phenotypes could aid in the development of genetic therapies. In this work, we present an integrated approach that combines coevolutionary analysis and molecular dynamics (MD) simulations to discover functional compensatory mutations. This approach is employed to investigate possible rescue mutations of a poly(ADP-ribose) polymerase 1 (PARP1) variant, PARP1 V762A, associated with lung cancer and follicular lymphoma. MD simulations show PARP1 V762A exhibits noticeable changes in structural and dynamical behavior compared with wild-type (WT) PARP1. Our integrated approach predicts A755E as a possible compensatory mutation based on coevolutionary information, and molecular simulations indicate that the PARP1 A755E/V762A double mutant exhibits similar structural and dynamical behavior to WT PARP1. Our methodology can be broadly applied to a large number of systems where single-nucleotide polymorphisms have been identified as connected to disease and can shed light on the biophysical effects of such changes as well as provide a way to discover potential mutants that could restore WT-like functionality. This can, in turn, be further utilized in the design of molecular therapeutics that aim to mimic such compensatory effects.
Affiliation(s)
- Xianli Jiang
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Emmett M Leddin
- Department of Chemistry, University of North Texas, Denton, Texas
- Faruck Morcos
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas.
- G Andrés Cisneros
- Department of Chemistry, University of North Texas, Denton, Texas; Department of Physics, The University of Texas at Dallas, Richardson, Texas; Department of Chemistry, The University of Texas at Dallas, Richardson, Texas.
42
Robin V, Bodein A, Scott-Boyer MP, Leclercq M, Périn O, Droit A. Overview of methods for characterization and visualization of a protein–protein interaction network in a multi-omics integration context. Front Mol Biosci 2022; 9:962799. [PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Accepted: 08/16/2022] [Indexed: 11/26/2022] Open
Abstract
At the heart of the cellular machinery through the regulation of cellular functions, protein–protein interactions (PPIs) have a significant role. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions. All PPIs form a network. Different biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies allow solving these different challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. In order to give more importance to an interaction, it is evaluated by different confidence scores. These scores allow the filtration of the network and thus facilitate the representation of the network, essential steps to the identification and understanding of molecular mechanisms. In this review, we will discuss the main computational methods for predicting PPI, including ones confirming an interaction as well as the integration of PPIs into a network, and we will discuss visualization of these complex data.
Affiliation(s)
- Vivian Robin
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Antoine Bodein
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Marie-Pier Scott-Boyer
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Mickaël Leclercq
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- Olivier Périn
- Digital Sciences Department, L'Oréal Advanced Research, Aulnay-sous-bois, France
- Arnaud Droit
- Molecular Medicine Department, CHU de Québec Research Center, Université Laval, Québec, QC, Canada
- *Correspondence: Arnaud Droit,
43
Uzoeto HO, Cosmas S, Ajima JN, Arazu AV, Didiugwu CM, Ekpo DE, Ibiang GO, Durojaye OA. Computer-aided molecular modeling and structural analysis of the human centromere protein–HIKM complex. Beni-Suef University Journal of Basic and Applied Sciences 2022. [DOI: 10.1186/s43088-022-00285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Background
Protein–peptide and protein–protein interactions play an essential role in different functional and structural cellular organizational aspects. While Cryo-EM and X-ray crystallography generate the most complete structural characterization, most biological interactions exist in biomolecular complexes that are neither compliant nor responsive to direct experimental analysis. The development of computational docking approaches is therefore necessary. This starts from component protein structures to the prediction of their complexes, preferentially with precision close to complex structures generated by X-ray crystallography.
Results
To guarantee faithful chromosomal segregation, there must be a proper assembling of the kinetochore (a protein complex with multiple subunits) at the centromere during the process of cell division. As an important member of the inner kinetochore, defects in any of the subunits making up the CENP-HIKM complex lead to kinetochore dysfunction and an eventual chromosomal mis-segregation and cell death. Previous studies in an attempt to understand the assembly and mechanism devised by the CENP-HIKM in promoting the functionality of the kinetochore have reconstituted the protein complex from different organisms including fungi and yeast. Here, we present a detailed computational model of the physical interactions that exist between each component of the human CENP-HIKM, while validating each modeled structure using orthologs with existing crystal structures from the protein data bank.
Conclusions
Results from this study substantiate the existing hypothesis that the human CENP-HIK complex shares a similar architecture with its fungal and yeast orthologs, and likewise validate the binding mode of CENP-M to the C-terminus of the human CENP-I based on existing experimental reports.
44
Ben-Tal N, Kolodny R. Homologues not needed: Structure prediction from a protein language model. Structure 2022; 30:1047-1049. [PMID: 35931059 DOI: 10.1016/j.str.2022.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
Abstract
Accurate protein structure predictors use clusters of homologues, which disregard sequence specific effects. In this issue of Structure, Weißenow and colleagues report a deep learning-based tool, EMBER2, that efficiently predicts the distances in a protein structure from its amino acid sequence only. This approach should enable the analysis of mutation effects.
Affiliation(s)
- Nir Ben-Tal
- Department of Biochemistry and Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel.
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Mount Carmel, Haifa, 3498838, Israel.
45
Timonina DS, Suplatov DA. Analysis of Multiple Protein Alignments Using 3D-Structural Information on the Orientation of Amino Acid Side-Chains. Mol Biol 2022. [DOI: 10.1134/s0026893322040136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
46
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. Reports on Progress in Physics. Physical Society (Great Britain) 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Affiliation(s)
- Giulia Magi Meconi
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Ivan R Sasselli
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Jose N Onuchic
- Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
- Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
- Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
47
van Keulen SC, Martin J, Colizzi F, Frezza E, Trpevski D, Diaz NC, Vidossich P, Rothlisberger U, Hellgren Kotaleski J, Wade RC, Carloni P. Multiscale molecular simulations to investigate adenylyl cyclase‐based signaling in the brain. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Affiliation(s)
- Siri C. van Keulen
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Science for Life, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
- Juliette Martin
- CNRS, UMR 5086 Molecular Microbiology and Structural Biochemistry, University of Lyon, Lyon, France
- Francesco Colizzi
- Molecular Ocean Laboratory, Department of Marine Biology and Oceanography, Institute of Marine Sciences, ICM‐CSIC, Barcelona, Spain
- Elisa Frezza
- Université Paris Cité, CiTCoM, CNRS, Paris, France
- Daniel Trpevski
- Science for Life Laboratory, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm
- Nuria Cirauqui Diaz
- CNRS, UMR 5086 Molecular Microbiology and Structural Biochemistry, University of Lyon, Lyon, France
- Pietro Vidossich
- Molecular Modeling and Drug Discovery Lab, Istituto Italiano di Tecnologia, Genoa, Italy
- Ursula Rothlisberger
- Laboratory of Computational Chemistry and Biochemistry, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne
- Jeanette Hellgren Kotaleski
- Science for Life Laboratory, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm
- Department of Neuroscience, Karolinska Institute, Stockholm
- Rebecca C. Wade
- Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Center for Molecular Biology (ZMBH), DKFZ‐ZMBH Alliance, and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
- Paolo Carloni
- Institute for Neuroscience and Medicine (INM‐9) and Institute for Advanced Simulations (IAS‐5) “Computational biomedicine”, Forschungszentrum Jülich, Jülich, Germany
- INM‐11 JARA‐Institute: Molecular Neuroscience and Neuroimaging, Forschungszentrum Jülich, Jülich, Germany
48
Zhou L, Feng T, Xu S, Gao F, Lam TT, Wang Q, Wu T, Huang H, Zhan L, Li L, Guan Y, Dai Z, Yu G. ggmsa: a visual exploration tool for multiple sequence alignment and associated data. Brief Bioinform 2022; 23:6603927. [PMID: 35671504 DOI: 10.1093/bib/bbac222] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 01/28/2022] [Revised: 05/07/2022] [Accepted: 05/11/2022] [Indexed: 12/25/2022] Open
Abstract
The identification of the conserved and variable regions in the multiple sequence alignment (MSA) is critical to accelerating the process of understanding the function of genes. MSA visualizations allow us to transform sequence features into understandable visual representations. As the sequence-structure-function relationship gains increasing attention in molecular biology studies, the simple display of nucleotide or protein sequence alignment is no longer sufficient. A more scalable visualization is required to broaden the scope of sequence investigation. Here we present ggmsa, an R package for mining comprehensive sequence features and integrating the associated data of MSA by a variety of display methods. To uncover sequence conservation patterns, variations and recombination at the site level, sequence bundles, sequence logos, stacked sequence alignment and comparative plots are implemented. ggmsa supports integrating the correlation of MSA sequences and their phenotypes, as well as other traits such as ancestral sequences, molecular structures, molecular functions and expression levels. We also design a new visualization method for genome alignments in multiple alignment format to explore the pattern of within and between species variation. Combining these visual representations with prior knowledge, ggmsa assists researchers in exploring MSA and making decisions. The ggmsa package is open-source software released under the Artistic-2.0 license, and it is freely available on Bioconductor (https://bioconductor.org/packages/ggmsa) and GitHub (https://github.com/YuLab-SMU/ggmsa).
Affiliation(s)
- Lang Zhou
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Division of Laboratory Medicine, Microbiome Medicine Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Tingze Feng
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Shuangbin Xu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Fangluan Gao
- Institute of Plant Virology, Fujian Agriculture and Forestry University, Fuzhou, China
- Tommy T Lam
- State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Laboratory of Data Discovery for Health Limited, 19W Hong Kong Science & Technology Parks, Hong Kong SAR, China
- Qianwen Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Centre for Soybean Research of the State Key Laboratory of Agrobiotechnology and School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
- Tianzhi Wu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Huina Huang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Zhuhai International Travel Healthcare Center, Zhuhai, Guangdong, China
- Li Zhan
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Lin Li
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Yi Guan
- State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Joint Institute of Virology (Shantou University - The University of Hong Kong), Shantou University, Shantou, China
- Zehan Dai
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
- Guangchuang Yu
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China; Division of Laboratory Medicine, Microbiome Medicine Center, Zhujiang Hospital, Southern Medical University, Guangzhou, China
49
Oteri F, Sarti E, Nadalin F, Carbone A. iBIS2Analyzer: a web server for a phylogeny-driven coevolution analysis of protein families. Nucleic Acids Res 2022; 50:W412-W419. [PMID: 35670671 PMCID: PMC9252744 DOI: 10.1093/nar/gkac481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/23/2022] [Revised: 05/20/2022] [Accepted: 05/25/2022] [Indexed: 12/27/2022] Open
Abstract
Residue coevolution within and between proteins is used as a marker of physical interaction and/or residue functional cooperation. Pairs or groups of coevolving residues are extracted from multiple sequence alignments based on a variety of computational approaches. However, coevolution signals emerging in subsets of sequences might be lost if the full alignment is considered. iBIS2Analyzer is a web server dedicated to a phylogeny-driven coevolution analysis of protein families with different evolutionary pressure. It is based on the iterative version, iBIS2, of the coevolution analysis method BIS, Blocks in Sequences. iBIS2 is designed to iteratively select and analyse subtrees in phylogenetic trees, possibly large and comprising thousands of sequences. With iBIS2Analyzer, openly accessible at http://ibis2analyzer.lcqb.upmc.fr/, the user visualizes, compares and inspects clusters of coevolving residues by mapping them onto sequences, alignments or structures of choice, greatly simplifying downstream analysis steps. A rich and interactive graphic interface facilitates the biological interpretation of the results.
Collapse
Affiliation(s)
- Francesco Oteri
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Edoardo Sarti
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Francesca Nadalin
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
|
50
|
Si Y, Yan C. Protein complex structure prediction powered by multiple sequence alignments of interologs from multiple taxonomic ranks and AlphaFold2. Brief Bioinform 2022; 23:6596987. [PMID: 35649388 DOI: 10.1093/bib/bbac208] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/25/2022] [Revised: 04/17/2022] [Accepted: 05/05/2022] [Indexed: 12/19/2022]
Abstract
AlphaFold2 can predict protein complex structures as long as a multiple sequence alignment (MSA) of the interologs of the target protein-protein interaction (PPI) can be provided. In this study, a simplified phylogeny-based approach was applied to generate the MSA of interologs, which was then used as the input to AlphaFold2 for protein complex structure prediction. Benchmarking this protocol extensively on a nonredundant PPI dataset comprising 107 bacterial PPIs and 442 eukaryotic PPIs, we show that complex structures of 79.5% of the bacterial PPIs and 49.8% of the eukaryotic PPIs can be successfully predicted, a significantly better performance than that obtained with MSAs of interologs prepared by two existing approaches. Considering that PPIs may not be conserved across species separated by long evolutionary distances, we further restricted interologs in the MSA to different taxonomic ranks of the species of the target PPI in protein complex structure prediction. We found that the success rates can be increased to 87.9% for the bacterial PPIs and 56.3% for the eukaryotic PPIs if interologs in the MSA are restricted to a specific taxonomic rank of the species of each target PPI. Finally, we show that the optimal taxonomic ranks for protein complex structure prediction can be selected using the predicted template modeling (TM) scores of the output models.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, China
- Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, China
|