1
Soleymani F, Paquet E, Viktor HL, Michalowski W. Structure-based protein and small molecule generation using EGNN and diffusion models: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2779-2797. [PMID: 39050782 PMCID: PMC11268121 DOI: 10.1016/j.csbj.2024.06.021] [Received: 04/19/2024] [Revised: 06/13/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024]
Abstract
Recent breakthroughs in deep learning have revolutionized protein sequence and structure prediction. These advancements are built on decades of protein design efforts, and are overcoming traditional time and cost limitations. Diffusion models, at the forefront of these innovations, significantly enhance design efficiency by automating knowledge acquisition. In the field of de novo protein design, the goal is to create entirely novel proteins with predetermined structures. Given the arbitrary positions of proteins in 3-D space, graph representations and their properties are widely used in protein generation studies. A critical requirement in protein modelling is maintaining spatial relationships under transformations (rotations, translations, and reflections). This property, known as equivariance, ensures that predicted protein characteristics adapt seamlessly to changes in orientation or position. Equivariant graph neural networks offer a solution to this challenge. By incorporating equivariant graph neural networks to learn the score of the probability density function in diffusion models, one can generate proteins with robust 3-D structural representations. This review examines the latest deep learning advancements, specifically focusing on frameworks that combine diffusion models with equivariant graph neural networks for protein generation.
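The equivariance property described in this abstract can be checked numerically. The sketch below is an illustration only and is not taken from the review: `coordinate_update` is a hypothetical toy update in the spirit of an equivariant graph layer, with a fixed scalar function standing in for a learned network, and `transform` applies an arbitrary rotation, reflection, and translation chosen for the example.

```python
import math

def coordinate_update(points, weight=0.1):
    """Toy equivariant update: shift each point along its relative position
    vectors, scaled by a function of the (invariant) squared distance."""
    updated = []
    for i, xi in enumerate(points):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(points):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            dist2 = sum(d * d for d in diff)
            scale = weight / (1.0 + dist2)  # hypothetical stand-in for a learned function
            shift = [s + scale * d for s, d in zip(shift, diff)]
        updated.append([a + s for a, s in zip(xi, shift)])
    return updated

def transform(points, angle, translation, reflect=False):
    """Optionally mirror x, rotate about the z-axis, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in points:
        if reflect:
            x = -x
        out.append([c * x - s * y + translation[0],
                    s * x + c * y + translation[1],
                    z + translation[2]])
    return out

def max_abs_difference(a, b):
    """Largest coordinate-wise gap between two point sets of equal shape."""
    return max(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))

points = [[0.0, 0.0, 0.0], [1.5, 0.2, -0.3], [-0.7, 1.1, 0.9], [0.4, -1.3, 2.0]]
angle, translation = 0.8, [2.0, -1.0, 0.5]

# Equivariance: updating then transforming equals transforming then updating.
update_then_move = transform(coordinate_update(points), angle, translation, reflect=True)
move_then_update = coordinate_update(transform(points, angle, translation, reflect=True))
equivariance_gap = max_abs_difference(update_then_move, move_then_update)
```

The gap is zero up to floating-point error because the update depends only on relative position vectors and their lengths, which is the design choice that makes such layers suitable for structures with arbitrary placement in 3-D space.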
Affiliation(s)
- Farzan Soleymani
  - Telfer School of Management, University of Ottawa, ON, K1N 6N5, Canada
- Eric Paquet
  - National Research Council, 1200 Montreal Road, Ottawa, ON, K1A 0R6, Canada
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
- Herna Lydia Viktor
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, K1N 6N5, Canada
2
Liu H, Zhuo C, Gao J, Zeng C, Zhao Y. AI-integrated network for RNA complex structure and dynamic prediction. Biophysics Reviews 2024; 5:041304. [PMID: 39512332 PMCID: PMC11540444 DOI: 10.1063/5.0237319] [Received: 09/04/2024] [Accepted: 10/15/2024] [Indexed: 11/15/2024]
Abstract
RNA complexes are essential components in many cellular processes. The functions of these complexes are linked to their tertiary structures, which are shaped by detailed interface information, such as binding sites, interface contacts, and dynamic conformational changes. Network-based approaches have been widely used to analyze RNA complex structures. With their roots in graph theory, these methods have a long history of providing insight into the static and dynamic properties of RNA molecules. These approaches have been effective in identifying functional binding sites and analyzing the dynamic behavior of RNA complexes. Recently, the advent of artificial intelligence (AI) has brought transformative changes to the field. These technologies have been increasingly applied to studying RNA complex structures, providing new avenues for understanding the complex interactions within RNA complexes. By integrating AI with traditional network analysis methods, researchers can build more accurate models of RNA complex structures, predict their dynamic behaviors, and even design RNA-based inhibitors. In this review, we introduce the integration of network-based methodologies with AI techniques to enhance the understanding of RNA complex structures. We examine how these advanced computational tools can be used to model and analyze the detailed interface information and dynamic behaviors of RNA molecules. Additionally, we explore potential future directions for how AI-integrated networks can aid in modeling and analyzing RNA complex structures.
Affiliation(s)
- Haoquan Liu
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Chen Zhuo
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Jiaming Gao
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Chengwei Zeng
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
- Yunjie Zhao
  - Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China
3
Cuadrado AF, Van Damme D. Unlocking protein-protein interactions in plants: a comprehensive review of established and emerging techniques. Journal of Experimental Botany 2024; 75:5220-5236. [PMID: 38437582 DOI: 10.1093/jxb/erae088] [Received: 12/15/2023] [Accepted: 02/29/2024] [Indexed: 03/06/2024]
Abstract
Protein-protein interactions orchestrate plant development and serve as crucial elements for cellular and environmental communication. Understanding these interactions offers a gateway to unravel complex protein networks that will allow a better understanding of nature. Methods for the characterization of protein-protein interactions have been around for over 30 years, yet the complexity of some of these interactions has fueled the development of new techniques that provide a better understanding of the underlying dynamics. In many cases, the application of these techniques is limited by the nature of the available sample. While some methods require an in vivo set-up, others solely depend on protein sequences to study protein-protein interactions via an in silico set-up. The vast number of techniques available to date calls for a way to select the appropriate tools for the study of specific interactions. Here, we classify widely used tools and emerging techniques for the characterization of protein-protein interactions based on sample requirements while providing insights into the information that they can potentially deliver. We provide a comprehensive overview of commonly used techniques and elaborate on the most recent developments, showcasing their implementation in plant research.
Affiliation(s)
- Alvaro Furones Cuadrado
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
- Daniël Van Damme
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, 9052 Ghent, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, 9052 Ghent, Belgium
4
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contacts, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complexes, disordered regions in complexes, antibody-antigen complexes, and RNA-related complexes, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
  - Beijing Academy of Artificial Intelligence, Beijing, 100084, China
5
Zhao H, Petrey D, Murray D, Honig B. ZEPPI: Proteome-scale sequence-based evaluation of protein-protein interaction models. Proc Natl Acad Sci U S A 2024; 121:e2400260121. [PMID: 38743624 PMCID: PMC11127014 DOI: 10.1073/pnas.2400260121] [Received: 01/08/2024] [Accepted: 04/18/2024] [Indexed: 05/16/2024]
Abstract
We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence coevolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI can be implemented on a proteome-wide scale and is applied here to millions of structural models of dimeric complexes in the Escherichia coli and human interactomes found in the PrePPI database. PrePPI's scoring function is based primarily on the evaluation of protein-protein interfaces, and ZEPPI adds a new feature to this analysis through the incorporation of evolutionary information. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. As we discuss, the standard CAPRI scores used to evaluate docking models are based on model quality and not on the ability to give yes/no answers as to whether two proteins interact. ZEPPI is able to detect weak signals from PPI models that the CAPRI scores define as incorrect and, similarly, to identify potential PPIs defined as low confidence by the current PrePPI scoring function. A number of examples that illustrate how the combination of PrePPI and ZEPPI can yield functional hypotheses are provided.
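The idea of scoring an interface against randomly chosen residues can be sketched as a permutation-style Z-score. This is a minimal illustration under stated assumptions, not the ZEPPI implementation: the function name `interface_z_score`, the use of a single per-residue score, the sample count, and the toy data are all hypothetical.

```python
import random
import statistics

def interface_z_score(scores, interface, n_samples=1000, seed=0):
    """Z-score of the mean score over interface residues, relative to the
    distribution of means over random residue sets of the same size.

    scores: dict mapping residue index -> per-residue metric (assumed here to be
            something like a conservation value; the real metric may differ).
    interface: collection of residue indices defined by the structural model.
    """
    rng = random.Random(seed)
    residues = sorted(scores)
    size = len(interface)
    observed = statistics.mean(scores[r] for r in interface)
    background = [
        statistics.mean(scores[r] for r in rng.sample(residues, size))
        for _ in range(n_samples)
    ]
    return (observed - statistics.mean(background)) / statistics.stdev(background)

# Toy data: 100 residues with a low baseline score and 10 high-scoring interface residues.
toy_scores = {i: 0.2 for i in range(100)}
for i in range(10):
    toy_scores[i] = 0.9

z_interface = interface_z_score(toy_scores, interface=range(10))
```

A large positive value indicates that the interface residues score well above what random residue sets of the same size would give; a fixed seed keeps the background sampling reproducible.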
Affiliation(s)
- Haiqing Zhao
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Donald Petrey
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Diana Murray
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
- Barry Honig
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032
  - Department of Medicine, Columbia University, New York, NY 10032
  - Zuckerman Institute, Columbia University, New York, NY 10027
6
Omelchenko AA, Siwek JC, Chhibbar P, Arshad S, Nazarali I, Nazarali K, Rosengart A, Rahimikollu J, Tilstra J, Shlomchik MJ, Koes DR, Joglekar AV, Das J. Sliding Window INteraction Grammar (SWING): a generalized interaction language model for peptide and protein interactions. bioRxiv 2024:2024.05.01.592062. [PMID: 38746274 PMCID: PMC11092674 DOI: 10.1101/2024.05.01.592062] [Indexed: 05/16/2024]
Abstract
The explosion of sequence data has allowed the rapid growth of protein language models (pLMs). pLMs have now been employed in many frameworks including variant-effect and peptide-specificity prediction. Traditionally, for protein-protein or peptide-protein interactions (PPIs), corresponding sequences are either co-embedded followed by post-hoc integration or the sequences are concatenated prior to embedding. Interestingly, no method utilizes a language representation of the interaction itself. We developed an interaction LM (iLM), which uses a novel language to represent interactions between protein/peptide sequences. Sliding Window Interaction Grammar (SWING) leverages differences in amino acid properties to generate an interaction vocabulary. This vocabulary is the input into a LM followed by a supervised prediction step where the LM's representations are used as features. SWING was first applied to predicting peptide:MHC (pMHC) interactions. SWING was not only successful at generating Class I and Class II models that have comparable prediction to state-of-the-art approaches, but the unique Mixed Class model was also successful at jointly predicting both classes. Further, the SWING model trained only on Class I alleles was predictive for Class II, a complex prediction task not attempted by any existing approach. For de novo data, using only Class I or Class II data, SWING also accurately predicted Class II pMHC interactions in murine models of SLE (MRL/lpr model) and T1D (NOD model), that were validated experimentally. To further evaluate SWING's generalizability, we tested its ability to predict the disruption of specific protein-protein interactions by missense mutations. Although modern methods like AlphaMissense and ESM1b can predict interfaces and variant effects/pathogenicity per mutation, they are unable to predict interaction-specific disruptions. 
SWING was successful at accurately predicting the impact of both Mendelian mutations and population variants on PPIs. This is the first generalizable approach that can accurately predict interaction-specific disruptions by missense mutations with only sequence information. Overall, SWING is a first-in-class generalizable zero-shot iLM that learns the language of PPIs.
Affiliation(s)
- Alisa A. Omelchenko
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
  - The joint CMU-Pitt PhD program in computational biology, School of Medicine, University of Pittsburgh, PA, USA
- Jane C. Siwek
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
  - The joint CMU-Pitt PhD program in computational biology, School of Medicine, University of Pittsburgh, PA, USA
- Prabal Chhibbar
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Integrative systems biology PhD program, School of Medicine, University of Pittsburgh, PA, USA
- Sanya Arshad
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Iliyan Nazarali
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Kiran Nazarali
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- AnnaElaine Rosengart
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Javad Rahimikollu
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
  - The joint CMU-Pitt PhD program in computational biology, School of Medicine, University of Pittsburgh, PA, USA
- Jeremy Tilstra
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Division of Rheumatology and Clinical Immunology, Department of Medicine, School of Medicine, University of Pittsburgh, PA, USA
- Mark J. Shlomchik
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- David R. Koes
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
- Alok V. Joglekar
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
- Jishnu Das
  - Center for Systems Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
  - Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, PA, USA
7
MacGowan SA, Madeira F, Britto-Borges T, Barton GJ. A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites. Commun Biol 2024; 7:447. [PMID: 38605212 PMCID: PMC11009406 DOI: 10.1038/s42003-024-06117-5] [Received: 07/26/2023] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Now, human population constraint has established applications in pathogenicity prediction, but it has not yet been explored for structural inference. Here, we map 2.4 million population variants to 5885 protein families and quantify residue-level constraint with a new Missense Enrichment Score (MES). Analysis of 61,214 structures from the PDB spanning 3661 families shows that missense depleted sites are enriched in buried residues or those involved in small-molecule or protein binding. MES is complementary to evolutionary conservation and a combined analysis allows a new classification of residues according to a conservation plane. This approach finds functional residues that are evolutionarily diverse, which can be related to specificity, as well as family-wide conserved sites that are critical for folding or function. We also find a possible contrast between lethal and non-lethal pathogenic sites, and a surprising clinical variant hot spot at a subset of missense enriched positions.
Affiliation(s)
- Stuart A MacGowan
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- Fábio Madeira
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Thiago Britto-Borges
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
  - Section of Bioinformatics and Systems Cardiology, Department of Internal Medicine III and Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg University Hospital, Heidelberg, Germany
- Geoffrey J Barton
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
8
Si Y, Yan C. Protein language model-embedded geometric graphs power inter-protein contact prediction. eLife 2024; 12:RP92184. [PMID: 38564241 PMCID: PMC10987090 DOI: 10.7554/elife.92184] [Indexed: 04/04/2024]
Abstract
Accurate prediction of contacting residue pairs between interacting proteins is very useful for structural characterization of protein-protein interactions. Although significant improvement has been made in inter-protein contact prediction recently, there is still considerable room for improving prediction accuracy. Here we present a new deep learning method referred to as PLMGraph-Inter for inter-protein contact prediction. Specifically, we employ rotationally and translationally invariant geometric graphs obtained from structures of interacting proteins to integrate multiple protein language models, which are successively transformed by graph encoders formed by geometric vector perceptrons and residual networks formed by dimensional hybrid residual blocks to predict inter-protein contacts. Extensive evaluation on multiple test sets illustrates that PLMGraph-Inter outperforms five top inter-protein contact prediction methods, including DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter, by large margins. In addition, we also show that the prediction of PLMGraph-Inter can complement the result of AlphaFold-Multimer. Finally, we show that leveraging the contacts predicted by PLMGraph-Inter as constraints for protein-protein docking can dramatically improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
- Chengfei Yan
  - School of Physics, Huazhong University of Science and Technology, Wuhan, China
9
Lin P, Li H, Huang SY. Deep learning in modeling protein complex structures: From contact prediction to end-to-end approaches. Curr Opin Struct Biol 2024; 85:102789. [PMID: 38402744 DOI: 10.1016/j.sbi.2024.102789] [Received: 09/30/2023] [Revised: 01/16/2024] [Accepted: 02/06/2024] [Indexed: 02/27/2024]
Abstract
Protein-protein interactions play crucial roles in many biological processes. Traditionally, protein complex structures are normally built by protein-protein docking. With the rapid development of artificial intelligence and its great success in monomer protein structure prediction, deep learning has widely been applied to modeling protein-protein complex structures through inter-protein contact prediction and end-to-end approaches in the past few years. This article reviews the recent advances of deep-learning-based approaches in modeling protein-protein complex structures as well as their advantages and limitations. Challenges and possible future directions are also briefly discussed in applying deep learning for the prediction of protein complex structures.
Affiliation(s)
- Peicong Lin
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Hao Li
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, PR China
10
Bibik P, Alibai S, Pandini A, Dantu SC. PyCoM: a python library for large-scale analysis of residue-residue coevolution data. Bioinformatics 2024; 40:btae166. [PMID: 38532297 PMCID: PMC11009027 DOI: 10.1093/bioinformatics/btae166] [Received: 09/19/2023] [Revised: 02/02/2024] [Accepted: 03/25/2024] [Indexed: 03/28/2024]
Abstract
MOTIVATION: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra- and inter-residue protein contacts, protein structures, and effects of mutation on protein stability and function. While there are many tools and webservers to compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection leveraging on biological and structural annotations already available in UniProt.
RESULTS: We present a Python library, PyCoM, which enables users to query and analyze coevolution matrices and sequence alignments of 457 622 proteins, selected from UniProtKB/Swiss-Prot database (length ≤ 500 residues), from a precompiled coevolution matrix database (PyCoMdb). PyCoM facilitates the development of statistical analyses of residue coevolution patterns using filters on biological and structural annotations from UniProtKB/Swiss-Prot, with simple access to PyCoMdb for both novice and advanced users, supporting Jupyter Notebooks, Python scripts, and a web API access. The resource is open source and will help in generating data-driven computational models and methods to study and understand protein structures, stability, function, and design.
AVAILABILITY AND IMPLEMENTATION: PyCoM code is freely available from https://github.com/scdantu/pycom and PyCoMdb and the Jupyter Notebook tutorials are freely available from https://pycom.brunel.ac.uk.
Affiliation(s)
- Philipp Bibik
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Sabriyeh Alibai
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Alessandro Pandini
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
- Sarath Chandra Dantu
  - Department of Computer Science, Brunel University London, Uxbridge UB8 3PH, United Kingdom
11
Judge A, Sankaran B, Hu L, Palaniappan M, Birgy A, Prasad BVV, Palzkill T. Network of epistatic interactions in an enzyme active site revealed by large-scale deep mutational scanning. Proc Natl Acad Sci U S A 2024; 121:e2313513121. [PMID: 38483989 PMCID: PMC10962969 DOI: 10.1073/pnas.2313513121] [Received: 08/06/2023] [Accepted: 02/14/2024] [Indexed: 03/19/2024]
Abstract
Cooperative interactions between amino acids are critical for protein function. A genetic reflection of cooperativity is epistasis, which is when a change in the amino acid at one position changes the sequence requirements at another position. To assess epistasis within an enzyme active site, we utilized CTX-M β-lactamase as a model system. CTX-M hydrolyzes β-lactam antibiotics to provide antibiotic resistance, allowing a simple functional selection for rapid sorting of modified enzymes. We created all pairwise mutations across 17 active site positions in the β-lactamase enzyme and quantitated the function of variants against two β-lactam antibiotics using next-generation sequencing. Context-dependent sequence requirements were determined by comparing the antibiotic resistance function of double mutations across the CTX-M active site to their predicted function based on the constituent single mutations, revealing both positive epistasis (synergistic interactions) and negative epistasis (antagonistic interactions) between amino acid substitutions. The resulting trends demonstrate that positive epistasis is present throughout the active site, that epistasis between residues is mediated through substrate interactions, and that residues more tolerant to substitutions serve as generic compensators which are responsible for many cases of positive epistasis. Additionally, we show that a key catalytic residue (Glu166) is amenable to compensatory mutations, and we characterize one such double mutant (E166Y/N170G) that acts by an altered catalytic mechanism. These findings shed light on the unique biochemical factors that drive epistasis within an enzyme active site and will inform enzyme engineering efforts by bridging the gap between amino acid sequence and catalytic function.
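The comparison of a double mutant with the prediction from its constituent single mutations can be written as a small calculation. The abstract does not state which null model was used, so the sketch below assumes an additive model on log-scale function relative to wild type; the function names, the tolerance, and the example values are hypothetical.

```python
def epistasis(log_function_a, log_function_b, log_function_ab):
    """Deviation of the double mutant from the additive expectation.

    All inputs are log-scale function values relative to wild type
    (wild type = 0). Assumes single-mutant effects add in log space
    when the two positions do not interact.
    """
    expected_ab = log_function_a + log_function_b
    return log_function_ab - expected_ab

def classify_epistasis(value, tolerance=0.1):
    """Label the deviation; the tolerance is an arbitrary illustrative cutoff."""
    if value > tolerance:
        return "positive"
    if value < -tolerance:
        return "negative"
    return "none"

# Double mutant performs better than the singles predict.
compensating = epistasis(-2.0, -1.5, -1.0)
# Double mutant performs worse than the singles predict.
aggravating = epistasis(-0.2, -0.3, -2.0)
# Double mutant matches the additive expectation.
independent = epistasis(-0.5, -0.5, -1.0)

labels = [classify_epistasis(v) for v in (compensating, aggravating, independent)]
```

In this toy example the first pair would count as positive epistasis, because the second substitution partly compensates for the first, while the second pair would count as negative epistasis.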
Affiliation(s)
- Allison Judge
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Banumathi Sankaran
  - Department of Molecular Biophysics and Integrated Bioimaging, Berkeley Center for Structural Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720
- Liya Hu
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Murugesan Palaniappan
  - Department of Pathology and Immunology, Center for Drug Discovery, Baylor College of Medicine, Houston, TX 77030
- André Birgy
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
  - Infections, Antimicrobials, Modelling, Evolution, UMR 1137, French Institute for Medical Research (INSERM), Faculty of Health, Université Paris Cité, Paris 75006, France
- B. V. Venkataram Prasad
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
- Timothy Palzkill
  - Verna and Marrs McLean Department of Biochemistry and Molecular Pharmacology, Baylor College of Medicine, Houston, TX 77030
12
|
Peng CX, Liang F, Xia YH, Zhao KL, Hou MH, Zhang GJ. Recent Advances and Challenges in Protein Structure Prediction. J Chem Inf Model 2024; 64:76-95. [PMID: 38109487 DOI: 10.1021/acs.jcim.3c01324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in the field of biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, there are still some challenges and limitations. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be conducted.
Affiliation(s)
- Chun-Xiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Fang Liang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu-Hao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ming-Hua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
13
Krupa MA, Krupa P. Free-Docking and Template-Based Docking: Physics Versus Knowledge-Based Docking. Methods Mol Biol 2024; 2780:27-41. [PMID: 38987462 DOI: 10.1007/978-1-0716-3985-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Docking methods can be used to predict the orientations of two or more molecules with respect to each other using a variety of algorithms, which can be based on the physics of interactions or can use information from databases and templates. The usability of these approaches depends on the type and size of the molecules whose relative orientation will be estimated. The two most important limitations are (i) the computational cost of the prediction and (ii) the availability of the structural information for similar complexes. In general, if there is enough information about similar systems, knowledge-based and template-based methods can significantly reduce the computational cost while providing high accuracy of the prediction. However, if the information about the system topology and interactions between its partners is scarce, physics-based methods are more reliable or even the only choice. In this chapter, knowledge-, template-, and physics-based methods will be compared and briefly discussed, providing examples of their usability with a special emphasis on physics-based protein-protein, protein-peptide, and protein-fullerene docking in the UNRES coarse-grained model.
Affiliation(s)
- Magdalena A Krupa
  - Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
- Paweł Krupa
  - Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
14
Ozden B, Kryshtafovych A, Karaca E. The impact of AI-based modeling on the accuracy of protein assembly prediction: Insights from CASP15. Proteins 2023; 91:1636-1657. [PMID: 37861057 PMCID: PMC10873090 DOI: 10.1002/prot.26598] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/21/2023] [Revised: 09/12/2023] [Accepted: 09/14/2023] [Indexed: 10/21/2023]
Abstract
In CASP15, 87 predictors submitted around 11,000 models on 41 assembly targets. The community demonstrated exceptional performance in overall fold and interface contact predictions, achieving an impressive success rate of 90% (compared to 31% in CASP14). This remarkable accomplishment is largely due to the incorporation of DeepMind's AF2-Multimer approach into custom-built prediction pipelines. To evaluate the added value of participating methods, we compared the community models to the baseline AF2-Multimer predictor. In over 1/3 of cases, the community models were superior to the baseline predictor. The main reasons for this improved performance were the use of custom-built multiple sequence alignments, optimized AF2-Multimer sampling, and the manual assembly of AF2-Multimer-built subcomplexes. The best three groups, in order, are Zheng, Venclovas, and Wallner. Zheng and Venclovas reached a 73.2% success rate over all (41) cases, while Wallner attained a 69.4% success rate over 36 cases. Nonetheless, challenges remain in predicting structures with weak evolutionary signals, such as nanobody-antigen, antibody-antigen, and viral complexes. As expected, modeling large complexes also remains challenging due to their high memory and compute demands. In addition to the assembly category, we assessed the accuracy of modeling interdomain interfaces in the tertiary structure prediction targets. Models on seven targets featuring 17 unique interfaces were analyzed. Best predictors achieved a 76.5% success rate, with the UM-TBM group being the leader. In the interdomain category, we observed that the predictors faced challenges, as in the case of the assembly category, when the evolutionary signal for a given domain pair was weak or the structure was large. Overall, CASP15 witnessed unprecedented improvement in interface modeling, reflecting the AI revolution seen in CASP14.
Affiliation(s)
- Burcu Ozden
  - Izmir Biomedicine and Genome Center, Izmir, Türkiye
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Türkiye
- Andriy Kryshtafovych
  - Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Türkiye
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Türkiye
15
Cao W, Wu LY, Xia XY, Chen X, Wang ZX, Pan XM. A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins. Sci Rep 2023; 13:20304. [PMID: 37985846 PMCID: PMC10662474 DOI: 10.1038/s41598-023-47496-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023] Open
Abstract
Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.
Affiliation(s)
- Wei Cao
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Lu-Yun Wu
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xia-Yu Xia
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xiang Chen
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Zhi-Xin Wang
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xian-Ming Pan
  - Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
16
Fongang B, Wadop YN, Zhu Y, Wagner EJ, Kudlicki A, Rowicka M. Coevolution combined with molecular dynamics simulations provides structural and mechanistic insights into the interactions between the integrator complex subunits. Comput Struct Biotechnol J 2023; 21:5686-5697. [PMID: 38074468 PMCID: PMC10700540 DOI: 10.1016/j.csbj.2023.11.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 11/10/2023] [Accepted: 11/10/2023] [Indexed: 01/18/2024] Open
Abstract
Finding the 3D structure of large, multi-subunit complexes is difficult, despite recent advances in cryo-EM technology, due to remaining challenges to expressing and purifying subunits. Computational approaches that predict protein-protein interactions, including Direct Coupling Analysis (DCA), represent an attractive alternative for dissecting interactions within protein complexes. However, they are readily applicable only to small proteins due to high computational complexity and a high number of false positives. To solve this problem, we proposed a modified DCA approach, a powerful tool to predict the most likely interfaces of protein complexes. Since our modified approach cannot provide structural and mechanistic details of interacting peptides, we combine it with Molecular Dynamics (MD) simulations. To illustrate this novel approach, we predict interacting domains and structural details of interactions of two Integrator complex subunits, INTS9 and INTS11. Our predictions of interacting residues of INTS9/INTS11 are highly consistent with crystallographic structure. We then expand our procedure to two complexes whose structures are not well-studied: 1) The heterodimer formed by the Cleavage and Polyadenylation Specificity Factor 100-kD (CPSF100) and 73-kD (CPSF73); 2) The heterotrimer formed by INTS4/INTS9/INTS11. Experimental data supports our predictions of interactions within these two complexes, demonstrating that combining DCA and MD simulations is a powerful approach to revealing structural insights of large protein complexes.
Affiliation(s)
- Bernard Fongang
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
  - Department of Biochemistry and Structural Biology, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
  - Department of Population Health Sciences, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
- Yannick N. Wadop
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
- Yingjie Zhu
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
- Eric J. Wagner
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States
  - Department of Biochemistry and Biophysics, The University of Rochester Medical Center, Rochester, NY, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
- Andrzej Kudlicki
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
  - Informatics Service Center, The University of Texas Medical Branch, Galveston, TX, United States
- Maga Rowicka
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
17
Zhang H, Quadeer AA, McKay MR. Direct-acting antiviral resistance of Hepatitis C virus is promoted by epistasis. Nat Commun 2023; 14:7457. [PMID: 37978179 PMCID: PMC10656532 DOI: 10.1038/s41467-023-42550-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/17/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023] Open
Abstract
Direct-acting antiviral agents (DAAs) provide efficacious therapeutic treatments for chronic Hepatitis C virus (HCV) infection. However, emergence of drug resistance mutations (DRMs) can greatly affect treatment outcomes and impede virological cure. While multiple DRMs have been observed for all currently used DAAs, the evolutionary determinants of such mutations are not currently well understood. Here, by considering DAAs targeting the nonstructural 3 (NS3) protein of HCV, we present results suggesting that epistasis plays an important role in the evolution of DRMs. Employing a sequence-based fitness landscape model whose predictions correlate highly with experimental data, we identify specific DRMs that are associated with strong epistatic interactions, and these are found to be enriched in multiple NS3-specific DAAs. Evolutionary modelling further supports that the identified DRMs involve compensatory mutational interactions that facilitate relatively easy escape from drug-induced selection pressures. Our results indicate that accounting for epistasis is important for designing future HCV NS3-targeting DAAs.
Affiliation(s)
- Hang Zhang
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Ahmed Abdul Quadeer
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Matthew R McKay
  - Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, VIC, Australia
  - Department of Microbiology and Immunology, University of Melbourne, at The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, Australia
18
Khumukcham SS, Penugurti V, Bugide S, Dwivedi A, Kumari A, Kesavan PS, Kalali S, Mishra YG, Ramesh VA, Nagarajaram HA, Mazumder A, Manavathi B. HPIP and RUFY3 are noncanonical guanine nucleotide exchange factors of Rab5 to regulate endocytosis-coupled focal adhesion turnover. J Biol Chem 2023; 299:105311. [PMID: 37797694 PMCID: PMC10641178 DOI: 10.1016/j.jbc.2023.105311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 09/01/2023] [Accepted: 09/15/2023] [Indexed: 10/07/2023] Open
Abstract
While the role of endocytosis in focal adhesion turnover-coupled cell migration has been established in addition to its conventional role in cellular functions, the molecular regulators and precise molecular mechanisms that underlie this process remain largely unknown. In this study, we report that proto-oncoprotein hematopoietic PBX-interacting protein (HPIP) localizes to focal adhesions as well as endosomal compartments along with RUN FYVE domain-containing protein 3 (RUFY3) and Rab5, an early endosomal protein. HPIP contains two coiled-coil domains (CC1 and CC2) that are necessary for its association with Rab5 and RUFY3, as the CC domain double mutant (mtHPIPΔCC1-2) failed to support this association. Furthermore, we show that HPIP and RUFY3 activate Rab5 by serving as noncanonical guanine nucleotide exchange factors of Rab5. In support of this, either deletion of coiled-coil domains or silencing of HPIP or RUFY3 impairs Rab5 activation and Rab5-dependent cell migration. Mechanistic studies further revealed that loss of HPIP or RUFY3 expression severely impairs Rab5-mediated focal adhesion disassembly, FAK activation, fibronectin-associated-β1 integrin trafficking, and thus cell migration. Together, this study underscores the importance of HPIP and RUFY3 as noncanonical guanine nucleotide exchange factors of Rab5 and in integrin trafficking and focal adhesion turnover, which are implicated in cell migration.
Affiliation(s)
- Vasudevarao Penugurti
  - Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Suresh Bugide
  - Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Anju Dwivedi
  - Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Anita Kumari
  - Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- P S Kesavan
  - Department of Biological Sciences, Tata Institute of Fundamental Research (TIFR), Hyderabad, Telangana, India
- Sruchytha Kalali
  - Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Yasaswi Gayatri Mishra
  - Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
- Vakkalagadda A Ramesh
  - Laboratory of Computational Biology, Centre for DNA Finger Printing and Diagnostics (CDFD), Hyderabad, Telangana, India; Laboratory of Computational Biology, Manipal Academy of Higher Education, Manipal, Karnataka, India
- Aprotim Mazumder
  - Department of Biological Sciences, Tata Institute of Fundamental Research (TIFR), Hyderabad, Telangana, India
- Bramanandam Manavathi
  - Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad, Telangana, India
19
Liu Z, Zhu YH, Shen LC, Xiao X, Qiu WR, Yu DJ. Integrating unsupervised language model with multi-view multiple sequence alignments for high-accuracy inter-chain contact prediction. Comput Biol Med 2023; 166:107529. [PMID: 37748220 DOI: 10.1016/j.compbiomed.2023.107529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 08/30/2023] [Accepted: 09/19/2023] [Indexed: 09/27/2023]
Abstract
Accurate identification of inter-chain contacts in the protein complex is critical to determine the corresponding 3D structures and understand the biological functions. We proposed a new deep learning method, ICCPred, to deduce the inter-chain contacts from the amino acid sequences of the protein complex. This pipeline was built on the designed deep residual network architecture, integrating the pre-trained language model with three multiple sequence alignments (MSAs) from different biological views. Experimental results on 709 non-redundant benchmarking protein complexes showed that the proposed ICCPred significantly increased inter-chain contact prediction accuracy compared to the state-of-the-art approaches. Detailed data analyses showed that the significant advantage of ICCPred lies in the utilization of pre-trained transformer language models which can effectively extract the complementary co-evolution diversity from three MSAs. Meanwhile, the designed deep residual network enhances the correlation between the co-evolution diversity and the patterns of inter-chain contacts. These results demonstrated a new avenue for high-accuracy deep-learning inter-chain contact prediction that is applicable to large-scale protein-protein interaction annotations from sequence alone.
Affiliation(s)
- Zi Liu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China; Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Yi-Heng Zhu
  - College of Artificial Intelligence, Nanjing Agricultural University, Nanjing, 210095, China
- Long-Chen Shen
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Wang-Ren Qiu
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, 210094, China
20
Kilian M, Bischofs IB. Co-evolution at protein-protein interfaces guides inference of stoichiometry of oligomeric protein complexes by de novo structure prediction. Mol Microbiol 2023; 120:763-782. [PMID: 37777474 DOI: 10.1111/mmi.15169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/31/2023] [Revised: 09/10/2023] [Accepted: 09/11/2023] [Indexed: 10/02/2023]
Abstract
The quaternary structure with specific stoichiometry is pivotal to the specific function of protein complexes. However, determining the structure of many protein complexes experimentally remains a major bottleneck. Structural bioinformatics approaches, such as the deep learning algorithm Alphafold2-multimer (AF2-multimer), leverage the co-evolution of amino acids and sequence-structure relationships for accurate de novo structure and contact prediction. Pseudo-likelihood maximization direct coupling analysis (plmDCA) has been used to detect co-evolving residue pairs by statistical modeling. Here, we provide evidence that combining both methods can be used for de novo prediction of the quaternary structure and stoichiometry of a protein complex. We achieve this by augmenting the existing AF2-multimer confidence metrics with an interpretable score to identify the complex with an optimal fraction of native contacts of co-evolving residue pairs at intermolecular interfaces. We use this strategy to predict the quaternary structure and non-trivial stoichiometries of Bacillus subtilis spore germination protein complexes with unknown structures. Co-evolution at intermolecular interfaces may therefore synergize with AI-based de novo quaternary structure prediction of structurally uncharacterized bacterial protein complexes.
Affiliation(s)
- Max Kilian
  - Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
  - BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
  - Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
- Ilka B Bischofs
  - Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
  - BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
  - Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
21
van Keulen SC, Bonvin AMJJ. Improving the quality of co-evolution intermolecular contact prediction with DisVis. Proteins 2023; 91:1407-1416. [PMID: 37237441 DOI: 10.1002/prot.26514] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Revised: 03/29/2023] [Accepted: 04/19/2023] [Indexed: 05/28/2023]
Abstract
The steep rise in protein sequences and structures has paved the way for bioinformatics approaches to predict residue-residue interactions in protein complexes. Multiple sequence alignments are commonly used in contact predictions to identify co-evolving residues. These contacts, however, often include false positives (FPs), which may impair their use to predict three-dimensional structures of biomolecular complexes and affect the accuracy of the generated models. Previously, we have developed DisVis to identify FPs in mass spectrometry cross-linking data. DisVis allows assessment of the accessible interaction space between two proteins consistent with a set of distance restraints. Here, we investigate if a similar approach could be applied to co-evolution predicted contacts in order to improve their precision prior to using them for modeling. We analyze co-evolution contact predictions with DisVis for a set of 26 protein-protein complexes. The DisVis-reranked and the original co-evolution contacts are then used to model the complexes with our integrative docking software HADDOCK using different filtering scenarios. Our results show that HADDOCK is robust with respect to the precision of the predicted contacts due to the 50% random contact removal during docking and can enhance the quality of docking predictions when combined with DisVis filtering for low-precision contact data. DisVis can thus have a beneficial effect on low-quality data, but overall HADDOCK can accommodate FP restraints without negatively impacting the quality of the resulting models. Other more precision-sensitive docking protocols might, however, benefit from the increased precision of the predicted contacts after DisVis filtering.
Affiliation(s)
- Siri C van Keulen
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, the Netherlands
- Alexandre M J J Bonvin
  - Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, the Netherlands
22
Ozden B, Kryshtafovych A, Karaca E. The Impact of AI-Based Modeling on the Accuracy of Protein Assembly Prediction: Insights from CASP15. bioRxiv 2023:2023.07.10.548341. [PMID: 37503072 PMCID: PMC10369898 DOI: 10.1101/2023.07.10.548341] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 07/29/2023]
Abstract
In CASP15, 87 predictors submitted around 11,000 models on 41 assembly targets. The community demonstrated exceptional performance in overall fold and interface contact prediction, achieving an impressive success rate of 90% (compared to 31% in CASP14). This remarkable accomplishment is largely due to the incorporation of DeepMind's AF2-Multimer approach into custom-built prediction pipelines. To evaluate the added value of participating methods, we compared the community models to the baseline AF2-Multimer predictor. In over 1/3 of cases, the community models were superior to the baseline predictor. The main reasons for this improved performance were the use of custom-built multiple sequence alignments, optimized AF2-Multimer sampling, and the manual assembly of AF2-Multimer-built subcomplexes. The best three groups, in order, are Zheng, Venclovas, and Wallner. Zheng and Venclovas reached a 73.2% success rate over all (41) cases, while Wallner attained a 69.4% success rate over 36 cases. Nonetheless, challenges remain in predicting structures with weak evolutionary signals, such as nanobody-antigen, antibody-antigen, and viral complexes. As expected, modeling large complexes also remains challenging due to their high memory and compute demands. In addition to the assembly category, we assessed the accuracy of modeling interdomain interfaces in the tertiary structure prediction targets. Models on seven targets featuring 17 unique interfaces were analyzed. Best predictors achieved a 76.5% success rate, with the UM-TBM group being the leader. In the interdomain category, we observed that the predictors faced challenges, as in the case of the assembly category, when the evolutionary signal for a given domain pair was weak or the structure was large. Overall, CASP15 witnessed unprecedented improvement in interface modeling, reflecting the AI revolution seen in CASP14.
Affiliation(s)
- Burcu Ozden
  - Izmir Biomedicine and Genome Center, Izmir, Türkiye
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Türkiye
- Andriy Kryshtafovych
  - Protein Structure Prediction Center, Genome and Biomedical Sciences Facilities, University of California, Davis, California, USA
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Türkiye
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Türkiye
23
Shome S, Jia K, Sivasankar S, Jernigan RL. Characterizing interactions in E-cadherin assemblages. Biophys J 2023; 122:3069-3077. [PMID: 37345249 PMCID: PMC10432173 DOI: 10.1016/j.bpj.2023.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 09/26/2022] [Accepted: 06/14/2023] [Indexed: 06/23/2023] Open
Abstract
Cadherin intermolecular interactions are critical for cell-cell adhesion and play essential roles in tissue formation and the maintenance of tissue structures. In this study, we focus on E-cadherin, a classical cadherin that connects epithelial cells, to understand how E-cadherin molecules interact in cis and trans conformations when attached to the same cell or opposing cells. We employ coevolutionary sequence analysis and molecular dynamics simulations to confirm previously known interaction sites as well as to identify new interaction sites. The sequence coevolution analysis yields a surprising result: there are no strongly favored intermolecular interaction sites, which is unusual and suggests that many interaction sites may be possible, with none being strongly preferred over others. By using molecular dynamics, we test the persistence of these interactions and how they facilitate adhesion. We build several types of cadherin assemblages, with different numbers and combinations of cis and trans interfaces, to understand how these conformations act to facilitate adhesion. Our results suggest that, in addition to the established interaction sites on the EC1 and EC2 domains, an additional plausible cis interface at the EC3-EC5 domain exists. Furthermore, we identify specific mutations at cis/trans binding sites that impair adhesion within E-cadherin assemblages.
Affiliation(s)
- Sayane Shome
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa
- Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa
- Sanjeevi Sivasankar
- Department of Biomedical Engineering, University of California, Davis, Davis, California
- Robert L Jernigan
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa.
24
Yang A, Jude KM, Lai B, Minot M, Kocyla AM, Glassman CR, Nishimiya D, Kim YS, Reddy ST, Khan AA, Garcia KC. Deploying synthetic coevolution and machine learning to engineer protein-protein interactions. Science 2023; 381:eadh1720. [PMID: 37499032 PMCID: PMC10403280 DOI: 10.1126/science.adh1720] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/14/2023] [Accepted: 06/16/2023] [Indexed: 07/29/2023]
Abstract
Fine-tuning of protein-protein interactions occurs naturally through coevolution, but this process is difficult to recapitulate in the laboratory. We describe a platform for synthetic protein-protein coevolution that can isolate matched pairs of interacting muteins from complex libraries. This large dataset of coevolved complexes drove a systems-level analysis of molecular recognition between Z domain-affibody pairs spanning a wide range of structures, affinities, cross-reactivities, and orthogonalities, and captured a broad spectrum of coevolutionary networks. Furthermore, we harnessed pretrained protein language models to expand, in silico, the amino acid diversity of our coevolution screen, predicting remodeled interfaces beyond the reach of the experimental library. The integration of these approaches provides a means of simulating protein coevolution and generating protein complexes with diverse molecular recognition properties for biotechnology and synthetic biology.
Affiliation(s)
- Aerin Yang
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Kevin M. Jude
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Ben Lai
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Mason Minot
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Anna M. Kocyla
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Caleb R. Glassman
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Daisuke Nishimiya
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Yoon Seok Kim
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Sai T. Reddy
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- Aly A. Khan
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
- Departments of Pathology, and Family Medicine, University of Chicago, Chicago, IL 60637, USA
- K. Christopher Garcia
- Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
25
Ramakrishnan G, Baakman C, Heijl S, Vroling B, van Horck R, Hiraki J, Xue LC, Huynen MA. Understanding structure-guided variant effect predictions using 3D convolutional neural networks. Front Mol Biosci 2023; 10:1204157. [PMID: 37475887 PMCID: PMC10354367 DOI: 10.3389/fmolb.2023.1204157] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/11/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023] Open
Abstract
Predicting pathogenicity of missense variants in molecular diagnostics remains a challenge despite the available wealth of data, such as evolutionary information, and the wealth of tools to integrate that data. We describe DeepRank-Mut, a configurable framework designed to extract and learn from physicochemically relevant features of amino acids surrounding missense variants in 3D space. For each variant, various atomic and residue-level features are extracted from its structural environment, including sequence conservation scores of the surrounding amino acids, and stored in multi-channel 3D voxel grids which are then used to train a 3D convolutional neural network (3D-CNN). The resultant model gives a probabilistic estimate of whether a given input variant is disease-causing or benign. We find that the performance of our 3D-CNN model, on independent test datasets, is comparable to other widely used resources which also combine sequence and structural features. Based on the 10-fold cross-validation experiments, we achieve an average accuracy of 0.77 on the independent test datasets. We discuss the contribution of the variant neighborhood to the model's predictive power, in addition to the impact of individual features on the model's performance. Two key features, the evolutionary information of residues in the variant neighborhood and their solvent accessibilities, were observed to influence the predictions. We also highlight how predictions are impacted by the underlying disease mechanisms of missense mutations and offer insights into understanding these to improve pathogenicity predictions. Our study presents aspects to take into consideration when adopting deep learning approaches for protein structure-guided pathogenicity predictions.
Affiliation(s)
- Gayatri Ramakrishnan
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Coos Baakman
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Li C. Xue
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
- Martijn A. Huynen
- Department of Medical Biosciences, Radboud University Medical Center, Nijmegen, Netherlands
26
Dahmani I, Qin K, Zhang Y, Fernie AR. The formation and function of plant metabolons. Plant J 2023; 114:1080-1092. [PMID: 36906885 DOI: 10.1111/tpj.16179] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/12/2022] [Revised: 02/26/2023] [Accepted: 03/06/2023] [Indexed: 05/31/2023]
Abstract
Metabolons are temporary structural-functional complexes of sequential enzymes of a metabolic pathway that are distinct from stable multi-enzyme complexes. Here we provide a brief history of the study of enzyme-enzyme assemblies with a particular focus on those that mediate substrate channeling in plants. Large numbers of protein complexes have been proposed for both primary and secondary metabolic pathways in plants. However, to date only four substrate channels have been demonstrated. We provide an overview of current knowledge concerning these four metabolons and explain the methodologies that are currently being applied to unravel their functions. Although the assembly of metabolons has been documented to arise through diverse mechanisms, the physical interactions within the characterized plant metabolons all appear to be driven by interaction with structural elements of the cell. We therefore ask what methodologies could be brought to bear to enhance our knowledge of plant metabolons that assemble via different mechanisms. In addressing this question, we review recent findings in non-plant systems concerning liquid droplet phase separation and enzyme chemotaxis and propose strategies via which such metabolons could be identified in plants. We additionally discuss the possibilities that could be opened up by novel approaches based on: (i) subcellular-level mass spectral imaging, (ii) proteomics, and (iii) emergent methods in structural and computational biology.
Affiliation(s)
- Ismail Dahmani
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany
- Kezhen Qin
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany
- Youjun Zhang
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany
- Center of Plant System Biology and Biotechnology, 4000, Plovdiv, Bulgaria
- Alisdair R Fernie
- Max-Planck-Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476, Potsdam-Golm, Germany
- Center of Plant System Biology and Biotechnology, 4000, Plovdiv, Bulgaria
27
Pomarici ND, Cacciato R, Kokot J, Fernández-Quintero ML, Liedl KR. Evolution of the Immunoglobulin Isotypes-Variations of Biophysical Properties among Animal Classes. Biomolecules 2023; 13:801. [PMID: 37238671 PMCID: PMC10216798 DOI: 10.3390/biom13050801] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/13/2023] [Revised: 05/03/2023] [Accepted: 05/05/2023] [Indexed: 05/28/2023] Open
Abstract
The adaptive immune system arose around 500 million years ago in jawed fish, and, since then, it has mediated the immune defense against pathogens in all vertebrates. Antibodies play a central role in the immune reaction, recognizing and attacking external invaders. During the evolutionary process, several immunoglobulin isotypes emerged, each having a characteristic structural organization and dedicated function. In this work, we investigate the evolution of the immunoglobulin isotypes, in order to highlight the relevant features that were preserved over time and the parts that, instead, mutated. The residues that are coupled in the evolution process are often involved in intra- or interdomain interactions, meaning that they are fundamental to maintaining the immunoglobulin fold and to ensuring interactions with other domains. The explosive growth of available sequences allows us to point out the evolutionary conserved residues and compare the biophysical properties among different animal classes and isotypes. Our study offers a general overview of the evolution of immunoglobulin isotypes and advances the knowledge of their characteristic biophysical properties, as a first step in guiding protein design from evolution.
Affiliation(s)
- Monica L. Fernández-Quintero
- Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innrain 80-82, A-6020 Innsbruck, Austria
- Klaus R. Liedl
- Department of General, Inorganic and Theoretical Chemistry, Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innrain 80-82, A-6020 Innsbruck, Austria
28
Wonderlick DR, Widom JR, Harms MJ. Disentangling contact and ensemble epistasis in a riboswitch. Biophys J 2023; 122:1600-1612. [PMID: 36710492 PMCID: PMC10183321 DOI: 10.1016/j.bpj.2023.01.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 01/09/2023] [Accepted: 01/24/2023] [Indexed: 01/29/2023] Open
Abstract
Mutations introduced into macromolecules often exhibit epistasis, where the effect of one mutation alters the effect of another. Knowing the mechanisms that lead to epistasis is important for understanding how macromolecules work and evolve, as well as for effective macromolecular engineering. Here, we investigate the interplay between "contact epistasis" (epistasis arising from physical interactions between mutated residues) and "ensemble epistasis" (epistasis that occurs when a mutation redistributes the conformational ensemble of a macromolecule, thus changing the effect of the second mutation). We argue that the two mechanisms can be distinguished in allosteric macromolecules by measuring epistasis at differing allosteric effector concentrations. Contact epistasis manifests as nonadditivity in the microscopic equilibrium constants describing the conformational ensemble. This epistatic effect is independent of allosteric effector concentration. Ensemble epistasis manifests as nonadditivity in thermodynamic observables, such as ligand binding, that are determined by the distribution of ensemble conformations. This epistatic effect strongly depends on allosteric effector concentration. Using this framework, we experimentally investigated the origins of epistasis in three pairwise mutant cycles introduced into the adenine riboswitch aptamer domain by measuring ligand binding as a function of allosteric effector concentration. We found evidence for both contact and ensemble epistasis in all cycles. Furthermore, we found that the two mechanisms of epistasis could interact with each other. For example, in one mutant cycle we observed 6 kcal/mol of contact epistasis in a microscopic equilibrium constant. In that same cycle, the maximum epistasis in ligand binding was only 1.5 kcal/mol: shifts in the ensemble masked the contribution of contact epistasis. Finally, our work yields simple heuristics for identifying contact and ensemble epistasis based on measurements of a biochemical observable as a function of allosteric effector concentration.
Affiliation(s)
- Daria R Wonderlick
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon
- Julia R Widom
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon; Institute for Molecular Biology, University of Oregon, Eugene, Oregon; Oregon Center for Optical, Molecular, & Quantum Science, University of Oregon, Eugene, Oregon
- Michael J Harms
- Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon; Institute for Molecular Biology, University of Oregon, Eugene, Oregon.
29
Echeverria I, Braberg H, Krogan NJ, Sali A. Integrative structure determination of histones H3 and H4 using genetic interactions. FEBS J 2023; 290:2565-2575. [PMID: 35298864 PMCID: PMC9481981 DOI: 10.1111/febs.16435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Revised: 02/11/2022] [Accepted: 03/15/2022] [Indexed: 11/28/2022]
Abstract
Integrative structure modeling is increasingly used for determining the architectures of biological assemblies, especially those that are structurally heterogeneous. Recently, we reported on how to convert in vivo genetic interaction measurements into spatial restraints for structural modeling: first, phenotypic profiles are generated for each point mutation and thousands of gene deletions or environmental perturbations; second, the phenotypic profile similarities are converted into distance restraints on the pairs of mutated residues. We illustrate the approach by determining the structure of the histone H3-H4 complex. The method is implemented in our open-source IMP program, expanding the structural biology toolbox by allowing structural characterization based on in vivo data without the need to purify the target system. We compare genetic interaction measurements to other sources of structural information, such as residue coevolution and deep-learning structure prediction of complex subunits. We also suggest that determining genetic interactions could benefit from new technologies, such as CRISPR-Cas9 approaches to gene editing, especially for mammalian cells. Finally, we highlight the opportunity for using genetic interactions to determine recalcitrant biomolecular structures, such as those of disordered proteins, transient protein assemblies, and host-pathogen protein complexes.
Affiliation(s)
- Ignacia Echeverria
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Hannes Braberg
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Nevan J. Krogan
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Gladstone Institute of Data Science and Biotechnology, J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Andrej Sali
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94158, USA
30
Ziegler C, Martin J, Sinner C, Morcos F. Latent generative landscapes as maps of functional diversity in protein sequence space. Nat Commun 2023; 14:2222. [PMID: 37076519 PMCID: PMC10113739 DOI: 10.1038/s41467-023-37958-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/19/2022] [Accepted: 04/05/2023] [Indexed: 04/21/2023] Open
Abstract
Variational autoencoders are unsupervised learning models with generative capabilities. When applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings and functional and fitness properties of several systems, including globins, β-lactamases, ion channels, and transcription factors. We provide support on how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design.
Affiliation(s)
- Cheyenne Ziegler
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Jonathan Martin
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Claude Sinner
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA.
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX, 75080, USA.
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA.
31
Durham J, Zhang J, Humphreys IR, Pei J, Cong Q. Recent advances in predicting and modeling protein-protein interactions. Trends Biochem Sci 2023; 48:527-538. [PMID: 37061423 DOI: 10.1016/j.tibs.2023.03.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/17/2023]
Abstract
Protein-protein interactions (PPIs) drive biological processes, and disruption of PPIs can cause disease. With recent breakthroughs in structure prediction and a deluge of genomic sequence data, computational methods to predict PPIs and model spatial structures of protein complexes are now approaching the accuracy of experimental approaches for permanent interactions and show promise for elucidating transient interactions. As we describe here, the key to this success is rich evolutionary information deciphered from thousands of homologous sequences that coevolve in interacting partners. This covariation signal, revealed by sophisticated statistical and machine learning (ML) algorithms, predicts physiological interactions. Accurate artificial intelligence (AI)-based modeling of protein structures promises to provide accurate 3D models of PPIs at a proteome-wide scale.
Affiliation(s)
- Jesse Durham
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Ian R Humphreys
- Department of Biochemistry, University of Washington, Seattle, WA, USA; Institute for Protein Design, University of Washington, Seattle, WA, USA
- Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA.
32
O'Reilly FJ, Graziadei A, Forbrig C, Bremenkamp R, Charles K, Lenz S, Elfmann C, Fischer L, Stülke J, Rappsilber J. Protein complexes in cells by AI-assisted structural proteomics. Mol Syst Biol 2023; 19:e11544. [PMID: 36815589 PMCID: PMC10090944 DOI: 10.15252/msb.202311544] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 01/16/2023] [Revised: 01/24/2023] [Accepted: 02/07/2023] [Indexed: 02/24/2023] Open
Abstract
Accurately modeling the structures of proteins and their complexes using artificial intelligence is revolutionizing molecular biology. Experimental data enable a candidate-based approach to systematically model novel protein assemblies. Here, we use a combination of in-cell crosslinking mass spectrometry and co-fractionation mass spectrometry (CoFrac-MS) to identify protein-protein interactions in the model Gram-positive bacterium Bacillus subtilis. We show that crosslinking interactions prior to cell lysis reveals protein interactions that are often lost upon cell lysis. We predict the structures of these protein interactions and others in the SubtiWiki database with AlphaFold-Multimer and, after controlling for the false-positive rate of the predictions, we propose novel structural models of 153 dimeric and 14 trimeric protein assemblies. Crosslinking MS data independently validates the AlphaFold predictions and scoring. We report and validate novel interactors of central cellular machineries that include the ribosome, RNA polymerase, and pyruvate dehydrogenase, assigning function to several uncharacterized proteins. Our approach uncovers protein-protein interactions inside intact cells, provides structural insight into their interaction interfaces, and is applicable to genetically intractable organisms, including pathogenic bacteria.
Affiliation(s)
- Francis J O'Reilly
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Present address: Center for Structural Biology, Center for Cancer Research, National Cancer Institute (NCI), Frederick, MD, USA
- Rica Bremenkamp
- Department of General Microbiology, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen, Germany
- Swantje Lenz
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Christoph Elfmann
- Department of General Microbiology, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen, Germany
- Lutz Fischer
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Jörg Stülke
- Department of General Microbiology, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Göttingen, Germany
- Juri Rappsilber
- Chair of Bioanalytics, Technische Universität Berlin, Berlin, Germany
- Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, UK
33
Si Y, Yan C. Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models. Brief Bioinform 2023; 24:7033302. [PMID: 36759333 DOI: 10.1093/bib/bbad039] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/27/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 02/11/2023] Open
Abstract
Knowledge of the contacting residue pairs between interacting proteins is very useful for the structural characterization of protein-protein interactions (PPIs). However, accurately identifying the tens of contacting pairs among hundreds of thousands of inter-protein residue pairs is extremely challenging, and the performance of state-of-the-art inter-protein contact prediction methods is still quite limited. In this study, we developed a deep learning method for inter-protein contact prediction, which is referred to as DRN-1D2D_Inter. Specifically, we employed pretrained protein language models to generate structural information-enriched input features for residual networks formed by dimensional hybrid residual blocks to perform inter-protein contact prediction. Extensively benchmarking DRN-1D2D_Inter on multiple datasets, including both heteromeric PPIs and homomeric PPIs, we show that DRN-1D2D_Inter consistently and significantly outperformed two state-of-the-art inter-protein contact prediction methods, GLINTER and DeepHomo, even though both of those methods leveraged the native structures of the interacting proteins in the prediction, whereas DRN-1D2D_Inter made the prediction purely from sequences. We further show that applying the predicted contacts as constraints for protein-protein docking can significantly improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, China
- Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, China
34
Hummels KR, Berry SP, Li Z, Taguchi A, Min JK, Walker S, Marks DS, Bernhardt TG. Coordination of bacterial cell wall and outer membrane biosynthesis. Nature 2023; 615:300-304. [PMID: 36859542 PMCID: PMC9995270 DOI: 10.1038/s41586-023-05750-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 07/16/2021] [Accepted: 01/23/2023] [Indexed: 03/03/2023]
Abstract
Gram-negative bacteria surround their cytoplasmic membrane with a peptidoglycan (PG) cell wall and an outer membrane (OM) with an outer leaflet composed of lipopolysaccharide (LPS) [1]. This complex envelope presents a formidable barrier to drug entry and is a major determinant of the intrinsic antibiotic resistance of these organisms [2]. The biogenesis pathways that build the surface are also targets of many of our most effective antibacterial therapies [3]. Understanding the molecular mechanisms underlying the assembly of the Gram-negative envelope therefore promises to aid the development of new treatments effective against the growing problem of drug-resistant infections. Although the individual pathways for PG and OM synthesis and assembly are well characterized, almost nothing is known about how the biogenesis of these essential surface layers is coordinated. Here we report the discovery of a regulatory interaction between the committed enzymes for the PG and LPS synthesis pathways in the Gram-negative pathogen Pseudomonas aeruginosa. We show that the PG synthesis enzyme MurA interacts directly and specifically with the LPS synthesis enzyme LpxC. Moreover, MurA was shown to stimulate LpxC activity in cells and in a purified system. Our results support a model in which the assembly of the PG and OM layers in many proteobacterial species is coordinated by linking the activities of the committed enzymes in their respective synthesis pathways.
Affiliation(s)
- Katherine R Hummels
- Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Samuel P Berry
- Department of Systems Biology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Zhaoqi Li
- Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Atsushi Taguchi
- Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- SANKEN (The Institute of Scientific and Industrial Research), Osaka University, Ibaraki, Japan
- Joseph K Min
- Department of Systems Biology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Suzanne Walker
- Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Debora S Marks
- Department of Systems Biology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Thomas G Bernhardt
- Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA.
- Howard Hughes Medical Institute, Boston, MA, USA.
35
Boral A, Mitra D. Heterogeneity in winged helix-turn-helix and substrate DNA interactions: Insights from theory and experiments. J Cell Biochem 2023; 124:337-358. [PMID: 36715571 DOI: 10.1002/jcb.30369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 12/29/2022] [Accepted: 01/02/2023] [Indexed: 01/31/2023]
Abstract
Specific interactions between transcription factors (TFs) and substrate DNA constitute the fundamental basis of gene expression. Unlike in TFs like basic helix-loop-helix or basic leucine zippers, prediction of substrate DNA is extremely challenging for helix-turn-helix (HTH). Experimental techniques like chromatin immunoprecipitation combined with massively parallel DNA sequencing remain a viable option. We characterize the molecular basis of heterogeneity in HTH-DNA interaction using in silico tools and then validate the findings experimentally. Given the profound functional diversity in HTH, we focus primarily on winged-HTH (wHTH). We consider 180 wHTH TFs, whose experimental three-dimensional structures are available in DNA bound/unbound conformations. Starting with PDB-wide scanning and curation of data, we construct a phylogenetic tree, which distributes 180 wHTH sequences under multiple sub-groups. Structure-sequence alignment followed by detailed intra/intergroup analysis, covariation studies and extensive network theory analysis help us to gain deep insight into heterogeneous wHTH-substrate DNA interactions. A central aim of this study is to find a consensus to predict the substrate DNA sequence for wHTH, amidst heterogeneity. The strength of our exhaustive theoretical investigations, including molecular docking, is successfully tested through experimental characterization of a wHTH TF from Sulfurimonas denitrificans.
Affiliation(s)
- Aparna Boral
- Department of Life Sciences, Presidency University, Kolkata, West Bengal, India
| | - Devrani Mitra
- Department of Life Sciences, Presidency University, Kolkata, West Bengal, India
|
36
|
Nchourupouo KWT, Nde J, Ngouongo YJW, Zekeng SS, Fongang B. Evolutionary Couplings and Molecular Dynamic Simulations Highlight Details of GPCRs Heterodimers' Interfaces. Molecules 2023; 28:1838. [PMID: 36838825 PMCID: PMC9966702 DOI: 10.3390/molecules28041838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/17/2023] [Revised: 02/03/2023] [Accepted: 02/08/2023] [Indexed: 02/18/2023] Open
Abstract
A growing body of evidence suggests that only a few amino acids ("hot-spots") at the interface contribute most of the binding energy in transient protein-protein interactions. However, experimental protocols to identify these hot-spots are highly labor-intensive and expensive. Computational methods, including evolutionary couplings, have been proposed to predict the hot-spots, but they generally fail to provide details of the interacting amino acids. Here we showed that unbiased evolutionary methods followed by biased molecular dynamic simulations could achieve this goal and reveal critical elements of protein complexes. We applied the methodology to selected G-protein coupled receptors (GPCRs), known for their therapeutic properties. We used the structure-prior-assisted direct coupling analysis (SP-DCA) to predict the binding interfaces of A2aR/D2R, CB1R/D2R, A2aR/CB1R, 5-HT2AR/D2R, and 5-HT2AR/mGluR2 receptor heterodimers, which all agreed with published data. In order to highlight details of the interactions, we performed molecular dynamic (MD) simulations using the newly developed AWSEM energy model. We found that these receptors interact primarily through critical residues at the C and N terminal domains and the third intracellular loop (ICL3). The MD simulations showed that these residues are energetically necessary for dimerization and revealed their native conformational state. We subsequently applied the methodology to the 5-HT2AR/5-HT4R heterodimer, given its implication in drug addiction and neurodegenerative pathologies such as Alzheimer's disease (AD). Further, the SP-DCA analysis showed that 5-HT2AR and 5-HT4R heterodimerize through the C-terminal domain of 5-HT2AR and ICL3 of 5-HT4R. However, elucidating the details of GPCR interactions would accelerate the discovery of druggable sites and improve our knowledge of the etiology of common diseases, including AD.
Affiliation(s)
- Karim Widad Temgbet Nchourupouo
- Laboratory of Mechanics, Materials, and Structures, Department of Physics, Faculty of Science, University of Yaoundé I, Yaoundé P.O. Box 812, Cameroon
| | - Jules Nde
- Department of Physics, University of Washington Seattle, Seattle, WA 98105, USA
| | - Yannick Joel Wadop Ngouongo
- Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
| | - Serge Sylvain Zekeng
- Laboratory of Mechanics, Materials, and Structures, Department of Physics, Faculty of Science, University of Yaoundé I, Yaoundé P.O. Box 812, Cameroon
| | - Bernard Fongang
- Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Department of Biochemistry and Structural Biology, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Department of Population Health Sciences, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
|
37
|
Li M, Kang L, Xiong Y, Wang YG, Fan G, Tan P, Hong L. SESNet: sequence-structure feature-integrated deep learning method for data-efficient protein engineering. J Cheminform 2023; 15:12. [PMID: 36737798 PMCID: PMC9898993 DOI: 10.1186/s13321-023-00688-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 01/23/2023] [Indexed: 02/05/2023] Open
Abstract
Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model to predict the fitness of protein mutants by leveraging both sequence and structure information, and exploiting an attention mechanism. Our model integrates local evolutionary context from homologous sequences, the global evolutionary context encoding rich semantics from the universal protein sequence space and the structure information accounting for the microenvironment around each residue in a protein. We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship on 26 deep mutational scanning datasets. More importantly, we propose a data augmentation strategy by leveraging the data from unsupervised models to pre-train our model. After that, our model can achieve strikingly high accuracy in prediction of the fitness of protein mutants, especially for the higher order variants (> 4 mutation sites), when finetuned by using only a small number of experimental mutation data (< 50). The strategy proposed is of great practical value as the required experimental effort, i.e., producing a few tens of experimental mutation data on a given protein, is generally affordable by an ordinary biochemical group and can be applied on almost any protein.
Affiliation(s)
- Mingchen Li
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200240, China
| | - Liqi Kang
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- School of Physics and Astronomy & School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yi Xiong
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Yu Guang Wang
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China
| | - Guisheng Fan
- School of Information Science and Engineering, East China University of Science and Technology, Shanghai, 200240, China
| | - Pan Tan
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China.
| | - Liang Hong
- Shanghai National Center for Applied Mathematics (SJTU Center), & Institute of Natural Sciences, Shanghai Jiao Tong University, Shanghai, 200240, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, 200240, China.
- School of Physics and Astronomy & School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China.
|
38
|
Lin P, Yan Y, Huang SY. DeepHomo2.0: improved protein-protein contact prediction of homodimers by transformer-enhanced deep learning. Brief Bioinform 2023; 24:6849483. [PMID: 36440949 DOI: 10.1093/bib/bbac499] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 06/30/2022] [Revised: 10/08/2022] [Accepted: 10/21/2022] [Indexed: 11/30/2022] Open
Abstract
Protein-protein interactions play an important role in many biological processes. However, although structure prediction for monomer proteins has achieved great progress with the advent of advanced deep learning algorithms like AlphaFold, the structure prediction for protein-protein complexes remains an open question. Taking advantage of the Transformer model of ESM-MSA, we have developed a deep learning-based model, named DeepHomo2.0, to predict protein-protein interactions of homodimeric complexes by leveraging the direct-coupling analysis (DCA) and Transformer features of sequences and the structure features of monomers. DeepHomo2.0 was extensively evaluated on diverse test sets and compared with eight state-of-the-art methods including protein language model-based, DCA-based and machine learning-based methods. It was shown that DeepHomo2.0 achieved a high precision of >70% with experimental monomer structures and >60% with predicted monomer structures for the top 10 predicted contacts on the test sets and outperformed the other eight methods. Moreover, even the version without using structure information, named DeepHomoSeq, still achieved a good precision of >55% for the top 10 predicted contacts. Integrating the predicted contacts into protein docking significantly improved the structure prediction of realistic Critical Assessment of Protein Structure Prediction homodimeric complexes. DeepHomo2.0 and DeepHomoSeq are available at http://huanglab.phys.hust.edu.cn/DeepHomo2/.
Affiliation(s)
- Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
|
39
|
Abstract
In recent years, the therapeutic use of antibodies has seen huge growth, due to their inherent properties and technological advances in the methods used to study and characterize them. Effective design and engineering of antibodies for therapeutic purposes are heavily dependent on knowledge of the structural principles that regulate antibody-antigen interactions. Several experimental techniques such as X-ray crystallography, cryo-electron microscopy, NMR, or mutagenesis analysis can be applied, but these are usually expensive and time-consuming. Therefore, computational approaches like molecular docking may offer a valuable alternative for the characterization of antibody-antigen complexes. Here we describe a protocol for the prediction of the 3D structure of antibody-antigen complexes using the integrative modelling platform HADDOCK. The protocol consists of (1) the identification of the antibody residues belonging to the hypervariable loops, which are known to be crucial for binding and can be used to guide the docking, and (2) the detailed steps to perform docking with the HADDOCK 2.4 webserver following different strategies depending on the availability of information about epitope residues.
Affiliation(s)
- Francesco Ambrosetti
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Zuzana Jandova
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands
| | - Alexandre M J J Bonvin
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Faculty of Science - Chemistry, Utrecht University, Utrecht, The Netherlands.
|
40
|
Launay R, Teppa E, Esque J, André I. Modeling Protein Complexes and Molecular Assemblies Using Computational Methods. Methods Mol Biol 2023; 2553:57-77. [PMID: 36227539 DOI: 10.1007/978-1-0716-2617-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Many biological molecules are assembled into supramolecular complexes that are necessary to perform functions in the cell. Better understanding and characterization of these molecular assemblies are thus essential to further elucidate molecular mechanisms and key protein-protein interactions that could be targeted to modulate the protein binding affinity or develop new binders. Experimental access to structural information on these supramolecular assemblies is often hampered by the size of these systems that make their recombinant production and characterization rather difficult. Computational methods combining both structural data, molecular modeling techniques, and sequence coevolution information can thus offer a good alternative to gain access to the structural organization of protein complexes and assemblies. Herein, we present some computational methods to predict structural models of the protein partners, to search for interacting regions using coevolution information, and to build molecular assemblies. The approach is exemplified using a case study to model the succinate-quinone oxidoreductase heterocomplex.
Affiliation(s)
- Romain Launay
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
| | - Elin Teppa
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France
| | - Jérémy Esque
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France.
| | - Isabelle André
- Toulouse Biotechnology Institute, TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse Cedex 04, France.
|
41
|
Ozdemir ES, Nussinov R. Pathogen-driven cancers from a structural perspective: Targeting host-pathogen protein-protein interactions. Front Oncol 2023; 13:1061595. [PMID: 36910650 PMCID: PMC9997845 DOI: 10.3389/fonc.2023.1061595] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2022] [Accepted: 02/06/2023] [Indexed: 02/25/2023] Open
Abstract
Host-pathogen interactions (HPIs) affect and involve multiple mechanisms in both the pathogen and the host. Pathogen interactions disrupt homeostasis in host cells, with their toxins interfering with host mechanisms, resulting in infections, diseases, and disorders, extending from AIDS and COVID-19, to cancer. Studies of the three-dimensional (3D) structures of host-pathogen complexes aim to understand how pathogens interact with their hosts. They also aim to contribute to the development of rational therapeutics, as well as preventive measures. However, structural studies are fraught with challenges toward these aims. This review describes the state-of-the-art in protein-protein interactions (PPIs) between the host and pathogens from the structural standpoint. It discusses computational aspects of predicting these PPIs, including machine learning (ML) and artificial intelligence (AI)-driven, and overviews available computational methods and their challenges. It concludes with examples of how theoretical computational approaches can result in a therapeutic agent with a potential of being used in the clinics, as well as future directions.
Affiliation(s)
- Emine Sila Ozdemir
- Cancer Early Detection Advanced Research Center, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, United States
| | - Ruth Nussinov
- Cancer Innovation Laboratory, Frederick National Laboratory for Cancer Research, National Cancer Institute at Frederick, Frederick, MD, United States
- Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
|
42
|
Wang L, Li FL, Ma XY, Cang Y, Bai F. PPI-Miner: A Structure and Sequence Motif Co-Driven Protein-Protein Interaction Mining and Modeling Computational Method. J Chem Inf Model 2022; 62:6160-6171. [PMID: 36448715 DOI: 10.1021/acs.jcim.2c01033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/02/2022]
Abstract
Protein-protein interactions (PPIs) play important roles in biological processes, and predicting PPIs has become a critical scientific issue. Most PPIs occur through small domains or motifs (fragments), which are challenging and laborious to map by standard biochemical approaches because they generally require the cloning of several truncation mutants. Here, we present a computational method, named PPI-Miner, to fish for potential protein interacting partners using protein motifs as queries. In brief, this work first developed a motif-matching algorithm designed to identify the proteins that contain motifs similar in sequence or structure to the given query motif. After alignment to the query motif, the binding mode of the discovered motif and its receptor protein is initially determined and then used to build PPI complexes accordingly. Eventually, a PPI complex structure can be built and optimized with a designed automatic protocol. Besides discovering PPIs, PPI-Miner can also be applied to other areas, e.g., the rational design of molecular glues and protein vaccines. In this work, PPI-Miner was employed to mine the potential cereblon (CRBN) substrates from the human proteome. As a result, 1,739 candidates were predicted, and 16 of them have been experimentally validated in previous studies. The source code of PPI-Miner can be obtained from the GitHub repository (https://github.com/Wang-Lin-boop/PPI-Miner), the webserver is freely available for users (https://bailab.siais.shanghaitech.edu.cn/services/ppi-miner), and the database of predicted CRBN substrates is accessible at https://bailab.siais.shanghaitech.edu.cn/services/crbn-subslib.
Affiliation(s)
| | | | | | | | - Fang Bai
- Shanghai Clinical Research and Trial Center, Shanghai 201210, China
|
43
|
Zhao XJG, Cao H. Linking research of biomedical datasets. Brief Bioinform 2022; 23:6712704. [PMID: 36151775 DOI: 10.1093/bib/bbac373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 08/03/2022] [Accepted: 08/08/2022] [Indexed: 12/14/2022] Open
Abstract
Biomedical data preprocessing and efficient computing can be as important as the statistical methods used to fit the data; data processing needs to consider application scenarios, data acquisition and individual rights and interests. We review common principles, knowledge and methods of integrated research according to the whole-pipeline processing mechanism: diverse, coherent, sharing, auditable and ecological. First, neuromorphic and native algorithms integrate diverse datasets, providing linear scalability and high visualization. Second, the choice mechanism of different preprocessing, analysis and transaction methods from raw to neuromorphic is summarized on the node and coordinator platforms. Third, the combination of node, network, cloud, edge, swarm and graph builds an ecosystem of cohort integrated research and clinical diagnosis and treatment. Looking forward, it is vital to simultaneously combine deep computing, mass data storage and massively parallel communication.
Affiliation(s)
- Xiu-Ju George Zhao
- Wuhan Institute of Physics and Mathematics (WIPM), China
- Wuhan Polytechnic University, China
| | - Hui Cao
- Wuhan Polytechnic University, China
|
44
|
Wu C, Guo D. Computational Docking Reveals Co-Evolution of C4 Carbon Delivery Enzymes in Diverse Plants. Int J Mol Sci 2022; 23:12688. [PMID: 36293547 PMCID: PMC9604239 DOI: 10.3390/ijms232012688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 10/14/2022] [Accepted: 10/19/2022] [Indexed: 11/16/2022] Open
Abstract
Proteins are modular functionalities regulating multiple cellular activities in prokaryotes and eukaryotes. As a consequence of higher plants adapting to arid and thermal conditions, C4 photosynthesis is the carbon fixation process involving multi-enzymes working in a coordinated fashion. However, how these enzymes interact with each other and whether they co-evolve in parallel to maintain interactions in different plants remain elusive to date. Here, we report our findings on the global protein co-evolution relationship and local dynamics of co-varying site shifts in key C4 photosynthetic enzymes. We found that in most of the selected key C4 photosynthetic enzymes, global pairwise co-evolution events exist to form functional couplings. Besides, protein-protein interactions between these enzymes may suggest their unknown functionalities in the carbon delivery process. For PEPC and PPCK regulation pairs, pocket formation at the interactive interface is not necessary for their function. This feature is distinct from another well-known regulation pair in C4 photosynthesis, namely, PPDK and PPDK-RP, where the pockets are necessary. Our findings facilitate the discovery of novel protein regulation types and contribute to expanding our knowledge about C4 photosynthesis.
Affiliation(s)
| | - Dianjing Guo
- State Key Laboratory of Agrobiotechnology, School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
|
45
|
Ahmed S, Chattopadhyay G, Manjunath K, Bhasin M, Singh N, Rasool M, Das S, Rana V, Khan N, Mitra D, Asok A, Singh R, Varadarajan R. Combining cysteine scanning with chemical labeling to map protein-protein interactions and infer bound structure in an intrinsically disordered region. Front Mol Biosci 2022; 9:997653. [PMID: 36275627 PMCID: PMC9585320 DOI: 10.3389/fmolb.2022.997653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 09/12/2022] [Indexed: 11/13/2022] Open
Abstract
The Mycobacterium tuberculosis genome harbours nine toxin-antitoxin (TA) systems of the mazEF family. These consist of two proteins, a toxin and an antitoxin, encoded in an operon. While the toxin has a conserved fold, the antitoxins are structurally diverse and the toxin binding region is typically intrinsically disordered before binding. We describe high throughput methodology for accurate mapping of interfacial residues and apply it to three MazEF complexes. The method involves screening one partner protein against a panel of chemically masked single cysteine mutants of its interacting partner, displayed on the surface of yeast cells. Such libraries have much lower diversity than those generated by saturation mutagenesis, simplifying library generation and data analysis. Further, because of the steric bulk of the masking reagent, labeling of virtually all exposed epitope residues should result in loss of binding, and buried residues are inaccessible to the labeling reagent. The binding residues are deciphered by probing the loss of binding to the labeled cognate partner by flow cytometry. Using this methodology, we have identified the interfacial residues for MazEF3, MazEF6 and MazEF9 TA systems of M. tuberculosis. In the case of MazEF9, where a crystal structure was available, there was excellent agreement between our predictions and the crystal structure, superior to those with AlphaFold2. We also report detailed biophysical characterization of the MazEF3 and MazEF9 TA systems and measured the relative affinities between cognate and non-cognate toxin–antitoxin partners in order to probe possible cross-talk between these systems.
Affiliation(s)
- Shahbaz Ahmed
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | | | | | - Munmun Bhasin
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Neelam Singh
- Tuberculosis Research Laboratory, Translational Health Science and Technology Institute, Faridabad, India
| | - Mubashir Rasool
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Sayan Das
- Tuberculosis Research Laboratory, Translational Health Science and Technology Institute, Faridabad, India
| | - Varsha Rana
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Neha Khan
- Tuberculosis Research Laboratory, Translational Health Science and Technology Institute, Faridabad, India
| | - Debarghya Mitra
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Aparna Asok
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Ramandeep Singh
- Tuberculosis Research Laboratory, Translational Health Science and Technology Institute, Faridabad, India
| | - Raghavan Varadarajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- *Correspondence: Raghavan Varadarajan,
|
46
|
Shola David M, Kanayeva D. Enzyme linked oligonucleotide assay for the sensitive detection of SARS-CoV-2 variants. Front Cell Infect Microbiol 2022; 12:1017542. [PMID: 36250054 PMCID: PMC9559407 DOI: 10.3389/fcimb.2022.1017542] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/12/2022] [Accepted: 09/13/2022] [Indexed: 11/13/2022] Open
Abstract
The exponential spread of COVID-19 has prompted the need to develop a simple and sensitive diagnostic tool. Aptamer-based detection assays like ELONA are promising since they are inexpensive and sensitive. Aptamers have advantages over antibodies in wide modification, small size, in vitro selection, and stability under stringent conditions, which aid in scalable and reliable detection. In this work, we used aptamers against SARS-CoV-2 RBD S protein to design a simple and sensitive ELONA detection tool. Screening CoV2-RBD-1C and CoV2-RBD-4C aptamers and optimizing assay conditions led to the development of a direct ELONA that can detect SARS-CoV-2 RBD S glycoprotein in buffer solution and 0.1% human nasal fluid with detection limits of 2.16 ng/mL and 1.02 ng/mL, respectively. We detected inactivated Alpha, Wuhan, and Delta variants of SARS-CoV-2 with detection limits of 3.73, 5.72, and 6.02 TCID50/mL, respectively. Using the two aptamers as capture and reporter elements, we designed a more sensitive sandwich assay to identify the three SARS-CoV-2 variants employed in this research. As predicted, a lower detection limit was obtained. Sandwich assay LOD was 2.31 TCID50/mL for Alpha, 1.15 TCID50/mL for Wuhan, and 2.96 TCID50/mL for Delta. The sensitivity of sandwich ELONA was validated using Alpha and Wuhan variants spiked in 0.1% human nasal fluid, which were detected with LODs of 1.41 and 1.79 TCID50/mL, respectively. SEM was used to visualize the presence of viral particles in the Delta variant sample. The effective detection of SARS-CoV-2 in this study confirms the potential of our aptamer-based technique as a screening tool.
|
47
|
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022] Open
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including predicting protein functions, protein-protein interaction and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
| | - Eric Paquet
- National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
| | - Herna Viktor
- School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
| | | | - Davide Spinello
- Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
|
48
|
Pei J, Zhang J, Cong Q. Human mitochondrial protein complexes revealed by large-scale coevolution analysis and deep learning-based structure modeling. Bioinformatics 2022; 38:4301-4311. [PMID: 35881696 DOI: 10.1093/bioinformatics/btac527] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 10/26/2021] [Revised: 05/27/2022] [Accepted: 07/22/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Recent development of deep-learning methods has led to a breakthrough in the prediction accuracy of 3D protein structures. Extending these methods to protein pairs is expected to allow large-scale detection of protein-protein interactions (PPIs) and modeling protein complexes at the proteome level. RESULTS We applied RoseTTAFold and AlphaFold, two of the latest deep-learning methods for structure predictions, to analyze coevolution of human proteins residing in mitochondria, an organelle of vital importance in many cellular processes including energy production, metabolism, cell death and antiviral response. Variations in mitochondrial proteins have been linked to a plethora of human diseases and genetic conditions. RoseTTAFold, with high computational speed, was used to predict the coevolution of about 95% of mitochondrial protein pairs. Top-ranked pairs were further subject to modeling of the complex structures by AlphaFold, which also produced contact probability with high precision and in many cases consistent with RoseTTAFold. Most top-ranked pairs with high contact probability were supported by known PPIs and/or similarities to experimental structural complexes. For high-scoring pairs without experimental complex structures, our coevolution analyses and structural models shed light on the details of their interfaces, including CHCHD4-AIFM1, MTERF3-TRUB2, FMC1-ATPAF2 and ECSIT-NDUFAF1. We also identified novel PPIs (PYURF-NDUFAF5, LYRM1-MTRF1L and COA8-COX10) for several proteins without experimentally characterized interaction partners, leading to predictions of their molecular functions and the biological processes they are involved in. AVAILABILITY AND IMPLEMENTATION Data of mitochondrial proteins and their interactions are available at: http://conglab.swmed.edu/mitochondria. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jimin Pei
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Jing Zhang
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Qian Cong
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Harold C. Simmons Comprehensive Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
49
|
Yin R, Feng BY, Varshney A, Pierce BG. Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants. Protein Sci 2022; 31:e4379. [PMID: 35900023 PMCID: PMC9278006 DOI: 10.1002/pro.4379] [Citation(s) in RCA: 139] [Impact Index Per Article: 46.3] [Received: 01/06/2022] [Revised: 06/06/2022] [Accepted: 06/09/2022] [Indexed: 12/17/2022]
Abstract
High-resolution experimental structural determination of protein-protein interactions has led to valuable mechanistic insights, yet due to the massive number of interactions and experimental limitations there is a need for computational methods that can accurately model their structures. Here we explore the use of the recently developed deep learning method, AlphaFold, to predict structures of protein complexes from sequence. With a benchmark of 152 diverse heterodimeric protein complexes, multiple implementations and parameters of AlphaFold were tested for accuracy. Remarkably, many cases (43%) had near-native models (medium or high Critical Assessment of Predicted Interactions [CAPRI] accuracy) generated as top-ranked predictions by AlphaFold, greatly surpassing the performance of unbound protein-protein docking (9% success rate for near-native top-ranked models); however, AlphaFold modeling of antibody-antigen complexes within our set was unsuccessful. We identified sequence and structural features associated with lack of AlphaFold success, and we also investigated the impact of multiple sequence alignment input. Benchmarking of a multimer-optimized version of AlphaFold (AlphaFold-Multimer) with a set of recently released antibody-antigen structures confirmed a low rate of success for antibody-antigen complexes (11% success), and we found that T cell receptor-antigen complexes are likewise not accurately modeled by that algorithm, showing that adaptive immune recognition poses a challenge for the current AlphaFold algorithm and model. Overall, our study demonstrates that end-to-end deep learning can accurately model many transient protein complexes, and highlights areas of improvement for future developments to reliably model any protein-protein interaction of interest.
Affiliation(s)
- Rui Yin
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Brandon Y. Feng
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Amitabh Varshney
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Brian G. Pierce
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
- Marlene and Stewart Greenebaum Comprehensive Cancer Center, University of Maryland School of Medicine, Baltimore, Maryland, USA
|
50
|
Liu Z, Yu DJ. cpxDeepMSA: A Deep Cascade Algorithm for Constructing Multiple Sequence Alignments of Protein–Protein Interactions. Int J Mol Sci 2022; 23:ijms23158459. [PMID: 35955594 PMCID: PMC9369210 DOI: 10.3390/ijms23158459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Revised: 07/18/2022] [Accepted: 07/28/2022] [Indexed: 12/10/2022] Open
Abstract
Protein–protein interactions (PPIs) are fundamental to many biological processes. The coevolution-based prediction of interacting residues has made great strides in protein complexes that are known to interact. A multiple sequence alignment (MSA) is the basis of coevolution analysis. MSA construction has recently made significant progress in protein monomer sequence analysis. However, no standard or efficient pipelines are available for sensitive protein complex MSA (cpxMSA) collection. How to generate a cpxMSA is one of the most challenging problems in sequence coevolution analysis. Although several methods have been developed to address this problem, no standalone program exists. Furthermore, the number of built-in properties is limited; hence, it is often difficult for users to analyze sequence coevolution according to their desired cpxMSA. In this article, we developed a novel cpxMSA approach (cpxDeepMSA). We used different protein monomer databases and incorporated three strategies (genomic distance, phylogeny information, and STRING interaction network) to join the monomer MSA results of protein complexes, so that the failure of any single method to join the two monomer MSAs does not cause cpxMSA construction to fail. We anticipate that the cpxDeepMSA algorithm will become a useful high-throughput tool in protein complex structure prediction, inter-protein residue-residue contact prediction, and biological sequence coevolution analysis.
|