1
|
Xu Q, Dunbrack R. The protein common assembly database (ProtCAD)-a comprehensive structural resource of protein complexes. Nucleic Acids Res 2023; 51:D466-D478. [PMID: 36300618 PMCID: PMC9825537 DOI: 10.1093/nar/gkac937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2022] [Revised: 10/04/2022] [Accepted: 10/11/2022] [Indexed: 01/29/2023] Open
Abstract
Proteins often act through oligomeric interactions with other proteins. X-ray crystallography and cryo-electron microscopy provide detailed information on the structures of biological assemblies, defined as the most likely biologically relevant structures derived from experimental data. In crystal structures, the most relevant assembly may be ambiguously determined, since multiple assemblies observed in the crystal lattice may be plausible. It is estimated that 10-15% of PDB entries may have incorrect or ambiguous assembly annotations. Accurate assemblies are required for understanding functional data and training of deep learning methods for predicting assembly structures. As with any other kind of biological data, replication via multiple independent experiments provides important validation for the determination of biological assembly structures. Here we present the Protein Common Assembly Database (ProtCAD), which presents clusters of protein assembly structures observed in independent structure determinations of homologous proteins in the Protein Data Bank (PDB). ProtCAD is searchable by PDB entry, UniProt identifiers, or Pfam domain designations and provides downloads of coordinate files, PyMol scripts, and publicly available assembly annotations for each cluster of assemblies. About 60% of PDB entries contain assemblies in clusters of at least 2 independent experiments. All clusters and coordinates are available on ProtCAD web site (http://dunbrack2.fccc.edu/protcad).
Collapse
Affiliation(s)
- Qifang Xu
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
| | - Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA 19111, USA
| |
Collapse
|
2
|
Dey S, Prilusky J, Levy ED. QSalignWeb: A Server to Predict and Analyze Protein Quaternary Structure. Front Mol Biosci 2022; 8:787510. [PMID: 35071324 PMCID: PMC8769216 DOI: 10.3389/fmolb.2021.787510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2021] [Accepted: 12/02/2021] [Indexed: 11/16/2022] Open
Abstract
The identification of physiologically relevant quaternary structures (QSs) in crystal lattices is challenging. To predict the physiological relevance of a particular QS, QSalign searches for homologous structures in which subunits interact in the same geometry. This approach proved accurate but was limited to structures already present in the Protein Data Bank (PDB). Here, we introduce a webserver (www.QSalign.org) allowing users to submit homo-oligomeric structures of their choice to the QSalign pipeline. Given a user-uploaded structure, the sequence is extracted and used to search homologs based on sequence similarity and PFAM domain architecture. If structural conservation is detected between a homolog and the user-uploaded QS, physiological relevance is inferred. The web server also generates alternative QSs with PISA and processes them the same way as the query submitted to widen the predictions. The result page also shows representative QSs in the protein family of the query, which is informative if no QS conservation was detected or if the protein appears monomeric. These representative QSs can also serve as a starting point for homology modeling.
Collapse
Affiliation(s)
- Sucharita Dey
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Jaime Prilusky
- Department of Life Sciences and Core Facilities, Weizmann Institute of Science, Rehovot, Israel
| | - Emmanuel D. Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| |
Collapse
|
3
|
PDB-wide identification of physiological hetero-oligomeric assemblies based on conserved quaternary structure geometry. Structure 2021; 29:1303-1311.e3. [PMID: 34520740 PMCID: PMC8575123 DOI: 10.1016/j.str.2021.07.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2020] [Revised: 03/22/2021] [Accepted: 07/23/2021] [Indexed: 11/21/2022]
Abstract
An accurate understanding of biomolecular mechanisms and diseases requires information on protein quaternary structure (QS). A critical challenge in inferring QS information from crystallography data is distinguishing biological interfaces from fortuitous crystal-packing contacts. Here, we employ QS conservation across homologs to infer the biological relevance of hetero-oligomers. We compare the structures and compositions of hetero-oligomers, which allow us to annotate 7,810 complexes as physiologically relevant, 1,060 as likely errors, and 1,432 with comparative information on subunit stoichiometry and composition. Excluding immunoglobulins, these annotations encompass over 51% of hetero-oligomers in the PDB. We curate a dataset of 577 hetero-oligomeric complexes to benchmark these annotations, which reveals an accuracy >94%. When homology information is not available, we compare QS across repositories (PDB, PISA, and EPPIC) to derive confidence estimates. This work provides high-quality annotations along with a large benchmark dataset of hetero-assemblies.
Collapse
|
4
|
Gaber A, Pavšič M. Modeling and Structure Determination of Homo-Oligomeric Proteins: An Overview of Challenges and Current Approaches. Int J Mol Sci 2021; 22:9081. [PMID: 34445785 PMCID: PMC8396596 DOI: 10.3390/ijms22169081] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2021] [Revised: 08/20/2021] [Accepted: 08/20/2021] [Indexed: 12/12/2022] Open
Abstract
Protein homo-oligomerization is a very common phenomenon, and approximately half of proteins form homo-oligomeric assemblies composed of identical subunits. The vast majority of such assemblies possess internal symmetry which can be either exploited to help or poses challenges during structure determination. Moreover, aspects of symmetry are critical in the modeling of protein homo-oligomers either by docking or by homology-based approaches. Here, we first provide a brief overview of the nature of protein homo-oligomerization. Next, we describe how the symmetry of homo-oligomers is addressed by crystallographic and non-crystallographic symmetry operations, and how biologically relevant intermolecular interactions can be deciphered from the ordered array of molecules within protein crystals. Additionally, we describe the most important aspects of protein homo-oligomerization in structure determination by NMR. Finally, we give an overview of approaches aimed at modeling homo-oligomers using computational methods that specifically address their internal symmetry and allow the incorporation of other experimental data as spatial restraints to achieve higher model reliability.
Collapse
|
5
|
Liu HF, Liu R. Structure-based prediction of post-translational modification cross-talk within proteins using complementary residue- and residue pair-based features. Brief Bioinform 2021; 21:609-620. [PMID: 30649184 DOI: 10.1093/bib/bby123] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2018] [Revised: 11/26/2018] [Accepted: 11/30/2018] [Indexed: 02/07/2023] Open
Abstract
Post-translational modification (PTM)-based regulation can be mediated not only by the modification of a single residue but also by the interplay of different modifications. Accurate prediction of PTM cross-talk is a highly challenging issue and is in its infant stage. Especially, less attention has been paid to the structural preferences (except intrinsic disorder and spatial proximity) of cross-talk pairs and the characteristics of individual residues involved in cross-talk, which may restrict the improvement of the prediction accuracy. Here we report a structure-based algorithm called PCTpred to improve the PTM cross-talk prediction. The comprehensive residue- and residue pair-based features were designed for paired PTM sites at the sequence and structural levels. Through feature selection, we reserved 23 newly introduced descriptors and 3 traditional descriptors to develop a sequence-based predictor PCTseq and a structure-based predictor PCTstr, both of which were integrated to construct our final prediction model. According to pair- and protein-based evaluations, PCTpred yielded area under the curve values of approximately 0.9 and 0.8, respectively. Even when removing the distance preference of samples or using the input of modeled structures, our prediction performance was maintained or moderately reduced. PCTpred displayed stable and reliable improvements over the state-of-the-art methods based on various evaluations. The source code and data set are freely available at https://github.com/Liulab-HZAU/PCTpred or http://liulab.hzau.edu.cn/PCTpred/.
Collapse
Affiliation(s)
- Hui-Fang Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
| | - Rong Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
| |
Collapse
|
6
|
Biological vs. Crystallographic Protein Interfaces: An Overview of Computational Approaches for Their Classification. CRYSTALS 2020. [DOI: 10.3390/cryst10020114] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Complexes between proteins are at the basis of almost every process in cells. Their study, from a structural perspective, has a pivotal role in understanding biological functions and, importantly, in drug development. X-ray crystallography represents the broadest source for the experimental structural characterization of protein-protein complexes. Correctly identifying the biologically relevant interface from the crystallographic ones is, however, not trivial and can be prone to errors. Over the past two decades, computational methodologies have been developed to study the differences of those interfaces and automatically classify them as biological or crystallographic. Overall, protein-protein interfaces show differences in terms of composition, energetics and evolutionary conservation between biological and crystallographic ones. Based on those observations, a number of computational methods have been developed for this classification problem, which can be grouped into three main categories: Energy-, empirical knowledge- and machine learning-based approaches. In this review, we give a comprehensive overview of the training datasets and methods so far implemented, providing useful links and a brief description of each method.
Collapse
|
7
|
Sala D, Cerofolini L, Fragai M, Giachetti A, Luchinat C, Rosato A. A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and NMR ambiguous contacts. Comput Struct Biotechnol J 2019; 18:114-124. [PMID: 31969972 PMCID: PMC6961069 DOI: 10.1016/j.csbj.2019.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2019] [Revised: 11/20/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022] Open
Abstract
Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.
Collapse
Affiliation(s)
- Davide Sala
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Linda Cerofolini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Marco Fragai
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Claudio Luchinat
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Antonio Rosato
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| |
Collapse
|
8
|
Hadi-Alijanvand H. Soft regions of protein surface are potent for stable dimer formation. J Biomol Struct Dyn 2019; 38:3587-3598. [PMID: 31476974 DOI: 10.1080/07391102.2019.1662328] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
By having knowledge about the characteristics of protein interaction interfaces, we will be able to manipulate protein complexes for therapies. Dimer state is considered as the primary alphabet of the most proteins' quaternary structure. The properties of binding interface between subunits and of noninterface region define the specificity and stability of the intended protein complex. Considering some topological properties and amino acids' affinity for binding in interfaces of protein dimers, we construct the interface-specific recurrence plots. The data obtained from recurrence quantitative analysis, and accessibility-related metrics help us to classify the protein dimers into four distinct classes. Some mechanical properties of binding interfaces are computed for each predefined class of the dimers. The computed mechanical characteristics of binding patch region are compared with those of nonbinding region of proteins. Our observations indicate that the mechanical properties of protein binding sites have a decisive impact on determining the dimer stability. We introduce a new concept in analyzing protein structure by considering mechanical properties of protein structure. We conclude that the interface region between subunits of stable dimers is usually mechanically softer than the interface of unstable protein dimers. AbbreviationsAABaverage affinity for bindingANManisotropic network modelAPCaffinity propagation clusteringASAaccessible surface areaCCDinter residues distanceCSCcomplex stability codeDMdistance matrixΔGdissPISA-computed dissociation free energyGNMGaussian normal mode analysisNMAnormal mode analysisPBPprotein binding patchPISAproteins, interfaces, structures and assembliesrASArelative accessible area in respect to unfolded state of residuesRMrecurrence matrixrPrelative protrusionRPrecurrence plotRQArecurrence quantitative analysisSEMstandard error of meanCommunicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Hamid Hadi-Alijanvand
- Department of Biological Sciences, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran
| |
Collapse
|
9
|
Accurate Classification of Biological and non-Biological Interfaces in Protein Crystal Structures using Subtle Covariation Signals. Sci Rep 2019; 9:12603. [PMID: 31471543 PMCID: PMC6717244 DOI: 10.1038/s41598-019-48913-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2017] [Accepted: 08/14/2019] [Indexed: 11/08/2022] Open
Abstract
Proteins often work as oligomers or multimers in vivo. Therefore, elucidating their oligomeric or multimeric form (quaternary structure) is crucially important to ascertain their function. X-ray crystal structures of numerous proteins have been accumulated, providing information related to their biological units. Extracting information of biological units from protein crystal structures represents a meaningful task for modern biology. Nevertheless, although many methods have been proposed for identifying biological units appearing in protein crystal structures, it is difficult to distinguish biological protein-protein interfaces from crystallographic ones. Therefore, our simple but highly accurate classifier was developed to infer biological units in protein crystal structures using large amounts of protein sequence information and a modern contact prediction method to exploit covariation signals (CSs) in proteins. We demonstrate that our proposed method is promising even for weak signals of biological interfaces. We also discuss the relation between classification accuracy and conservation of biological units, and illustrate how the selection of sequences included in multiple sequence alignments as sources for obtaining CSs affects the results. With increased amounts of sequence data, the proposed method is expected to become increasingly useful.
Collapse
|