1
|
Grudman S, Fajardo JE, Fiser A. Optimal selection of suitable templates in protein interface prediction. Bioinformatics 2023; 39:btad510. [PMID: 37603727 PMCID: PMC10491951 DOI: 10.1093/bioinformatics/btad510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2023] [Revised: 07/11/2023] [Accepted: 08/18/2023] [Indexed: 08/23/2023] Open
Abstract
MOTIVATION Molecular-level classification of protein-protein interfaces can greatly assist in functional characterization and rational drug design. The most accurate protein interface predictions rely on finding homologous proteins with known interfaces since most interfaces are conserved within the same protein family. The accuracy of these template-based prediction approaches depends on the correct choice of suitable templates. Choosing the right templates in the immunoglobulin superfamily (IgSF) is challenging because its members share low sequence identity and display a wide range of alternative binding sites despite structural homology. RESULTS We present a new approach to predict protein interfaces. First, template-specific, informative evolutionary profiles are established using a mutual information-based approach. Next, based on the similarity of residue level conservation scores derived from the evolutionary profiles, a query protein is hierarchically clustered with all available template proteins in its superfamily with known interface definitions. Once clustered, a subset of the most closely related templates is selected, and an interface prediction is made. These initial interface predictions are subsequently refined by extensive docking. This method was benchmarked on 51 IgSF proteins and can predict nontrivial interfaces of IgSF proteins with an average and median F-score of 0.64 and 0.78, respectively. We also provide a way to assess the confidence of the results. The average and median F-scores increase to 0.8 and 0.81, respectively, if 27% of low confidence cases and 17% of medium confidence cases are removed. Lastly, we provide residue level interface predictions, protein complexes, and confidence measurements for singletons in the IgSF. AVAILABILITY AND IMPLEMENTATION Source code is freely available at: https://gitlab.com/fiserlab.org/interdct_with_refinement.
Collapse
Affiliation(s)
- Steven Grudman
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - J Eduardo Fajardo
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| |
Collapse
|
2
|
Mafakher L, Rismani E, Rahimi H, Enayatkhani M, Azadmanesh K, Teimoori-Toolabi L. Computational design of antagonist peptides based on the structure of secreted frizzled-related protein-1 (SFRP1) aiming to inhibit Wnt signaling pathway. J Biomol Struct Dyn 2020; 40:2169-2188. [DOI: 10.1080/07391102.2020.1835718] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Affiliation(s)
- Ladan Mafakher
- Molecular Medicine Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
| | - Elham Rismani
- Molecular Medicine Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
| | - Hamzeh Rahimi
- Molecular Medicine Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
| | - Maryam Enayatkhani
- Molecular Medicine Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
| | | | - Ladan Teimoori-Toolabi
- Molecular Medicine Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
| |
Collapse
|
3
|
Motoyama T, Hiramatsu N, Asano Y, Nakano S, Ito S. Protein Sequence Selection Method That Enables Full Consensus Design of Artificial l-Threonine 3-Dehydrogenases with Unique Enzymatic Properties. Biochemistry 2020; 59:3823-3833. [PMID: 32945652 DOI: 10.1021/acs.biochem.0c00570] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Exponentially increasing protein sequence data enables artificial enzyme design using sequence-based protein design methods, including full-consensus protein design (FCD). The success of artificial enzyme design is strongly dependent on the nature of the sequences used. Hence, sequences must be selected from databases and curated libraries prepared to enable a successful design by FCD. In this study, we proposed a selection approach regarding several key residues as sequence motifs. We used l-threonine 3-dehydrogenase (TDH) as a model to test the validity of this approach. In the classification, four residues (143, 174, 188, and 214) were used as key residues. We classified thousands of TDH homologous sequences into five groups containing hundreds of sequences. Utilizing sequences in the libraries, we designed five artificial TDHs by FCD. Among the five, we successfully expressed four in soluble form. Biochemical analysis of artificial TDHs indicated that their enzymatic properties vary; half of the maximum measured enzyme activity (t1/2) and activation energies were distributed from 53 to 65 °C and from 38 to 125 kJ/mol, respectively. The artificial TDHs had unique kinetic parameters, distinct from one another. Structural analysis indicates that consensus mutations are mainly introduced in the secondary or outer shell. The functional diversity of the artificial TDHs is due to the accumulation of mutations that affect their physicochemical properties. Taken together, our findings indicate that our proposed approach can help generate artificial enzymes with unique enzymatic properties.
Collapse
Affiliation(s)
- Tomoharu Motoyama
- Graduate School of Integrated Pharmaceutical and Nutritional Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
| | - Nozomi Hiramatsu
- Graduate School of Integrated Pharmaceutical and Nutritional Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
| | - Yasuhisa Asano
- Biotechnology Research Center and Department of Biotechnology, Toyama Prefectural University, 5180 Kurokawa, Imizu, Toyama 939-0398, Japan
| | - Shogo Nakano
- Graduate School of Integrated Pharmaceutical and Nutritional Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
| | - Sohei Ito
- Graduate School of Integrated Pharmaceutical and Nutritional Sciences, University of Shizuoka, 52-1 Yada, Suruga-ku, Shizuoka 422-8526, Japan
| |
Collapse
|
4
|
Gil N, Fajardo EJ, Fiser A. Discovery of receptor-ligand interfaces in the immunoglobulin superfamily. Proteins 2019; 88:135-142. [PMID: 31298437 DOI: 10.1002/prot.25778] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2019] [Revised: 06/21/2019] [Accepted: 07/06/2019] [Indexed: 12/13/2022]
Abstract
Cell-surface-anchored immunoglobulin superfamily (IgSF) proteins are widespread throughout the human proteome, forming crucial components of diverse biological processes including immunity, cell-cell adhesion, and carcinogenesis. IgSF proteins generally function through protein-protein interactions carried out between extracellular, membrane-bound proteins on adjacent cells, known as trans-binding interfaces. These protein-protein interactions constitute a class of pharmaceutical targets important in the treatment of autoimmune diseases, chronic infections, and cancer. A molecular-level understanding of IgSF protein-protein interactions would greatly benefit further drug development. A critical step toward this goal is the reliable identification of IgSF trans-binding interfaces. We propose a novel combination of structure and sequence information to identify trans-binding interfaces in IgSF proteins. We developed a structure-based binding interface prediction approach that can identify broad regions of the protein surface that encompass the binding interfaces and suggests that IgSF proteins possess binding supersites. These interfaces could theoretically be pinpointed using sequence-based conservation analysis, with performance approaching the theoretical upper limit of binding interface prediction accuracy, but achieving this in practice is limited by the current ability to identify an appropriate multiple sequence alignment for conservation analysis. However, an important contribution of combining the two orthogonal methods is that agreement between these approaches can estimate the reliability of the predictions. This approach was benchmarked on the set of 22 IgSF proteins with experimentally solved structures in complex with their ligands. Additionally, we provide structure-based predictions and reliability scores for the 62 IgSF proteins with known structure but yet uncharacterized binding interfaces.
Collapse
Affiliation(s)
- Nelson Gil
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York.,Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
| | - Eduardo J Fajardo
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York.,Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
| | - Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York.,Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York
| |
Collapse
|
5
|
Viswanathan R, Fajardo E, Steinberg G, Haller M, Fiser A. Protein-protein binding supersites. PLoS Comput Biol 2019; 15:e1006704. [PMID: 30615604 PMCID: PMC6336348 DOI: 10.1371/journal.pcbi.1006704] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2018] [Revised: 01/17/2019] [Accepted: 12/05/2018] [Indexed: 11/19/2022] Open
Abstract
The lack of a deep understanding of how proteins interact remains an important roadblock in advancing efforts to identify binding partners and uncover the corresponding regulatory mechanisms of the functions they mediate. Understanding protein-protein interactions is also essential for designing specific chemical modifications to develop new reagents and therapeutics. We explored the hypothesis of whether protein interaction sites serve as generic biding sites for non-cognate protein ligands, just as it has been observed for small-molecule-binding sites in the past. Using extensive computational docking experiments on a test set of 241 protein complexes, we found that indeed there is a strong preference for non-cognate ligands to bind to the cognate binding site of a receptor. This observation appears to be robust to variations in docking programs, types of non-cognate protein probes, sizes of binding patches, relative sizes of binding patches and full-length proteins, and the exploration of obligate and non-obligate complexes. The accuracy of the docking scoring function appears to play a role in defining the correct site. The frequency of interaction of unrelated probes recognizing the binding interface was utilized in a simple prediction algorithm that showed accuracy competitive with other state of the art methods.
Collapse
Affiliation(s)
- Raji Viswanathan
- Department of Chemistry, Yeshiva University, New York, NY, United States of America
| | - Eduardo Fajardo
- Departments of Systems & Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, NY, United States of America
| | - Gabriel Steinberg
- Department of Chemistry, Yeshiva University, New York, NY, United States of America
| | - Matthew Haller
- Department of Chemistry, Yeshiva University, New York, NY, United States of America
| | - Andras Fiser
- Departments of Systems & Computational Biology, and Biochemistry, Albert Einstein College of Medicine, Bronx, NY, United States of America
- * E-mail:
| |
Collapse
|
6
|
Gil N, Fiser A. The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis. Bioinformatics 2019; 35:12-19. [PMID: 29947739 PMCID: PMC6298051 DOI: 10.1093/bioinformatics/bty523] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2018] [Revised: 04/20/2018] [Accepted: 06/26/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation The analysis of sequence conservation patterns has been widely utilized to identify functionally important (catalytic and ligand-binding) protein residues for over a half-century. Despite decades of development, on average state-of-the-art non-template-based functional residue prediction methods must predict ∼25% of a protein's total residues to correctly identify half of the protein's functional site residues. The overwhelming proportion of false positives results in reported 'F-Scores' of ∼0.3. We investigated the limits of current approaches, focusing on the so-far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs). Results The limits of conservation-based functional residue prediction were explored by surveying the binding sites of 1023 proteins. A straightforward conservation analysis of MSAs composed of randomly selected homologs sampled from a PSI-BLAST search achieves average F-Scores of ∼0.3, a performance matching that reported by state-of-the-art methods, which often consider additional features for the prediction in a machine learning setting. Interestingly, we found that a simple combinatorial MSA sampling algorithm will in almost every case produce an MSA with an optimal set of homologs whose conservation analysis reaches average F-Scores of ∼0.6, doubling state-of-the-art performance. We also show that this is nearly at the theoretical limit of possible performance given the agreement between different binding site definitions. Additionally, we showcase the progress in this direction made by Selection of Alignment by Maximal Mutual Information (SAMMI), an information-theory-based approach to identifying biologically informative MSAs. This work highlights the importance and the unused potential of optimally composed MSAs for conservation analysis. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Nelson Gil
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Andras Fiser
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
| |
Collapse
|