1
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022]
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including predicting protein functions, protein-protein interaction and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
  - Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
- Eric Paquet
  - National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
- Herna Viktor
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
- Davide Spinello
  - Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
2
Kumar G, Srinivasan N, Sandhya S. Profiles of Natural and Designed Protein-Like Sequences Effectively Bridge Protein Sequence Gaps: Implications in Distant Homology Detection. Methods Mol Biol 2022; 2449:149-167. [PMID: 35507261 DOI: 10.1007/978-1-0716-2095-3_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Sequence-based approaches are fundamental to guide experimental investigations in obtaining structural and/or functional insights into uncharacterized protein families. Powerful profile-based sequence search methods rely on a sequence space continuum to identify non-trivial relationships through homology detection. The computational design of protein-like sequences that serve as "artificial linkers" is useful in identifying relationships between distant members of a structural fold. Such sequences act as intermediates and guide homology searches between distantly related proteins. Here, we describe an approach that represents natural intermediate sequences and designed protein-like sequences as hidden Markov model (HMM) profiles, to improve the sensitivity of existing search methods. Searches made within the "Profile database" were shown to recognize the parent structural fold for 90% of the search queries at query coverage better than 60%. For 1040 protein families with no available structure, fold associations were made through searches in the database of natural and designed sequence profiles. Most of the associations were made with the Alpha-alpha superhelix, Transmembrane beta-barrels, TIM barrel, and Immunoglobulin-like beta-sandwich folds. For 11 domain families of unknown function (DUFs), we provide confident fold associations using the profiles of designed sequences and a consensus from other fold recognition methods. For two DUFs, we performed detailed functional annotation through comparisons with characterized templates of families of known function.
Affiliation(s)
- Gayatri Kumar
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Sankaran Sandhya
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
  - Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
3
Janaki C, Gowri VS, Srinivasan N. Master Blaster: an approach to sensitive identification of remotely related proteins. Sci Rep 2021; 11:8746. [PMID: 33888741 PMCID: PMC8062480 DOI: 10.1038/s41598-021-87833-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2019] [Accepted: 04/06/2021] [Indexed: 11/11/2022]
Abstract
Genome sequencing projects unearth the sequences of all the proteins encoded in a genome. As a first step, homology detection is employed to obtain clues to the structure and function of these proteins. However, high evolutionary divergence between homologous proteins challenges our ability to detect distant relationships. In the past, an approach involving multiple Position Specific Scoring Matrices (PSSMs) was found to be more effective than traditional single PSSMs. Cascaded search is another successful approach, in which the hits of a search are themselves used as queries to detect more homologues. We propose a protocol, ‘Master Blaster’, which combines the principles adopted in these two approaches to enhance our ability to detect remote homologues even further. The approach was assessed using known relationships available in the SCOP70 database, and the results were compared against those of PSI-BLAST and HHblits, a hidden Markov model-based method. Compared to PSI-BLAST, Master Blaster resulted in a 10% improvement in the detection of cross-superfamily connections, nearly 35% improvement in cross-family connections and more than 80% improvement in intra-family connections. The results show that HHblits is more sensitive than Master Blaster in detecting remote homologues. However, there are true hits from 46 folds for which Master Blaster reported homologues that HHblits did not report even with optimal parameters, indicating that combining multiple methods based on different approaches can be more effective in detecting remote homologues. Master Blaster stand-alone code is available for download in the supplementary archive.
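The cascaded-search idea mentioned in this abstract, in which hits of one search are reused as queries, can be illustrated with a toy sketch. This is not the Master Blaster code: `search` is a hypothetical stand-in for a real homology search such as PSI-BLAST, and the position-wise identity measure, the threshold and the four sequences are assumptions chosen only to keep the example self-contained.

```python
from collections import deque


def toy_similarity(a, b):
    """Fraction of identical positions over the shorter sequence (toy measure, no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def search(query, database, threshold):
    """Hypothetical stand-in for one homology search: ids of sequences similar to the query."""
    return {sid for sid, seq in database.items()
            if toy_similarity(query, seq) >= threshold}


def cascaded_search(query, database, threshold=0.6, max_rounds=3):
    """Breadth-first cascade: every newly found hit becomes a query in the next round."""
    found = set()
    frontier = deque([(query, 0)])
    while frontier:
        seq, depth = frontier.popleft()
        if depth >= max_rounds:
            continue
        for sid in search(seq, database, threshold):
            if sid not in found:
                found.add(sid)
                frontier.append((database[sid], depth + 1))
    return found


# Made-up sequences: C resembles B but not the query, so only the cascade reaches it.
query = "MKTAYIAKQR"
database = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYIAGGG",
    "C": "GGGAYIAGGG",
    "D": "WWWWWWWWWW",
}

print(sorted(search(query, database, 0.6)))
print(sorted(cascaded_search(query, database)))
```

In this toy data a single search finds only A and B, while the cascade also reaches C through B. A real protocol would additionally need to guard against drift into unrelated families, which this sketch does not attempt.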
Affiliation(s)
- Chintalapati Janaki
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
  - Centre for Development of Advanced Computing, Knowledge Park, Byappanahalli, Bangalore, 560038, India
- Venkatraman S Gowri
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India
  - Department of Chemistry, Auxilium College, Gandhinagar, Vellore, 632006, India
4
Cheung NJ, Yu W. Sibe: a computation tool to apply protein sequence statistics to predict folding and design in silico. BMC Bioinformatics 2019; 20:455. [PMID: 31492097 PMCID: PMC6728967 DOI: 10.1186/s12859-019-2984-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/17/2019] [Accepted: 07/03/2019] [Indexed: 02/01/2023]
Abstract
BACKGROUND Evolutionary information contained in the amino acid sequences of proteins specifies the biological function and fold, but exactly what information contained in the protein sequence drives both of these processes? Considerable progress has been made to answer this fundamental question, but it remains challenging to explore the potential space of cooperative interactions between amino acids. Statistical analysis plays a significant role in studying such interactions and its use has expanded in recent years to studies ranging from coevolution-guided rational protein design to protein folding in silico. RESULTS Here we describe a computational tool named Sibe for use in studies of protein sequence, folding, and design using evolutionary coupling between amino acids as a driving factor. In this study, Sibe is used to identify positionally conserved couplings between pairs of amino acids and to aid rational protein design. In this process, pairwise couplings are filtered according to the relative entropy computed from the positional conservations and grouped into several 'blocks', which could contribute to driving protein folding and design. A human β2-adrenergic receptor (β2AR) was used to demonstrate that those 'blocks' contribute to the rational design for specifying functional residues. Sibe also provides folding modules based on both the positionally conserved couplings and well-established statistical potentials for simulating protein folding in silico and predicting tertiary structure. Our results show that statistical inferences of basic evolutionary principles, such as conservations and coupled mutations, can be used to rapidly design a diverse set of proteins and study protein folding. CONCLUSIONS The developed software Sibe provides a computational tool for systematic analysis from a protein's primary structure to its tertiary structure using the evolutionary couplings as a driving factor. Sibe, written in C++, is built for compatibility with the 'big data' era in biological science. It primarily focuses on protein sequence analysis, but it can also be extended to other modeling tasks and to predictions of experimental measurements.
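The filtering step described in this abstract, in which pairwise couplings are filtered by a relative entropy computed from positional conservation, can be sketched as follows. This is an illustration, not Sibe's implementation: the uniform background distribution, the rule that both positions of a pair must pass the cutoff, the cutoff value and the toy alignment are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_relative_entropy(column, background=None):
    """Kullback-Leibler divergence (in nats) of a column's residue frequencies from a background."""
    if background is None:
        # Assumption: uniform background over the 20 natural amino acids.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    residues = [aa for aa in column if aa in background]  # skip gaps and unknown symbols
    if not residues:
        return 0.0
    total = len(residues)
    return sum((n / total) * math.log((n / total) / background[aa])
               for aa, n in Counter(residues).items())


def conserved_positions(alignment, cutoff):
    """Indices of alignment columns whose relative entropy is at least `cutoff`."""
    return [i for i, col in enumerate(zip(*alignment))
            if column_relative_entropy(col) >= cutoff]


def filter_couplings(couplings, alignment, cutoff):
    """Keep coupled pairs (i, j) whose two positions are both conserved (assumed rule)."""
    keep = set(conserved_positions(alignment, cutoff))
    return [(i, j) for i, j in couplings if i in keep and j in keep]


# Toy alignment: columns 0 and 3 are fully conserved, columns 1 and 2 are variable.
alignment = ["ACDW", "ACEW", "AGFW", "AHKW"]
couplings = [(0, 3), (0, 1), (2, 3)]

print(conserved_positions(alignment, 2.5))
print(filter_couplings(couplings, alignment, 2.5))
```

A fully conserved column scores ln 20 (about 3.0 nats) against a uniform background, so a cutoff of 2.5 keeps only the two invariant columns in this toy case.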
Affiliation(s)
- Ngaam J. Cheung
  - Department of Brain and Cognitive Science, DGIST, Daegu, 42988 South Korea
  - Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, CB3 0HA UK
- Wookyung Yu
  - Department of Brain and Cognitive Science, DGIST, Daegu, 42988 South Korea
  - Core Protein Resources Center, DGIST, Daegu, 42988 South Korea
5
Enhanced catalytic activities and modified substrate preferences for taxoid 10β-O-acetyl transferase mutants by engineering catalytic histidine residues. Biotechnol Lett 2018; 40:1245-1251. [PMID: 29869304 DOI: 10.1007/s10529-018-2573-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/25/2018] [Accepted: 05/18/2018] [Indexed: 12/14/2022]
Abstract
OBJECTIVES Taxoid 10β-O-acetyl transferase (DBAT) was redesigned to enhance its catalytic activity and substrate preference for baccatin III and taxol biosynthesis. RESULTS Residues H162, D166 and R363 were determined as potential sites within the catalytic pocket of DBAT for molecular docking and site-directed mutagenesis to modify the activity of DBAT. Enzymatic activity assays revealed that the kcat/KM values of the mutants H162A/R363H, D166H, R363H and D166H/R363H acting on 10-deacetylbaccatin III were about 3, 15, 26 and 60 times higher than that of wild-type DBAT, respectively. Substrate preference assays indicated that these mutants (H162A/R363H, D166H, R363H, D166H/R363H) could transfer an acetyl group from unnatural acetyl donors (e.g. vinyl acetate, sec-butyl acetate, isobutyl acetate, amyl acetate and isoamyl acetate) to 10-deacetylbaccatin III. CONCLUSION Taxoid 10β-O-acetyl transferase mutants with redesigned active sites displayed increased catalytic activities and modified substrate preferences, indicating their possible application in the enzymatic synthesis of baccatin III and taxol.
6
Kumar G, Mudgal R, Srinivasan N, Sandhya S. Use of designed sequences in protein structure recognition. Biol Direct 2018; 13:8. [PMID: 29776380 PMCID: PMC5960202 DOI: 10.1186/s13062-018-0209-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/31/2017] [Accepted: 04/18/2018] [Indexed: 12/13/2022]
Abstract
Background Knowledge of the protein structure is a prerequisite for improved understanding of molecular function. The gap in the sequence-structure space has increased in the post-genomic era. Grouping related protein sequences into families can aid in narrowing the gap. In the Pfam database, structure description is provided for part or full-length proteins of 7726 families. For the remaining 52% of the families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families already known to share the same fold. These strategically designed sequences enable detection of distant relationships, and here we have employed them for structure recognition of protein families of as yet unknown structure. Results We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of as yet unknown structure, we made structural assignments for part or the full length of the proteins. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. Conclusion The results indicate that knowledge-based filling of gaps in protein sequence space is a lucrative approach for structure recognition. Such sequences assist in traversal through protein sequence space and effectively function as ‘linkers’, where natural linkers between distant proteins are unavailable. Reviewers This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
Affiliation(s)
- Gayatri Kumar
  - Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Richa Mudgal
  - Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
  - Present address: Institute for Research in Biomedicine (IRB), Parc Cientific de Barcelona, C/ Baldiri Reixac 10, 08028, Barcelona, Spain
- Narayanaswamy Srinivasan
  - Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Sankaran Sandhya
  - Lab 103, Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, 560012, India
7
Wang J, Cao H, Zhang JZH, Qi Y. Computational Protein Design with Deep Learning Neural Networks. Sci Rep 2018; 8:6349. [PMID: 29679026 PMCID: PMC5910428 DOI: 10.1038/s41598-018-24760-x] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 10/12/2017] [Accepted: 04/10/2018] [Indexed: 12/19/2022]
Abstract
Computational protein design has a wide variety of applications. Despite its remarkable success, designing a protein for a given structure and function is still a challenging task. On the other hand, the number of solved protein structures is rapidly increasing while the number of unique protein folds has reached a steady number, suggesting that more structural information is being accumulated on each fold. Deep neural networks are a powerful method for learning from such large data sets and have shown superior performance in many machine learning fields. In this study, we applied the deep learning neural network approach to computational protein design to predict the probability of each of the 20 natural amino acids at each residue in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features and the best network achieved an accuracy of 38.3%. Using the network output as residue type restraints improves the average sequence identity in designing three natural proteins using Rosetta. Moreover, the predictions from our network show ~3% higher sequence identity than a previous method. Results from this study may benefit further development of computational protein design methods.
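The two quantities this abstract reports, a per-residue probability over the 20 amino acids and the sequence identity to a native protein, can be related with a small sketch. It assumes the network output is available as one list of 20 probabilities per residue, ordered as in `AMINO_ACIDS`. The helper names, the `peaked` toy distribution and the example sequences are hypothetical and do not come from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def most_probable_sequence(probabilities):
    """Pick the highest-probability amino acid at each residue position."""
    return "".join(
        AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)]
        for row in probabilities
    )


def sequence_identity(designed, native):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        return 0.0
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def peaked(aa):
    """Toy distribution: 0.81 on one amino acid, 0.01 on each of the other 19."""
    return [0.81 if a == aa else 0.01 for a in AMINO_ACIDS]


# Made-up network output for a four-residue protein whose native sequence is ACDE.
predicted = [peaked("A"), peaked("C"), peaked("D"), peaked("K")]
designed = most_probable_sequence(predicted)

print(designed, sequence_identity(designed, "ACDE"))
```

Taking the most probable residue at every position is only one way to use such probabilities. The abstract instead describes using them as residue type restraints in Rosetta, which this sketch does not model.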
Affiliation(s)
- Jingxue Wang
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
- Huali Cao
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
- John Z H Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
  - Department of Chemistry, New York University, NY, NY, 10003, USA
  - Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi, 030006, China
- Yifei Qi
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
8
Zhong HA, Santos EM, Vasileiou C, Zheng Z, Geiger JH, Borhan B, Merz KM. Free-Energy-Based Protein Design: Re-Engineering Cellular Retinoic Acid Binding Protein II Assisted by the Moveable-Type Approach. J Am Chem Soc 2018; 140:3483-3486. [PMID: 29480012 DOI: 10.1021/jacs.7b10368] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
Abstract
How to fine-tune the binding free energy of a small molecule to a receptor site by altering the amino acid residue composition is a key question in protein engineering. Indeed, the ultimate solution to this problem, to chemical accuracy (±1 kcal/mol), will result in profound and wide-ranging applications in protein design. Numerous tools have been developed to address this question, ranging from knowledge-based models to more computationally intensive free energy calculations based on molecular dynamics simulations. While some success has been achieved, there remains room for improvement in overall accuracy and in the speed of the methodology. Here we report a fast, knowledge-based movable-type (MT) approach to estimate the absolute and relative free energy of binding as influenced by mutations in a small-molecule binding site in a protein. We retrospectively validate our approach using mutagenesis data for retinoic acid binding to the Cellular Retinoic Acid Binding Protein II (CRABPII) system and then make prospective predictions that are borne out experimentally. The overall performance of our approach is supported by its success in identifying mutants that show high or even sub-nanomolar binding affinities of retinoic acid to the CRABPII system.
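To put the abstract's two figures, chemical accuracy of ±1 kcal/mol and sub-nanomolar affinity, on a common scale, the sketch below uses the standard thermodynamic relation ΔG° = RT ln(Kd) with a 1 M standard state. It is not the movable-type method itself. The function names and the temperature of 298.15 K are assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy in kcal/mol from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


def kd_fold_change(delta_delta_g, temperature_k=298.15):
    """Factor by which Kd changes for a given change in binding free energy (kcal/mol)."""
    return math.exp(delta_delta_g / (R_KCAL * temperature_k))


print(round(binding_free_energy(1e-9), 2))  # free energy of a 1 nM binder
print(round(kd_fold_change(1.0), 2))        # Kd ratio corresponding to 1 kcal/mol
```

Under these assumptions a 1 nM binder corresponds to roughly -12.3 kcal/mol, and an error of 1 kcal/mol corresponds to about a 5.4-fold error in Kd, which gives a sense of what the ±1 kcal/mol target means for predicted affinities.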
Affiliation(s)
- Haizhen A Zhong
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Chemistry, University of Nebraska at Omaha, Omaha, Nebraska 68182, United States
- Elizabeth M Santos
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Chrysoula Vasileiou
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Zheng Zheng
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- James H Geiger
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Babak Borhan
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Kenneth M Merz
  - Department of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States