1
Patat AS, Nalbantoğlu ÖU. Enhancing Functional Protein Design Using Heuristic Optimization and Deep Learning for Anti-Inflammatory and Gene Therapy Applications. Proteins 2025. [PMID: 39985803 DOI: 10.1002/prot.26810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2024] [Revised: 01/21/2025] [Accepted: 02/03/2025] [Indexed: 02/24/2025]
Abstract
Protein sequence design is a highly challenging task, aimed at discovering new proteins that are more functional and producible under laboratory conditions than their natural counterparts. Deep learning-based approaches developed to address this problem have achieved significant success. However, these approaches often do not adequately emphasize the functional properties of proteins. In this study, we developed a heuristic optimization method to enhance key functionalities such as solubility, flexibility, and stability, while preserving the structural integrity of proteins. This method aims to reduce laboratory demands by enabling a design that is both functional and structurally sound. This approach is particularly valuable for the synthetic production of proteins with anti-inflammatory properties and those used in gene therapy. The designed proteins were initially evaluated for their ability to preserve natural structures using recovery and confidence metrics, followed by assessments with the AlphaFold tool. Additionally, natural protein sequences were mutated using a genetic algorithm and compared with those designed by our method. The results demonstrate that the protein sequences generated by our method exhibit much greater similarity to native protein sequences and structures. The code and sequences for the designed proteins are available at https://github.com/aysenursoyturk/HMHO.
Affiliation(s)
- Ayşenur Soytürk Patat
- Department of Bioinformatics Systems Biology, Erciyes University, Kayseri, Turkey
- Department of Bioinformatics, Necmettin Erbakan University, Konya, Turkey
2
de Menezes AAPM, Aguiar RPS, Santos JVO, Sarkar C, Islam MT, Braga AL, Hasan MM, da Silva FCC, Sharifi-Rad J, Dey A, Calina D, Melo-Cavalcante AAC, Sousa JMC. Citrinin as a potential anti-cancer therapy: A comprehensive review. Chem Biol Interact 2023:110561. [PMID: 37230156 DOI: 10.1016/j.cbi.2023.110561] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/11/2023] [Revised: 05/09/2023] [Accepted: 05/22/2023] [Indexed: 05/27/2023]
Abstract
Citrinin (CIT) is a polyketide-derived mycotoxin produced by many fungal strains belonging to the genera Monascus, Aspergillus, and Penicillium. It has been postulated that mycotoxins act through several toxic mechanisms and could potentially be used as antineoplastic agents. Therefore, the present study carried out a systematic review of articles from 1978 to 2022, collecting evidence from experimental studies of the antiproliferative activity of CIT in cancer. The data indicate that CIT intervenes in important mediators and cell signaling pathways, including MAPKs, ERK1/2, JNK, Bcl-2, BAX, caspases 3, 6, 7 and 9, p53, p21, PARP cleavage, MDA, reactive oxygen species (ROS) and antioxidant defenses (SOD, CAT, GST and GPX). These findings demonstrate the potential of CIT as an antitumor drug that induces cell death, reduces DNA repair capacity and induces cytotoxic and genotoxic effects in cancer cells.
Affiliation(s)
- Ag-Anne P M de Menezes
- Laboratory of Genetical Toxicology, Postgraduate Program in Pharmaceutical Sciences, Federal University of Piauí, Teresina, Piauí, 64049-550, Brazil.
- Raí P S Aguiar
- Laboratory of Genetical Toxicology, Postgraduate Program in Pharmaceutical Sciences, Federal University of Piauí, Teresina, Piauí, 64049-550, Brazil.
- José V O Santos
- Laboratory of Genetical Toxicology, Postgraduate Program in Pharmaceutical Sciences, Federal University of Piauí, Teresina, Piauí, 64049-550, Brazil.
- Chandan Sarkar
- Department of Pharmacy, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, 8100, Bangladesh.
- Muhammad T Islam
- Department of Pharmacy, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj, 8100, Bangladesh.
- Antonio L Braga
- Laboratory of Genetical Toxicology, Postgraduate Program in Pharmaceutical Sciences, Federal University of Piauí, Teresina, Piauí, 64049-550, Brazil.
- Mohammad M Hasan
- Department of Biochemistry and Molecular Biology, Faculty of Life Science, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh.
- Felipe C C da Silva
- Postgraduate Program in Pharmaceutical Science, Federal University of Piauí, Teresina, PI, Brazil.
- Abhijit Dey
- Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India.
- Daniela Calina
- Department of Clinical Pharmacy, University of Medicine and Pharmacy of Craiova, 200349, Craiova, Romania.
- Ana A C Melo-Cavalcante
- Laboratory of Genetical Toxicology, Postgraduate Program in Pharmaceutical Sciences, Federal University of Piauí, Teresina, Piauí, 64049-550, Brazil; Postgraduate Program in Pharmaceutical Science, Federal University of Piauí, Teresina, PI, Brazil.
- João M C Sousa
- Laboratory of Genetical Toxicology, Postgraduate Program in Pharmaceutical Sciences, Federal University of Piauí, Teresina, Piauí, 64049-550, Brazil; Postgraduate Program in Pharmaceutical Science, Federal University of Piauí, Teresina, PI, Brazil.
3
Huang X, Zhou J, Yang D, Zhang J, Xia X, Chen YE, Xu J. Decoding CRISPR-Cas PAM recognition with UniDesign. Brief Bioinform 2023; 24:bbad133. [PMID: 37078688 PMCID: PMC10199764 DOI: 10.1093/bib/bbad133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2022] [Revised: 02/09/2023] [Accepted: 03/16/2023] [Indexed: 04/21/2023] Open
Abstract
The critical first step in Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-associated (CRISPR-Cas) protein-mediated gene editing is recognizing a preferred protospacer adjacent motif (PAM) on target DNAs by the protein's PAM-interacting amino acids (PIAAs). Thus, accurate computational modeling of PAM recognition is useful in assisting CRISPR-Cas engineering to relax or tighten PAM requirements for subsequent applications. Here, we describe a universal computational protein design framework (UniDesign) for designing protein-nucleic acid interactions. As a proof of concept, we applied UniDesign to decode the PAM-PIAA interactions for eight Cas9 and two Cas12a proteins. We show that, given native PIAAs, the UniDesign-predicted PAMs are largely identical to the natural PAMs of all Cas proteins. In turn, given natural PAMs, the computationally redesigned PIAA residues largely recapitulated the native PIAAs (74% and 86% in terms of identity and similarity, respectively). These results demonstrate that UniDesign faithfully captures the mutual preference between natural PAMs and native PIAAs, suggesting it is a useful tool for engineering CRISPR-Cas and other nucleic acid-interacting proteins. UniDesign is open-sourced at https://github.com/tommyhuangthu/UniDesign.
Affiliation(s)
- Xiaoqiang Huang
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Jun Zhou
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Dongshan Yang
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Jifeng Zhang
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Xiaofeng Xia
- Research & Development, ATGC Inc., 100 E Lancaster Avenue, LIMR Building Lab 129, Wynnewood, PA 19096, USA
- Yuqing Eugene Chen
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
- Jie Xu
- Center for Advanced Models for Translational Sciences and Therapeutics, Department of Internal Medicine, University of Michigan Medical School, 2800 Plymouth Road, Ann Arbor, MI 48109, USA
4
Halder P, Mitra P. Human prion protein: exploring the thermodynamic stability and structural dynamics of its pathogenic mutants. J Biomol Struct Dyn 2022; 40:11274-11290. [PMID: 34338141 DOI: 10.1080/07391102.2021.1957715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/07/2022]
Abstract
Human familial prion diseases are known to be associated with different single-point mutants of the gene coding for prion protein, located primarily at several positions of the globular domain. Using an extensive perturbation/mutation technique at multiple locations of the human prion protein (HuPrP) sequence, we have identified 12 different single-point pathogenic mutants with a potential for conformational disorders. Some of these mutants are pathogenic variants that corroborate well with those reported in the literature, while the majority are unique single-point mutants that either have not been explicitly studied before or have been studied only for variants with different residues at the same position. Primarily, our study sheds light in depth on the unfolding mechanism of these mutants. In addition, we identify some mutants that demonstrate not only unfolding of the helical structures but also extension and generation of β-sheet structures and/or a highly exposed hydrophobic surface, which is assumed to be linked with the production of aggregate/fibril structures of the prion protein. Among the identified mutants, Q212E needs special attention due to its maximum exposure of the hydrophobic core to solvent, and E200Q is important due to its maximum extent of β-content. We are also able to identify different structural conformations of the proteins according to their degree of structural unfolding; these conformations can be extracted and studied further in detail. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Puspita Halder
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
5
Malik A, Banerjee A, Pal A, Mitra P. A sequence space search engine for computational protein design to modulate molecular functionality. J Biomol Struct Dyn 2022; 41:2937-2946. [PMID: 35220920 DOI: 10.1080/07391102.2022.2042386] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
Abstract
De novo protein design explores the untapped sequence space that is otherwise less explored during the evolutionary process. This necessitates an efficient sequence space search engine for effective convergence in computational protein design. We propose a greedy simulated annealing-based Monte Carlo parallel search algorithm for better sequence-structure compatibility probing in protein design. The guidance provided by the evolutionary profile, the greedy approach, and the cooling schedule adopted in the Monte Carlo simulation ensures sufficient exploration and exploitation of the search space, leading to faster convergence. On evaluating the proposed algorithm on a dataset of 76 target scaffolds, we find that the modeled designed protein sequences show an average root-mean-square deviation (RMSD) of 1.07 Å and an average TM-score of 0.93. The high sequence recapitulation of 48.7% (59.4%) observed in the design sequences for all (hydrophobic) solvent-inaccessible residues further supports the quality of the proposed algorithm. A high (93.4%) intra-group recapitulation of hydrophobic residues in the solvent-inaccessible region indicates that the proposed protein design algorithm preserves the core residues in the protein and provides alternative residue combinations in the solvent-accessible regions of the target protein. Furthermore, a COFACTOR-based protein functional analysis shows that the design sequences exhibit altered molecular functionality and introduce new molecular functions compared to the target scaffolds. Communicated by Ramaswamy H. Sarma.
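The abstract above names simulated annealing with a cooling schedule but gives no implementation details. The following is a minimal generic sketch of Metropolis acceptance with geometric cooling over a sequence, not the authors' algorithm: it omits the evolutionary profile, the greedy step and the parallel search, and the names `metropolis_accept`, `anneal`, the `energy` callback and all parameter values are assumptions made for illustration.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def anneal(sequence, energy, alphabet, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Single-site mutation search with a geometric cooling schedule (illustrative)."""
    rng = random.Random(seed)
    current = list(sequence)
    e_cur = energy(current)
    best, e_best = current[:], e_cur
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(current))
        old = current[pos]
        current[pos] = rng.choice(alphabet)
        e_new = energy(current)
        if metropolis_accept(e_new - e_cur, t, rng):
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = current[:], e_cur
        else:
            current[pos] = old  # reject: restore the previous residue
    return "".join(best), e_best
```

With a toy energy such as the number of mismatches to a fixed target string, `anneal("DDDD", toy_energy, "ACD")` returns the lowest-energy sequence visited together with its energy.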
Affiliation(s)
- Ayush Malik
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Anupam Banerjee
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Abantika Pal
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
6
Pal A, Mulumudy R, Mitra P. Modularity-based parallel protein design algorithm with an implementation using shared memory programming. Proteins 2021; 90:658-669. [PMID: 34651333 DOI: 10.1002/prot.26263] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/17/2021] [Revised: 09/23/2021] [Accepted: 10/01/2021] [Indexed: 01/08/2023]
Abstract
Given a target protein structure, the prime objective of protein design is to find amino acid sequences that will fold into the given three-dimensional structure. The protein design problem belongs to the non-deterministic polynomial-time-hard class, as the sequence search space increases exponentially with protein length. To ensure better search space exploration and faster convergence, we propose a protein modularity-based parallel protein design algorithm. The modular architecture of the protein structure is exploited by considering an intermediate structural organization between secondary structure and domain, defined as the protein unit (PU). Here, we have incorporated a divide-and-conquer approach in which a protein is split into PUs and each PU region is explored in parallel. Further analysis shows that our shared memory implementation of modularity-based parallel sequence search leads to better search space exploration than traditional full protein design. Sequence-based analysis of the design sequences shows an average of 39.7% sequence similarity on the benchmark data set. Structure-based comparison of the modeled structures of the design proteins with the target structures showed an average root-mean-square deviation of 1.17 Å and an average template modeling score of 0.89. The selected modeled structures of the design protein sequences are validated using 100 ns molecular dynamics simulations, in which 80% of the proteins have shown better or similar stability compared to the respective target proteins. Our study suggests that our modularity-based protein design algorithm can be extended to protein interaction design as well.
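Several abstracts in this list report a root-mean-square deviation between modeled and target structures. As a reminder of what that number measures, here is a minimal sketch, assuming the two coordinate sets are already optimally superposed and list equivalent atoms in the same order (the superposition step itself, for example by the Kabsch algorithm, is not shown). The function name `rmsd` is illustrative and not taken from any of the cited tools.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the units of the input (e.g. Å) for two pre-superposed point sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Two atoms each displaced by 1 Å along one axis give an RMSD of exactly 1 Å.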
Affiliation(s)
- Abantika Pal
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Rohith Mulumudy
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
7
Banerjee A, Pal K, Mitra P. An Evolutionary Profile Guided Greedy Parallel Replica-Exchange Monte Carlo Search Algorithm for Rapid Convergence in Protein Design. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:489-499. [PMID: 31329126 DOI: 10.1109/tcbb.2019.2928809] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Protein design, also known as the inverse protein folding problem, is the identification of a protein sequence that folds into a target protein structure. Protein design has been proven to be an NP-hard problem. While researchers are working on designing heuristics with an emphasis on new scoring functions, we propose a replica-exchange Monte Carlo (REMC) search algorithm that ensures faster convergence using a greedy strategy. Using biological insights, we construct an evolutionary profile to encode the amino acid variability at different positions of the target protein from its structural homologs. The evolutionary profile guides the REMC search, and the greedy approach ensures appreciable exploration and exploitation of the sequence-structure fitness surface. We allow termination of a simulation trajectory once stagnation is detected. A series of sequence- and structure-level validations establishes the quality of our design. On a benchmark dataset, our algorithm reports an average root-mean-square deviation of 1.21 Å between the target and the design proteins when modeled with existing protein folding software. In addition, our algorithm achieves a 6.16-fold overall speedup. In molecular dynamics simulations, we observe that four of the five selected design proteins show comparable or better stability than the corresponding target proteins.
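For readers unfamiliar with replica exchange, the core step is the swap test between replicas held at neighboring temperatures. The sketch below shows the standard textbook acceptance rule, min(1, exp[(1/T_i - 1/T_j)(E_i - E_j)]) with the Boltzmann constant set to 1. It is a generic illustration and not the authors' greedy, profile-guided variant; `swap_probability` and `attempt_swaps` are hypothetical names.

```python
import math
import random


def swap_probability(e_i, t_i, e_j, t_j):
    """Standard replica-exchange acceptance probability (k_B = 1)."""
    delta = (1.0 / t_i - 1.0 / t_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temperatures, rng):
    """Try one sweep of neighbor swaps.

    energies[s] is the energy of state s; the state at position k of the
    returned list is the one assigned to temperatures[k] after the sweep.
    """
    order = list(range(len(energies)))
    for k in range(len(temperatures) - 1):
        i, j = order[k], order[k + 1]
        p = swap_probability(energies[i], temperatures[k], energies[j], temperatures[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = j, i
    return order
```

A swap is always accepted when it moves the lower-energy state to the colder replica, which is what lets good sequences found at high temperature migrate toward the low-temperature end.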
8
Huang X, Pearce R, Zhang Y. FASPR: an open-source tool for fast and accurate protein side-chain packing. Bioinformatics 2020; 36:3758-3765. [PMID: 32259206 DOI: 10.1093/bioinformatics/btaa234] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 01/11/2020] [Revised: 03/30/2020] [Accepted: 04/01/2020] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. RESULTS We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, compared favorably with SCWRL4, CISRR, RASP and SCATD which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed for packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. AVAILABILITY AND IMPLEMENTATION The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics
- Yang Zhang
- Department of Computational Medicine and Bioinformatics; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
9
Banerjee A, Mitra P. Ebola Virus VP35 Protein: Modeling of the Tetrameric Structure and an Analysis of Its Interaction with Human PKR. J Proteome Res 2020; 19:4533-4542. [PMID: 32871072 PMCID: PMC7640970 DOI: 10.1021/acs.jproteome.0c00473] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/27/2020] [Indexed: 01/12/2023]
Abstract
The Viral Protein 35 (VP35), a crucial protein of the Zaire Ebolavirus (EBOV), interacts with a plethora of human proteins to cripple the human immune system. Despite its importance, the entire structure of the tetrameric assembly of EBOV VP35 and the means by which it antagonizes the autophosphorylation of the kinase domain of human protein kinase R (PKRK) are still elusive. We consult existing structural information to model a tetrameric assembly of the VP35 protein where 93% of the protein is modeled using crystal structure templates. We analyze our modeled tetrameric structure to identify interchain bonding networks and use molecular dynamics simulations and normal-mode analysis to unravel the flexibility and deformability of the different regions of the VP35 protein. We establish that the C-terminal domain of VP35 (VP35C) directly interacts with PKRK to prevent it from autophosphorylation. Further, we identify three plausible VP35C-PKRK complexes with better affinity than the PKRK dimer formed during autophosphorylation and use protein design to establish a new stretch in VP35C that interacts with PKRK. The proposed tetrameric assembly will aid in better understanding of the VP35 protein, and the reported VP35C-PKRK complexes along with their interacting sites will help in the shortlisting of small molecule inhibitors.
Affiliation(s)
- Anupam Banerjee
- Advanced Technology Development Centre, Indian Institute of Technology Kharagpur, West Bengal 721302, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
10
Huang X, Zheng W, Pearce R, Zhang Y. SSIPe: accurately estimating protein-protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function. Bioinformatics 2020; 36:2429-2437. [PMID: 31830252 DOI: 10.1093/bioinformatics/btz926] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 07/27/2019] [Revised: 11/08/2019] [Accepted: 12/09/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein-protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses. RESULTS We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities. AVAILABILITY AND IMPLEMENTATION Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
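The SSIPe abstract summarizes accuracy with a Pearson correlation coefficient and a root-mean-square error between estimated and experimental ΔΔGbind values. A minimal sketch of how these two statistics are computed from paired lists follows; the function names are illustrative and the example values in the usage note are made up, not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error, in the units of the inputs (e.g. kcal/mol)."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
```

Calling `pearson_r(estimated, experimental)` and `rmse(estimated, experimental)` on the same paired lists gives the two numbers of the kind quoted in the abstract; a perfectly linear relationship yields a correlation of 1 regardless of the error magnitude, which is why both are reported.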
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics
- Robin Pearce
- Department of Computational Medicine and Bioinformatics
- Yang Zhang
- Department of Computational Medicine and Bioinformatics; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
11
Strokach A, Becerra D, Corbi-Verge C, Perez-Riba A, Kim PM. Fast and Flexible Protein Design Using Deep Graph Neural Networks. Cell Syst 2020; 11:402-411.e4. [DOI: 10.1016/j.cels.2020.08.016] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 04/02/2020] [Revised: 06/27/2020] [Accepted: 08/26/2020] [Indexed: 11/15/2022]
12
Huang X, Pearce R, Zhang Y. EvoEF2: accurate and fast energy function for computational protein design. Bioinformatics 2020; 36:1135-1142. [PMID: 31588495 DOI: 10.1093/bioinformatics/btz740] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 08/21/2019] [Revised: 09/19/2019] [Accepted: 09/25/2019] [Indexed: 01/26/2023] Open
Abstract
MOTIVATION The accuracy and success rate of de novo protein design remain limited, mainly due to the parameter over-fitting of current energy functions and their inability to discriminate incorrect designs from correct designs. RESULTS We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues for 148 test monomers, and was generally applicable to protein-protein interaction design, as it recapitulated 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues for 88 test dimers, significantly outperforming EvoEF on the native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences, where all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, as demonstrated by the fact that 87.8% of the predicted structures shared a root-mean-square-deviation less than 2 Å to their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization processes, and EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data. AVAILABILITY AND IMPLEMENTATION The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
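Native sequence recovery (recapitulation), quoted here and in several other abstracts in this list, is the fraction of positions at which the designed residue matches the native one. A minimal sketch, assuming the two sequences are already aligned position by position and have equal length; restricting the calculation to core, interface or surface residues, as the abstract does, would additionally need a per-residue mask that is not shown. The function name is illustrative.

```python
def sequence_recovery(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)
```

For example, a four-residue design that differs from the native sequence at one position has a recovery of 0.75.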
Affiliation(s)
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, MI 48109, USA
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, MI 48109, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
13
Tu Z, Huang X, Fu J, Hu N, Zheng W, Li Y, Zhang Y. Landscape of variable domain of heavy-chain-only antibody repertoire from alpaca. Immunology 2020; 161:53-65. [PMID: 32506493 DOI: 10.1111/imm.13224] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 03/03/2020] [Revised: 05/18/2020] [Accepted: 05/19/2020] [Indexed: 01/05/2023] Open
Abstract
Heavy-chain-only antibodies (HCAbs), which are devoid of light chains, have been found naturally occurring in various species including camelids and cartilaginous fish. Because of their high thermostability, refoldability and capacity for cell permeation, the variable regions of the heavy chain of HCAbs (VHHs) have been widely used in diagnosis, bio-imaging, food safety and therapeutics. Most immunogenetic and functional studies of HCAbs are based on case studies or a limited number of low-throughput sequencing data. A complete picture derived from more abundant high-throughput sequencing (HTS) data can help us gain deeper insights. We cloned and sequenced the full-length coding region of VHHs in Alpaca (Vicugna pacos) via HTS in this study. A new pipeline was developed to conduct an in-depth analysis of the HCAb repertoires. Various critical features, including the length distribution of complementarity-determining region 3 (CDR3), V(D)J usage, VJ pairing, germline-specific mutation rate and germline-specific scoring profiles (GSSPs), were systematically characterized. The quantitative data show that V(D)J usage and VHH recombination are highly biased. Interestingly, we found that the average CDR3 length of classical VHHs is longer than that of non-classical ones, whereas the mutation rates are similar in both kinds of VHHs. Finally, GSSPs were built to quantitatively describe and compare sequences that originate from each VJ pair. Overall, this study presents a comprehensive landscape of the HCAb repertoire, which can provide useful guidance for the modeling of somatic hypermutation and the design of novel functional VHHs or VHH repertoires via evolutionary profiles.
Affiliation(s)
- Zhui Tu
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA; Jiangxi Province Key Laboratory of Modern Analytical Science, Nanchang University, Nanchang, China
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jinheng Fu
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China; Jiangxi-OAI Joint Research Institution, Nanchang University, Nanchang, China
- Na Hu
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China; Jiangxi Province Key Laboratory of Modern Analytical Science, Nanchang University, Nanchang, China; Maternal and Child Medical Research Institute, Shenzhen Maternity and Child Healthcare Hospital, Southern Medical University, Shenzhen, China
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yanping Li
- State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China; Jiangxi Province Key Laboratory of Modern Analytical Science, Nanchang University, Nanchang, China; Jiangxi-OAI Joint Research Institution, Nanchang University, Nanchang, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
14
Pearce R, Huang X, Setiawan D, Zhang Y. EvoDesign: Designing Protein-Protein Binding Interactions Using Evolutionary Interface Profiles in Conjunction with an Optimized Physical Energy Function. J Mol Biol 2019; 431:2467-2476. [PMID: 30851277 DOI: 10.1016/j.jmb.2019.02.028] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 11/03/2018] [Revised: 02/10/2019] [Accepted: 02/26/2019] [Indexed: 01/19/2023]
Abstract
EvoDesign (https://zhanglab.ccmb.med.umich.edu/EvoDesign) is an online server system for protein design. The method uses evolutionary profiles to guide the sequence search simulation and demonstrated significant advantages over physics-based approaches in terms of more accurately designing proteins that adopt desired target folds. Despite the success, the previous EvoDesign program focused only on monomer protein design, which limited its ability and usefulness in terms of designing functional proteins. In this work, we propose a new EvoDesign server, which extends the principles of evolution-based design to design protein-protein interactions. Starting from a two-chain complex structure, structurally similar interfaces are identified from known protein-protein interaction databases. An interface evolutionary profile is then constructed from a multiple sequence alignment of the interface analogies, which is combined with a newly developed, atomic-level physical energy function to guide the replica-exchange Monte Carlo simulation search. The purpose of the server is to redesign the specified complex chain to increase its stability and binding affinity for the other chain in the complex. With the improved scope and accuracy of the methodology, the new EvoDesign pipeline should become a useful online tool for functional protein design and drug discovery studies.
Affiliation(s)
- Robin Pearce
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Dani Setiawan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.