1
Xiong D, U K, Sun J, Cribbs AP. PLMC: Language Model of Protein Sequences Enhances Protein Crystallization Prediction. Interdiscip Sci 2024; 16:802-813. [PMID: 39155325 DOI: 10.1007/s12539-024-00639-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Revised: 05/13/2024] [Accepted: 05/21/2024] [Indexed: 08/20/2024]
Abstract
X-ray diffraction crystallography is the most widely used method for determining protein three-dimensional (3D) structures, and it requires that the target protein be crystallizable. Protein crystallization, however, involves several successive procedures, including protein material production, purification, and crystal production, each of which affects the crystallization outcome. Because this multi-stage process is expensive and laborious, various computational tools have been developed to predict protein crystallization propensity, which is then used to guide experimental determination. In this study, we present a novel deep learning framework, PLMC, that improves multi-stage protein crystallization propensity prediction by leveraging a pre-trained protein language model. To train PLMC effectively, two groups of features for each protein were integrated into a more comprehensive representation: protein language embeddings learned from a large-scale protein sequence database and a handcrafted feature set consisting of physicochemical, sequence-based, and disorder-related information. These features were embedded separately for refinement and then concatenated for the final prediction. Notably, our extensive benchmarking tests demonstrate that PLMC greatly outperforms other state-of-the-art methods, achieving AUC scores of 0.773, 0.893, and 0.913 at the three aforementioned individual stages, respectively, and 0.982 at the final crystallization stage. Furthermore, PLMC is shown to be superior for predicting the crystallization of both globular and membrane proteins, as demonstrated by an AUC score of 0.991 for the latter. These results suggest the significant potential of PLMC in assisting researchers with the experimental design of crystallizable protein variants.
Affiliation(s)
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, 14853, USA.
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, USA.
- Kaicheng U
- Department of Computational Biology, Cornell University, Ithaca, 14853, USA
- Jianfeng Sun
- Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, UK.
- Adam P Cribbs
- Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, UK
2
Gillani M, Pollastri G. Protein subcellular localization prediction tools. Comput Struct Biotechnol J 2024; 23:1796-1807. [PMID: 38707539 PMCID: PMC11066471 DOI: 10.1016/j.csbj.2024.04.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 04/11/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024] [Open Access]
Abstract
Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins do not have experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functions, the functioning of the cell, and its possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms that harness the abundance of sequence data to predict subcellular locations. During the last decade, there has been a considerable amount of research effort aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the Eukaryotic, Prokaryotic, and Virus-based categories, followed by a detailed analysis. Each predictor is discussed based on its main features, strengths, weaknesses, algorithms used, prediction techniques, and analysis. This review is supported by taxonomies of prediction tools that highlight their relevant areas and give examples for easy categorization and understanding. These taxonomies help users find suitable tools according to their needs. Furthermore, recent research gaps and challenges are discussed to cover areas that need the utmost attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers to identify and explore recent advances in the literature.
Affiliation(s)
- Maryam Gillani
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
- Gianluca Pollastri
- School of Computer Science, University College Dublin (UCD), Dublin, D04 V1W8, Ireland
3
Lin SK, Zhou J, Lu Y, Guo L, Huang JJ, Lin JF. Computer-Guided Engineered Endo- and Exocleaving Glycosidase for Significantly Improving Production of Ginsenoside F1. J Agric Food Chem 2024; 72:26294-26304. [PMID: 39535231 DOI: 10.1021/acs.jafc.4c07387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2024]
Abstract
Ginsenoside F1, a particularly rare and valuable compound known for its health benefits, requires precise deglycosylation due to the extensive glycosylation of ginsenosides in Panax notoginseng. Here, we identified that the β-D-glucosidase BglSK exhibits both endo- and exocleaving glycosidase activities with multi-6-O-glycosides, thereby facilitating the specific production of ginsenoside F1. The variant BglSK-T137A/L508A, obtained through protein engineering, displayed kcat/KM values for the reactions of ginsenoside Rg1 and notoginsenoside R1 that were increased by 13.88-fold and 108.56-fold, respectively, compared with wild-type BglSK. With reduced steric hindrance and an overall increase in loop stability, the variant shows a higher tendency to adopt a closed conformation that facilitates the prereaction state, which may explain the enhanced catalytic efficiency of the engineered enzyme. These beneficial mutants will deepen our understanding of mechanisms for improving glycosidase activity and provide tools for producing high-value P. notoginseng products.
Affiliation(s)
- Shi-Kun Lin
- College of Food Science, South China Agricultural University, Guangzhou 510640, China
- Jinlin Zhou
- Golden Health Biotechnology Co., Ltd., Foshan 528225, China
- Yujing Lu
- School of Chemical Engineering and Light Industry, School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, Guangzhou 510006, China
- Liqiong Guo
- College of Food Science, South China Agricultural University, Guangzhou 510640, China
- Jia-Jun Huang
- Golden Health Biotechnology Co., Ltd., Foshan 528225, China
- TF BioSyn Biotechnology Co., Ltd., Foshan 528225, China
- Jun-Fang Lin
- College of Food Science, South China Agricultural University, Guangzhou 510640, China
4
Qiu S, Ju CL, Wang T, Chen J, Cui YT, Wang LQ, Fan FF, Huang J. Evolving ω-amine transaminase AtATA guided by substrate-enzyme binding free energy for enhancing activity and stability against non-natural substrates. Appl Environ Microbiol 2024; 90:e0054324. [PMID: 38864627 PMCID: PMC11267935 DOI: 10.1128/aem.00543-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 05/15/2024] [Indexed: 06/13/2024] [Open Access]
Abstract
In the field of chiral amine synthesis, ω-amine transaminase (ω-ATA) is one of the most established enzymes capable of asymmetric amination under optimal conditions. However, the applicability of ω-ATA toward more complex non-natural molecules remains limited due to its low transamination activity, low thermostability, and narrow substrate scope. Here, by combining a computational virtual screening strategy with a combinatorial active-site saturation test/iterative saturation mutagenesis strategy, we constructed the best variant, M14C3-V5 (M14C3-V62A-V116S-E117I-L118I-V147F), of the ω-ATA from Aspergillus terreus (AtATA), with improved activity and thermostability toward the non-natural substrate 1-acetylnaphthalene, the ketone precursor for producing (R)-(+)-1-(1-naphthyl)ethylamine [(R)-NEA], an intermediate of cinacalcet hydrochloride. M14C3-V5 showed an activity enhancement of up to 3.4-fold compared to the parent enzyme M14C3 (AtATA-F115L-M150C-H210N-M280C-V149A-L182F-L187F). The computational tools YASARA, Discovery Studio, Amber, and FoldX were applied to predict mutation hotspots based on substrate-enzyme binding free energies and, through in silico analyses, to suggest a possible mechanism in terms of features related to AtATA structure, catalytic activity, and stability. M14C3-V5 achieved 71.8% conversion toward 50 mM 1-acetylnaphthalene in a 50 mL preparative-scale reaction for preparing (R)-NEA. Moreover, M14C3-V5 expanded the substrate scope toward aromatic ketone compounds. The virtual screening strategy based on changes in binding free energies successfully predicted the AtATA activity toward 1-acetylnaphthalene and related substrates. Together with experimental data, these approaches can serve as a gateway to explore desirable performances, expand enzyme-substrate scope, and accelerate biocatalysis.
IMPORTANCE Chiral amines are crucial compounds with many valuable applications. Their asymmetric synthesis employing ω-amine transaminases (ω-ATAs) is considered an attractive method. However, most ω-ATAs exhibit low activity and stability toward various non-natural substrates, which limits their industrial application. In this work, a protein engineering strategy and computer-aided design were used to evolve the activity and stability of the ω-ATA from Aspergillus terreus toward non-natural substrates. After five rounds of mutations, the best variant, M14C3-V5, was obtained, showing better catalytic efficiency toward 1-acetylnaphthalene and higher thermostability than the original enzyme, M14C3. The robust combinatorial variant displays significant application value for advancing the asymmetric synthesis of aromatic chiral amines.
Affiliation(s)
- Shuai Qiu
- Key Laboratory of Chemical and Biological Processing Technology for Farm Products of Zhejiang Province, Zhejiang Provincial Collaborative Innovation Center of Agricultural Biological Resources Biochemical Manufacturing, School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Cong-Lin Ju
- Key Laboratory of Chemical and Biological Processing Technology for Farm Products of Zhejiang Province, Zhejiang Provincial Collaborative Innovation Center of Agricultural Biological Resources Biochemical Manufacturing, School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Tong Wang
- Key Laboratory of Chemical and Biological Processing Technology for Farm Products of Zhejiang Province, Zhejiang Provincial Collaborative Innovation Center of Agricultural Biological Resources Biochemical Manufacturing, School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Jie Chen
- Key Laboratory of Chemical and Biological Processing Technology for Farm Products of Zhejiang Province, Zhejiang Provincial Collaborative Innovation Center of Agricultural Biological Resources Biochemical Manufacturing, School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Yu-Tong Cui
- Key Laboratory of Chemical and Biological Processing Technology for Farm Products of Zhejiang Province, Zhejiang Provincial Collaborative Innovation Center of Agricultural Biological Resources Biochemical Manufacturing, School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Lin-Quan Wang
- Key Laboratory of Chemical and Biological Processing Technology for Farm Products of Zhejiang Province, Zhejiang Provincial Collaborative Innovation Center of Agricultural Biological Resources Biochemical Manufacturing, School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Fang-Fang Fan
- Key Laboratory of Chemical and Biological Processing Technology for Farm Products of Zhejiang Province, Zhejiang Provincial Collaborative Innovation Center of Agricultural Biological Resources Biochemical Manufacturing, School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Jun Huang
- Key Laboratory of Chemical and Biological Processing Technology for Farm Products of Zhejiang Province, Zhejiang Provincial Collaborative Innovation Center of Agricultural Biological Resources Biochemical Manufacturing, School of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
5
Dahlström KM, Salminen TA. Apprehensions and emerging solutions in ML-based protein structure prediction. Curr Opin Struct Biol 2024; 86:102819. [PMID: 38631107 DOI: 10.1016/j.sbi.2024.102819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/05/2024] [Accepted: 03/31/2024] [Indexed: 04/19/2024]
Abstract
The three-dimensional structure of proteins determines their function in vital biological processes. Thus, when the structure is known, the molecular mechanism of protein function can be understood in more detail and the information obtained can be used in biotechnological, diagnostic, and therapeutic applications. Over the past five years, machine learning (ML)-based modeling has pushed protein structure prediction to the next level, with AlphaFold at the forefront, predicting the structures of hundreds of millions of proteins. Recent advances report promising ML-based approaches for solving the remaining challenges by incorporating functionally important metals, co-factors, post-translational modifications, structural dynamics, and interdomain and multimer interactions into the structure prediction process.
Affiliation(s)
- Käthe M Dahlström
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland; InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
- Tiina A Salminen
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland; InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland.
6
da Silva LSA, Seman LO, Camponogara E, Mariani VC, Dos Santos Coelho L. Bilinear optimization of protein structure prediction: An exact approach via AB off-lattice model. Comput Biol Med 2024; 176:108558. [PMID: 38754216 DOI: 10.1016/j.compbiomed.2024.108558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 04/25/2024] [Accepted: 05/05/2024] [Indexed: 05/18/2024]
Abstract
Protein structure prediction (PSP) remains a central challenge in computational biology due to its inherent complexity and high dimensionality. While numerous heuristic approaches have appeared in the literature, their success varies. The AB off-lattice model, which characterizes proteins as sequences of A (hydrophobic) and B (hydrophilic) beads, presents a simplified perspective on PSP. This work presents a mathematical optimization-based methodology capitalizing on the off-lattice AB model. Dissecting the inherent non-linearities of the energy landscape of protein folding allowed for formulating the PSP as a bilinear optimization problem. This formulation was achieved by introducing auxiliary variables and constraints that encapsulate the nuanced relationship between the protein's conformational space and its energy landscape. The proposed bilinear model exhibited notable accuracy in pinpointing the global minimum energy conformations on a benchmark dataset presented by the Protein Data Bank (PDB). Compared to traditional heuristic-based methods, this bilinear approach yielded exact solutions, reducing the likelihood of local minima entrapment. This research highlights the potential of reframing the traditionally non-linear protein structure prediction problem into a bilinear optimization problem through the off-lattice AB model. Such a transformation offers a route toward methodologies that can determine the global solution, challenging current PSP paradigms. Exploration into hybrid models, merging bilinear optimization and heuristic components, might present an avenue for balancing accuracy with computational efficiency.
Affiliation(s)
- Luiza Scapinello Aquino da Silva
- Electrical Engineering Graduate Program (PPGEE), Federal University of Parana (UFPR), Coronel Francisco Heraclito dos Santos, Curitiba, 81530-000, Paraná, Brazil.
- Laio Oriel Seman
- Department of Automation and Systems Engineering, Federal University of Santa Catarina (UFSC), Engenheiro Agronômico Andrei Cristian Ferreira, Florianópolis, 88040-900, Santa Catarina, Brazil
- Eduardo Camponogara
- Department of Automation and Systems Engineering, Federal University of Santa Catarina (UFSC), Engenheiro Agronômico Andrei Cristian Ferreira, Florianópolis, 88040-900, Santa Catarina, Brazil
- Viviana Cocco Mariani
- Electrical Engineering Graduate Program (PPGEE), Federal University of Parana (UFPR), Coronel Francisco Heraclito dos Santos, Curitiba, 81530-000, Paraná, Brazil; Mechanical Engineering Graduate Program (PGMec), Federal University of Parana (UFPR), Coronel Francisco Heraclito dos Santos, Curitiba, 81530-000, Paraná, Brazil
- Leandro Dos Santos Coelho
- Electrical Engineering Graduate Program (PPGEE), Federal University of Parana (UFPR), Coronel Francisco Heraclito dos Santos, Curitiba, 81530-000, Paraná, Brazil
7
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petabase-Scale Homology Search for Structure Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041465. [PMID: 38316555 PMCID: PMC11065157 DOI: 10.1101/cshperspect.a041465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Gyuri Kim
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
- Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
8
Abe F, Nakano A, Hirata I, Tanimoto K, Kato K. Structure and function of engineered stromal cell-derived factor-1α. Dent Mater J 2024; 43:286-293. [PMID: 38417858 DOI: 10.4012/dmj.2023-247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2024]
Abstract
To design biologically active, collagen-based scaffolds for bone tissue engineering, we have synthesized chimeric proteins consisting of stromal cell-derived factor-1α (SDF) and the von Willebrand factor A3 collagen-binding domain (CBD). The chimeric proteins were used to evaluate the effect of domain linkage and its order on the structure and function of the SDF and CBD. The structure of the chimeric proteins was analyzed by circular dichroism spectroscopy, while functional analysis was performed by a cell migration assay for the SDF domain and a collagen-binding assay for the CBD domain. Furthermore, computational structural prediction was conducted for the chimeric proteins to examine the consistency with the results of structural and functional analyses. Our structural and functional analyses as well as structural prediction revealed that linking two domains can affect their functions. However, their order had minor effects on the three-dimensional structure of CBD and SDF in the chimeric proteins.
Affiliation(s)
- Fumika Abe
- Department of Biomaterials, Graduate School of Biomedical and Health Sciences, Hiroshima University
- Department of Orthodontics and Craniofacial Developmental Biology, Graduate School of Biomedical and Health Sciences, Hiroshima University
- Ayana Nakano
- Department of Biomaterials, Graduate School of Biomedical and Health Sciences, Hiroshima University
- Department of Orthodontics and Craniofacial Developmental Biology, Graduate School of Biomedical and Health Sciences, Hiroshima University
- Isao Hirata
- Department of Biomaterials, Graduate School of Biomedical and Health Sciences, Hiroshima University
- Kotaro Tanimoto
- Department of Orthodontics and Craniofacial Developmental Biology, Graduate School of Biomedical and Health Sciences, Hiroshima University
- Koichi Kato
- Department of Biomaterials, Graduate School of Biomedical and Health Sciences, Hiroshima University
- Nanomedicine Research Division, Research Institute for Nanodevices, Hiroshima University
9
Chidambara Thanu V, Jabeen A, Ranganathan S. iBio-GATS-A Semi-Automated Workflow for Structural Modelling of Insect Odorant Receptors. Int J Mol Sci 2024; 25:3055. [PMID: 38474300 DOI: 10.3390/ijms25053055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/26/2024] [Accepted: 03/04/2024] [Indexed: 03/14/2024] [Open Access]
Abstract
Insects utilize seven transmembrane (7TM) odorant receptor (iOR) proteins, with an inverted topology compared to G-protein coupled receptors (GPCRs), to detect chemical cues in the environment. For pest biocontrol, chemical attractants are used to trap insect pests. However, with the influx of invasive insect pests, novel odorants are urgently needed, specifically designed to match 3D iOR structures. Experimental structural determination of these membrane receptors remains challenging and only four experimental iOR structures from two evolutionarily distant organisms have been solved. Template-based modelling (TBM) is a complementary approach, to generate model structures, selecting templates based on sequence identity. As the iOR family is highly divergent, a different template selection approach than sequence identity is needed. Bio-GATS template selection for GPCRs, based on hydrophobicity correspondence, has been morphed into iBio-GATS, for template selection from available experimental iOR structures. This easy-to-use semi-automated workflow has been extended to generate high-quality models from any iOR sequence from the selected template, using Python and shell scripting. This workflow was successfully validated on Apocrypta bakeri Orco and Machilis hrabei OR5 structures. iBio-GATS models generated for the fruit fly iOR, OR59b and Orco, yielded functional ligand binding results concordant with experimental mutagenesis findings, compared to AlphaFold2 models.
Affiliation(s)
- Amara Jabeen
- Applied Biosciences, Macquarie University, Sydney 2109, Australia
10
Hasan ME, Samir A, Khalil MM, Shafaa MW. Bioinformatics approach for prediction and analysis of the Non-Structural Protein 4B (NSP4B) of the Zika virus. J Genet Eng Biotechnol 2024; 22:100336. [PMID: 38494248 PMCID: PMC10860876 DOI: 10.1016/j.jgeb.2023.100336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
BACKGROUND The Nonstructural Protein (NSP) 4B of Zika virus, comprising 251 amino acids (ZIKV/Human/POLG_ZIKVF; accession number A0A024B7W1), induces the production of endoplasmic reticulum (ER)-derived membrane vesicles, which are the sites of viral replication. To understand the physical basis of how proteins fold in nature and to solve the challenge of protein structure prediction, ab initio and comparative modeling are crucial tools. RESULTS The systematic in silico technique ThreaDom predicted only one domain (4-190) of NSP4B. I-TASSER and AlphaFold were ranked as the best servers for full-length 3D protein structure prediction of NSP4B, where the predicted models were evaluated quantitatively using benchmarked metrics including C-score (-3.43), TM-score (0.77949), RMSD (2.73), and Z-score (1.561). The functional and protein-binding motifs were identified using motif databases and secondary structure and surface accessibility predictions, combined with prediction of post-translational modification (PTM) sites. Two highly conserved protein-binding motifs (Flavi NS4B and Bacillus papR protein), together with three PTMs (Casein Kinase II, Myristyl site, and ASN-Glycosylation site), were predicted using the Motif Scan and ScanProsite servers. These patterns and PTMs were associated with NSP4B's role in triggering the development of the viral replication complex and its participation in the localization of NS3 and NS5 on the membrane. Only one hit from the Structural Classification of Proteins (SCOP) matched the protein sequence at positions 10 to 397 and was categorized in the six-hairpin glycosidases superfamily according to CATH (Class, Architecture, Topology, and Homology). Integrating this NSP4B information with the templates' SCOP and CATH annotations makes it easier to attribute structure-function/evolution links to both previously known and recently discovered protein structures.
Affiliation(s)
- Mohamed E Hasan
- Bioinformatics Department, Genetic Engineering and Biotechnology Research Institute, University of Sadat City, Sadat City 32897, Egypt.
- Aya Samir
- Physics Department, Medical Biophysics Division, Faculty of Science, Helwan University, Cairo, Egypt
- Magdy M Khalil
- Physics Department, Medical Biophysics Division, Faculty of Science, Helwan University, Cairo, Egypt; School of Biotechnology, Badr University in Cairo, Egypt
- Medhat W Shafaa
- Physics Department, Medical Biophysics Division, Faculty of Science, Helwan University, Cairo, Egypt
11
Wuyun Q, Chen Y, Shen Y, Cao Y, Hu G, Cui W, Gao J, Zheng W. Recent Progress of Protein Tertiary Structure Prediction. Molecules 2024; 29:832. [PMID: 38398585 PMCID: PMC10893003 DOI: 10.3390/molecules29040832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/06/2024] [Accepted: 02/08/2024] [Indexed: 02/25/2024] [Open Access]
Abstract
The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially expedited advancements in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 has facilitated the rise of structure prediction performance to new heights, regularly competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction for researchers, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used protein structure prediction methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to provide researchers with insights through which to understand the limitations, contexts, and effective selections of protein structure prediction methods in protein-related fields.
Affiliation(s)
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Yihan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yifeng Shen
- Faculty of Environment and Information Studies, Keio University, Fujisawa 252-0882, Kanagawa, Japan
- Yang Cao
- College of Life Sciences, Sichuan University, Chengdu 610065, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
| | - Wei Cui
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China;
| | - Jianzhao Gao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China;
| | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
12
|
Zheng W, Wuyun Q, Li Y, Zhang C, Freddolino PL, Zhang Y. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nat Methods 2024; 21:279-289. [PMID: 38167654 PMCID: PMC10864179 DOI: 10.1038/s41592-023-02130-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 03/04/2023] [Accepted: 11/13/2023] [Indexed: 01/05/2024]
Abstract
Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- P Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
|
13
|
Koch WHG. Quantum chemical "Aufbau" principles: how to estimate the shape of highly flexible (bio-)polymers? A recursively extendable "chemion picture" of Euler-Hückel-type. J Mol Model 2024; 30:47. [PMID: 38265671 PMCID: PMC11315800 DOI: 10.1007/s00894-023-05807-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 12/09/2023] [Indexed: 01/25/2024]
Abstract
An outline is given of how to split the n-dimensional space of torsion angles occurring in flexible (bio-)polymers (like alkanes, nucleic acids, or proteins, for instance) into n one-dimensional potential curves. Forthcoming applications will focus on the "protein folding problem," beginning with polyglycine. CONTEXT: In accordance with Euler's topology rules, molecules are considered to be composed of "vertices" (atoms, ligands, bonding sites, functional groups, and bigger fragments). Following Hückel, each vertex is represented by only one basis function. Starting from the "monofocal" hydrides CH4, NH3, OH2, FH, and SiH4, PH3, SH2, ClH as anchor units, "chemionic" Hamiltonians (of individual "chemion ensembles" and proportional nuclear charges) are constructed recursively, together with an appropriate basis set for the first five (normal) alkanes and some related oligomers like primary alcohols, alkyl amines, and alkyl chlorides. METHODS: Standard methods ("Restricted Hartree-Fock RHF" and "Full Configuration Interaction FCI") are used to solve the various stationary Schrödinger equations. Two software packages are indispensable: "SMILES" for integral evaluations over Slater-type orbitals (STO), and "Numerical Recipes" for matrix diagonalizations and inversions. While managing with only two-center repulsion integrals, "implicit multi-center integrations" lead us to the non-empirical fundament of Hoffmann's "Extended-Hückel Theory."
Affiliation(s)
- Wolfhard H G Koch
  - Facultad de Estudios Superiores Zaragoza, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Institut für Physikalische und Theoretische Chemie, Universität Tübingen, Tübingen, Germany
|
14
|
Peng CX, Liang F, Xia YH, Zhao KL, Hou MH, Zhang GJ. Recent Advances and Challenges in Protein Structure Prediction. J Chem Inf Model 2024; 64:76-95. [PMID: 38109487 DOI: 10.1021/acs.jcim.3c01324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/20/2023]
Abstract
Artificial intelligence has made significant advances in the field of protein structure prediction in recent years. In particular, DeepMind's end-to-end model, AlphaFold2, has demonstrated the capability to predict three-dimensional structures of numerous unknown proteins with accuracy levels comparable to those of experimental methods. This breakthrough has opened up new possibilities for understanding protein structure and function as well as accelerating drug discovery and other applications in the field of biology and medicine. Despite the remarkable achievements of artificial intelligence in the field, there are still some challenges and limitations. In this Review, we discuss the recent progress and some of the challenges in protein structure prediction. These challenges include predicting multidomain protein structures, protein complex structures, multiple conformational states of proteins, and protein folding pathways. Furthermore, we highlight directions in which further improvements can be conducted.
Affiliation(s)
- Chun-Xiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Fang Liang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu-Hao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kai-Long Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ming-Hua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
15
|
Chen YM, Lu CT, Wang CW, Fischer WB. Repurposing dye ligands as antivirals via a docking approach on viral membrane and globular proteins - SARS-CoV-2 and HPV-16. Biochim Biophys Acta Biomembr 2024; 1866:184220. [PMID: 37657640 DOI: 10.1016/j.bbamem.2023.184220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 08/21/2023] [Accepted: 08/24/2023] [Indexed: 09/03/2023]
Abstract
A series of dye ligands are docked to three different proteins, E and 3a of severe acute respiratory syndrome corona virus 2 (SARS-CoV-2) and E6 of human papilloma virus type 16 (HPV-16) using three different software. A four-level selection algorithm is used based on nonparametric statistics of numerical key values such as the "rank" derived from (i) averaged estimated binding energies (EBEs) and (ii) absolute EBE value of each of the software, (iii) frequency of ranking and (iv) rank of the area-under-curve values (AUCs) from decoy docking. A series of repurposing drugs and known antivirals used in experimental studies are docked for comparison. One dye ligand is ranked best for all proteins using the selection algorithm levels i - iii. Another three dye ligands are ranked top for the proteins individually when using all four levels.
Affiliation(s)
- Yi-Ming Chen
  - Institute of Biophotonics, School of Biomedical Science and Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Ching-Tai Lu
  - Institute of Biophotonics, School of Biomedical Science and Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Chia-Wen Wang
  - Institute of Biophotonics, School of Biomedical Science and Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Wolfgang B Fischer
  - Institute of Biophotonics, School of Biomedical Science and Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan
|
16
|
Ng TK, Ji J, Liu Q, Yao Y, Wang WY, Cao Y, Chen CB, Lin JW, Dong G, Cen LP, Huang C, Zhang M. Evaluation of Myocilin Variant Protein Structures Modeled by AlphaFold2. Biomolecules 2023; 14:14. [PMID: 38275755 PMCID: PMC10813463 DOI: 10.3390/biom14010014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/12/2023] [Accepted: 12/15/2023] [Indexed: 01/27/2024]
Abstract
Deep neural network-based programs can be applied to protein structure modeling by inputting amino acid sequences. Here, we aimed to evaluate the AlphaFold2-modeled myocilin wild-type and variant protein structures and compare to the experimentally determined protein structures. Molecular dynamic and ligand binding properties of the experimentally determined and AlphaFold2-modeled protein structures were also analyzed. AlphaFold2-modeled myocilin variant protein structures showed high similarities in overall structure to the experimentally determined mutant protein structures, but the orientations and geometries of amino acid side chains were slightly different. The olfactomedin-like domain of the modeled missense variant protein structures showed fewer folding changes than the nonsense variant when compared to the predicted wild-type protein structure. Differences were also observed in molecular dynamics and ligand binding sites between the AlphaFold2-modeled and experimentally determined structures as well as between the wild-type and variant structures. In summary, the folding of the AlphaFold2-modeled MYOC variant protein structures could be similar to that determined by the experiments but with differences in amino acid side chain orientations and geometries. Careful comparisons with experimentally determined structures are needed before the applications of the in silico modeled variant protein structures.
Affiliation(s)
- Tsz Kin Ng
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
  - Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Jie Ji
  - Network & Information Centre, Shantou University, Shantou 515041, China
- Qingping Liu
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
  - Key Laboratory of Carbohydrate and Lipid Metabolism Research, College of Life Science and Technology, Dalian University, Dalian 116622, China
- Yao Yao
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
  - Shantou University Medical College, Shantou 515041, China
- Wen-Ying Wang
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
  - Shantou University Medical College, Shantou 515041, China
- Yingjie Cao
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Chong-Bo Chen
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Jian-Wei Lin
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Geng Dong
  - Shantou University Medical College, Shantou 515041, China
- Ling-Ping Cen
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Chukai Huang
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
- Mingzhi Zhang
  - Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou 515041, China
|
17
|
Lee JW, Won JH, Jeon S, Choo Y, Yeon Y, Oh JS, Kim M, Kim S, Joung I, Jang C, Lee SJ, Kim TH, Jin KH, Song G, Kim ES, Yoo J, Paek E, Noh YK, Joo K. DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function. Bioinformatics 2023; 39:btad712. [PMID: 37995286 PMCID: PMC10699847 DOI: 10.1093/bioinformatics/btad712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023]
Abstract
MOTIVATION: Predicting protein structures with high accuracy is a critical challenge for the broad community of life sciences and industry. Despite progress made by deep neural networks like AlphaFold2, there is a need for further improvements in the quality of detailed structures, such as side-chains, along with protein backbone structures. RESULTS: Building upon the successes of AlphaFold2, the modifications we made include changing the losses of side-chain torsion angles and frame aligned point error, adding loss functions for side chain confidence and secondary structure prediction, and replacing template feature generation with a new alignment method based on conditional random fields. We also performed re-optimization by conformational space annealing using a molecular mechanics energy function which integrates the potential energies obtained from distogram and side-chain prediction. In the CASP15 blind test for single protein and domain modeling (109 domains), DeepFold ranked fourth among 132 groups with improvements in the details of the structure in terms of backbone, side-chain, and Molprobity. In terms of protein backbone accuracy, DeepFold achieved a median GDT-TS score of 88.64 compared with 85.88 of AlphaFold2. For TBM-easy/hard targets, DeepFold ranked at the top based on Z-scores for GDT-TS. This shows its practical value to the structural biology community, which demands highly accurate structures. In addition, a thorough analysis of 55 domains from 39 targets with publicly available structures indicates that DeepFold shows superior side-chain accuracy and Molprobity scores among the top-performing groups. AVAILABILITY AND IMPLEMENTATION: DeepFold tools are open-source software available at https://github.com/newtonjoo/deepfold.
Affiliation(s)
- Jae-Won Lee
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jong-Hyun Won
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Seonggwang Jeon
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Yujin Choo
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
  - Department of Artificial intelligence, Hanyang University, Seoul 04763, Korea
- Yubin Yeon
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
- Jin-Seon Oh
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
  - Department of Artificial intelligence, Hanyang University, Seoul 04763, Korea
- Minsoo Kim
  - Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- SeonHwa Kim
  - School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Cheongjae Jang
  - Artificial Intelligence Institute, Hanyang University, Seoul 04763, Korea
- Sung Jong Lee
  - Basic Science Research Institute, Changwon National University, Changwon 51140, Korea
- Tae Hyun Kim
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Kyong Hwan Jin
  - School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Giltae Song
  - School of Computer Science and Engineering, Pusan National University, Busan 46241, Korea
- Eun-Sol Kim
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Jejoong Yoo
  - Department of Physics, Sungkyunkwan University, Suwon 16419, Korea
- Eunok Paek
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
- Yung-Kyun Noh
  - Department of Computer Science, Hanyang University, Seoul 04763, Korea
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
- Keehyoung Joo
  - Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 02455, Korea
|
18
|
Oda T. Improving protein structure prediction with extended sequence similarity searches and deep-learning-based refinement in CASP15. Proteins 2023; 91:1712-1723. [PMID: 37485822 DOI: 10.1002/prot.26551] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/14/2023] [Revised: 06/23/2023] [Accepted: 06/28/2023] [Indexed: 07/25/2023]
Abstract
The human predictor team PEZYFoldings got first place with the assessor's formulae (3rd place with Global Distance Test Total Score [GDT-TS]) in the single-domain category and 10th place in the multimer category in Critical Assessment of Structure Prediction 15. In this paper, I describe the exact method used by PEZYFoldings in the competition. As AlphaFold2 and AlphaFold-Multimer, developed by DeepMind, were state-of-the-art structure prediction tools, it was assumed that enhancing the input and output of the tools was an effective strategy to obtain the highest accuracy for structure prediction. Therefore, I used additional tools and databases to collect evolutionarily related sequences and introduced a deep-learning-based model in the refinement step. In addition to these modifications, manual interventions were performed to address various tasks. Detailed analyses were performed after the competition to identify the main contributors to performance. Comparing the number of evolutionarily related sequences I used with those of the other teams that provided AlphaFold2's baseline predictions revealed that an extensive sequence similarity search was one of the main contributors. Nonetheless, there were specific targets for which I could not identify any evolutionarily related sequences, resulting in my inability to construct accurate structures for these targets. Notably, I noticed that I had gained large Z-scores with the subunits of H1137, for which I performed manual domain parsing considering the interfaces between the subunits. This finding implies that the manual intervention contributed to my performance. The influence of the refinement model on the accuracy of structure prediction was minimal. I could have predicted structures with a similar level of accuracy without employing the refinement model. However, from the perspective of accuracy self-estimate, many structures demonstrated improvement after refinement. 
This improvement likely had a substantial influence on improving my position in the assessor's formulae rankings. These results highlight the opportunities for improvement in (1) multimer prediction, (2) building of larger and more diverse databases, and (3) developing tools to predict structures from primary sequences alone. In addition, transferring the manual intervention process to automation is a future concern.
|
19
|
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; (iv) an optimized I-TASSER-based folding simulation system for full-length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER showed average TM-score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
  - Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
|
20
|
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Ioannis G. Riziotis
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Neera Borkakoti
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Janet M. Thornton
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
|
21
|
Qi H, Wang T, Li H, Li C, Guan L, Liu W, Wang J, Lu F, Mao S, Qin HM. Sequence- and Structure-Based Mining of Thermostable D-Allulose 3-Epimerase and Computer-Guided Protein Engineering To Improve Enzyme Activity. J Agric Food Chem 2023; 71:18431-18442. [PMID: 37970673 DOI: 10.1021/acs.jafc.3c07204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2023]
Abstract
D-Allulose, a functional sweetener, can be synthesized from fructose using D-allulose 3-epimerase (DAEase). Nevertheless, a majority of the reported DAEases have inadequate stability under harsh industrial reaction conditions, which greatly limits their practical applications. In this study, big data mining combined with a computer-guided free energy calculation strategy was employed to discover a novel DAEase with excellent thermostability. Consensus sequence analysis of flexible regions and comparison of binding energies after substrate docking were performed using phylogeny-guided big data analyses. TtDAE from Thermogutta terrifontis was the most thermostable among 358 candidate enzymes, with a half-life of 32 h at 70 °C. Subsequently, structure-guided virtual screening and a customized strategy based on a combinatorial active-site saturation test/iterative saturation mutagenesis were utilized to engineer TtDAE. Finally, the catalytic activity of the M4 variant (P105A/L14C/T63G/I65A) was increased by 5.12-fold. Steered molecular dynamics simulations indicated that M4 had an enlarged substrate-binding pocket, which enhanced the fit between the enzyme and the substrate. The approach presented here, combining DAEases mining with further rational modification, provides guidance for obtaining promising catalysts for industrial-scale production.
Affiliation(s)
- Hongbin Qi
  - Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
- Tong Wang
  - Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
- Huimin Li
  - Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
- Chao Li
  - Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
- Lijun Guan
  - Institute of Food Processing, Heilongjiang Academy of Agricultural Sciences, Harbin 150086, China
- Weidong Liu
  - Industrial Enzymes National Engineering Laboratory, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
- Jianwen Wang
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan
- Fuping Lu
  - Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
- Shuhong Mao
  - Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
- Hui-Min Qin
  - Key Laboratory of Industrial Fermentation Microbiology of the Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, College of Biotechnology, Tianjin University of Science and Technology, National Engineering Laboratory for Industrial Enzymes, Tianjin 300457, China
|
22
|
Azman AT, Mohd Isa NS, Mohd Zin Z, Abdullah MAA, Aidat O, Zainol MK. Protein Hydrolysate from Underutilized Legumes: Unleashing the Potential for Future Functional Foods. Prev Nutr Food Sci 2023; 28:209-223. [PMID: 37842256 PMCID: PMC10567599 DOI: 10.3746/pnf.2023.28.3.209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/26/2023] [Accepted: 07/07/2023] [Indexed: 10/17/2023] Open
Abstract
Proteins play a vital role in human development, growth, and overall health. Traditionally, animal-derived proteins were considered the primary source of dietary protein. However, in recent years, there has been a remarkable shift in dietary consumption patterns, with a growing preference for plant-based protein sources. This shift has resulted in a significant increase in the production of plant proteins in the food sector. Consequently, there has been a surge in research exploring various plant sources, particularly wild and underutilized legumes such as Canavalia, Psophocarpus, Cajanus, Lablab, Phaseolus, and Vigna, due to their exceptional nutraceutical value. This review presents the latest insights into innovative approaches used to extract proteins from underutilized legumes. Furthermore, it highlights the purification of protein hydrolysate using Fast Protein Liquid Chromatography. This review also covers the characterization of purified peptides, including their molecular weight, amino acid composition, and the creation of three-dimensional models based on amino acid sequences. The potential of underutilized legume protein hydrolysates as functional ingredients in the food industry is a key focus of this review. By incorporating these protein sources into food production, we can foster sustainable and healthy practices while minimizing environmental impact. The investigation of underutilized legumes offers exciting possibilities for future research and development in this area, further enhancing the utilization of plant-based protein sources.
Affiliation(s)
- Ain Tasnim Azman
- Faculty of Fisheries and Food Science, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu 21030, Malaysia
- Nur Suaidah Mohd Isa
- Faculty of Fisheries and Food Science, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu 21030, Malaysia
- Zamzahaila Mohd Zin
- Faculty of Fisheries and Food Science, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu 21030, Malaysia
- Mohd Aidil Adhha Abdullah
- Faculty of Science and Marine Environment, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu 21030, Malaysia
- Omaima Aidat
- Laboratory of Food Technology and Nutrition, Abdelhamid Ibn Badis University, Mostaganem 27000, Algeria
- Mohamad Khairi Zainol
- Faculty of Fisheries and Food Science, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu 21030, Malaysia
|
23
|
Li J, Kang G, Wang J, Yuan H, Wu Y, Meng S, Wang P, Zhang M, Wang Y, Feng Y, Huang H, de Marco A. Affinity maturation of antibody fragments: A review encompassing the development from random approaches to computational rational optimization. Int J Biol Macromol 2023; 247:125733. [PMID: 37423452 DOI: 10.1016/j.ijbiomac.2023.125733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 07/04/2023] [Accepted: 07/06/2023] [Indexed: 07/11/2023]
Abstract
Routinely screened antibody fragments usually require further in vitro maturation to achieve the desired biophysical properties. Blind in vitro strategies can produce improved ligands by introducing random mutations into the original sequences and selecting the resulting clones under increasingly stringent conditions. Rational approaches take an alternative perspective: they first identify the specific residues potentially involved in controlling biophysical properties, such as affinity or stability, and then evaluate which mutations could improve those characteristics. Understanding the antigen-antibody interactions is instrumental to developing this process, whose reliability consequently depends strongly on the quality and completeness of the structural information. Recently, methods based on deep learning have critically improved the speed and accuracy of model building and are promising tools for accelerating the docking step. Here, we review the features of the available bioinformatic instruments and analyze the reports illustrating the results obtained by applying them to optimize antibody fragments, and nanobodies in particular. Finally, the emerging trends and open questions are summarized.
Affiliation(s)
- Jiaqi Li
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Guangbo Kang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Jiewen Wang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Haibin Yuan
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Yili Wu
- Zhejiang Provincial Clinical Research Center for Mental Disorders, School of Mental Health and the Affiliated Kangning Hospital, Institute of Aging, Key Laboratory of Alzheimer's Disease of Zhejiang Province, Wenzhou Medical University, Oujiang Laboratory, Wenzhou, Zhejiang 325035, China
- Shuxian Meng
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- Ping Wang
- New Technology R&D Department, Tianjin Modern Innovative TCM Technology Company Limited, Tianjin 300392, China
- Miao Zhang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; China Resources Biopharmaceutical Company Limited, Beijing 100029, China
- Yuli Wang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Tianjin Pharmaceutical Da Ren Tang Group Corporation Limited, Traditional Chinese Pharmacy Research Institute, Tianjin Key Laboratory of Quality Control in Chinese Medicine, Tianjin 300457, China; State Key Laboratory of Drug Delivery Technology and Pharmacokinetics, Tianjin Institute of Pharmaceutical Research, Tianjin 300193, China
- Yuanhang Feng
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China
- He Huang
- School of Chemical Engineering and Technology, Tianjin University, Tianjin 300350, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China
- Ario de Marco
- Laboratory for Environmental and Life Sciences, University of Nova Gorica, Nova Gorica, Slovenia
|
24
|
Mani H, Chang CC, Hsu HJ, Yang CH, Yen JH, Liou JW. Comparison, Analysis, and Molecular Dynamics Simulations of Structures of a Viral Protein Modeled Using Various Computational Tools. Bioengineering (Basel) 2023; 10:1004. [PMID: 37760106 PMCID: PMC10525864 DOI: 10.3390/bioengineering10091004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 08/16/2023] [Accepted: 08/22/2023] [Indexed: 09/29/2023] Open
Abstract
The structural analysis of proteins is a major domain of biomedical research. Such analysis requires resolved three-dimensional structures of proteins. Advancements in computer technology have led to progress in biomedical research. In silico prediction and modeling approaches have facilitated the construction of protein structures, with or without structural templates. In this study, we used three neural network-based de novo modeling approaches, namely AlphaFold2 (AF2), Robetta-RoseTTAFold (Robetta), and transform-restrained Rosetta (trRosetta), and two template-based tools, namely the Molecular Operating Environment (MOE) and iterative threading assembly refinement (I-TASSER), to construct the structure of a viral capsid protein, hepatitis C virus core protein (HCVcp), whose structure has not been fully resolved by laboratory techniques. Templates with sufficient sequence identity for the homology modeling of complete HCVcp are currently unavailable. Therefore, we performed domain-based homology modeling for MOE simulations. The templates for each domain were obtained through sequence-based searches on NCBI and the Protein Data Bank. Then, the modeled domains were assembled to construct the complete structure of HCVcp. The full-length structure and two truncated forms modeled using various computational tools were compared. Molecular dynamics (MD) simulations were performed to refine the structures. The root mean square deviation of backbone atoms, root mean square fluctuation of Cα atoms, and radius of gyration were calculated to monitor structural changes and convergence in the simulations. The model quality was evaluated through ERRAT and phi-psi plot analysis. In terms of the initial prediction for protein modeling, Robetta and trRosetta outperformed AF2. Regarding template-based tools, MOE outperformed I-TASSER. MD simulations resulted in compactly folded protein structures, which were of good quality and theoretically accurate. Thus, the predicted structures of certain proteins must be refined to obtain reliable structural models. MD simulation is a promising tool for this purpose.
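The abstract above names the root mean square deviation (RMSD) and the radius of gyration as quantities tracked during the MD runs. As a rough illustration only, the sketch below computes both for plain coordinate lists. It is not the authors' pipeline: the function names `rmsd` and `radius_of_gyration` are hypothetical, the two structures passed to `rmsd` are assumed to be already superimposed (real MD analysis normally performs a least-squares fit first), and equal masses are assumed unless given.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumption: the two structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal masses are assumed if none are given."""
    if not coords:
        raise ValueError("coords must be non-empty")
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Mass-weighted centre of the structure.
    center = [
        sum(m * c[i] for m, c in zip(masses, coords)) / total for i in range(3)
    ]
    # Mass-weighted mean squared distance from that centre.
    squared = sum(
        m * sum((c[i] - center[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(squared / total)
```

Applied frame by frame to a trajectory, a plateau in RMSD together with a stable radius of gyration is the usual informal sign of the convergence and compaction the abstract describes.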
Affiliation(s)
- Hemalatha Mani
- Institute of Medical Sciences, Tzu Chi University, Hualien 97004, Taiwan
- Chun-Chun Chang
- Department of Laboratory Medicine, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Hualien 97004, Taiwan
- Department of Laboratory Medicine and Biotechnology, Tzu Chi University, Hualien 97004, Taiwan
- Hao-Jen Hsu
- Department of Biomedical Sciences and Engineering, Tzu Chi University, Hualien 97004, Taiwan
- Chin-Hao Yang
- Department of Biochemistry, School of Medicine, Tzu Chi University, Hualien 97004, Taiwan
- Jui-Hung Yen
- Department of Molecular Biology and Human Genetics, Tzu Chi University, Hualien 97004, Taiwan
- Je-Wen Liou
- Institute of Medical Sciences, Tzu Chi University, Hualien 97004, Taiwan
- Department of Laboratory Medicine and Biotechnology, Tzu Chi University, Hualien 97004, Taiwan
- Department of Biochemistry, School of Medicine, Tzu Chi University, Hualien 97004, Taiwan
|
25
|
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petascale Homology Search for Structure Prediction. bioRxiv 2023:2023.07.10.548308. [PMID: 37503235 PMCID: PMC10369885 DOI: 10.1101/2023.07.10.548308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Gyuri Kim
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
- Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
|
26
|
Sinha A, Sangeet S, Roy S. Evolution of Sequence and Structure of SARS-CoV-2 Spike Protein: A Dynamic Perspective. ACS Omega 2023; 8:23283-23304. [PMID: 37426203 PMCID: PMC10324094 DOI: 10.1021/acsomega.3c00944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 06/01/2023] [Indexed: 07/11/2023]
Abstract
The novel coronavirus (SARS-CoV-2) enters its host cell through a surface spike protein. The viral spike protein has undergone several modifications/mutations at the genomic level, through which it has modulated its structure and function and passed through several variants of concern. Recent advances in high-resolution structure determination and multiscale imaging techniques, cost-effective next-generation sequencing, and the development of new computational methods (including information theory, statistical methods, machine learning, and many other artificial intelligence-based techniques) have contributed greatly to characterizing the sequence, structure, and function of the spike protein and its variants, and thus to understanding viral pathogenesis, evolution, and transmission. Building on the sequence-structure-function paradigm, this review summarizes not only the important findings on structure and function but also the structural dynamics of different spike components, highlighting the effects of mutations on them. As dynamic fluctuations of the three-dimensional spike structure often provide important clues for functional modulation, quantifying time-dependent fluctuations of mutational events over the spike structure and its genetic/amino acid sequence helps identify alarming functional transitions with implications for enhanced fusogenicity and pathogenicity of the virus. Although these dynamic events are more difficult to capture than a static, average property, this review covers those challenging aspects of characterizing the evolutionary dynamics of spike sequence and structure and their implications for function.
|
27
|
Spiers AJ, Dorfmueller HC, Jerdan R, McGregor J, Nicoll A, Steel K, Cameron S. Bioinformatics characterization of BcsA-like orphan proteins suggest they form a novel family of pseudomonad cyclic-β-glucan synthases. PLoS One 2023; 18:e0286540. [PMID: 37267309 PMCID: PMC10237404 DOI: 10.1371/journal.pone.0286540] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/17/2022] [Accepted: 05/18/2023] [Indexed: 06/04/2023] Open
Abstract
Bacteria produce a variety of polysaccharides with functional roles in cell surface coating, surface and host interactions, and biofilms. We have identified an 'Orphan' bacterial cellulose synthase catalytic subunit (BcsA)-like protein found in four model pseudomonads, P. aeruginosa PA01, P. fluorescens SBW25, P. putida KT2440 and P. syringae pv. tomato DC3000. Pairwise alignments indicated that the Orphan and BcsA proteins shared less than 41% sequence identity, suggesting they may not have the same structural folds or function. We identified 112 Orphans among soil and plant-associated pseudomonads as well as in phytopathogenic and human opportunistic pathogenic strains. The wide distribution of these highly conserved proteins suggests they form a novel family of synthases producing a different polysaccharide. In silico analysis, including sequence comparisons, secondary structure and topology predictions, and protein structural modelling, revealed a two-domain transmembrane ovoid-like structure for the Orphan protein with a periplasmic glycosyl hydrolase family GH17 domain linked via a transmembrane region to a cytoplasmic glycosyltransferase family GT2 domain. We suggest the GT2 domain synthesises β-(1,3)-glucan that is transferred to the GH17 domain where it is cleaved and cyclised to produce cyclic-β-(1,3)-glucan (CβG). Our structural models are consistent with enzymatic characterisation and recent molecular simulations of the PaPA01 and PpKT2440 GH17 domains. They also provide a functional explanation linking PaPAK and PaPA14 Orphan (also known as NdvB) transposon mutants with CβG production and biofilm-associated antibiotic resistance. Importantly, cyclic glucans are also involved in osmoregulation, plant infection and induced systemic suppression, and our findings suggest this novel family of CβG synthases may provide a similar range of adaptive responses for pseudomonads.
Affiliation(s)
- Andrew J. Spiers
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Helge C. Dorfmueller
- Division of Molecular Microbiology, School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Robyn Jerdan
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Jessica McGregor
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Abbie Nicoll
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Kenzie Steel
- Nuffield Research Placement Students, School of Applied Sciences, Abertay University, Dundee, United Kingdom
- Scott Cameron
- School of Applied Sciences, Abertay University, Dundee, United Kingdom
|
28
|
Zheng LE, Barethiya S, Nordquist E, Chen J. Machine Learning Generation of Dynamic Protein Conformational Ensembles. Molecules 2023; 28:4047. [PMID: 37241789 PMCID: PMC10220786 DOI: 10.3390/molecules28104047] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/10/2023] [Revised: 05/04/2023] [Accepted: 05/09/2023] [Indexed: 05/28/2023] Open
Abstract
Machine learning has achieved remarkable success across a broad range of scientific and engineering disciplines, particularly its use for predicting native protein structures from sequence information alone. However, biomolecules are inherently dynamic, and there is a pressing need for accurate predictions of dynamic structural ensembles across multiple functional levels. These problems range from the relatively well-defined task of predicting conformational dynamics around the native state of a protein, which traditional molecular dynamics (MD) simulations are particularly adept at handling, to generating large-scale conformational transitions connecting distinct functional states of structured proteins or numerous marginally stable states within the dynamic ensembles of intrinsically disordered proteins. Machine learning has been increasingly applied to learn low-dimensional representations of protein conformational spaces, which can then be used to drive additional MD sampling or directly generate novel conformations. These methods promise to greatly reduce the computational cost of generating dynamic protein ensembles, compared to traditional MD simulations. In this review, we examine recent progress in machine learning approaches towards generative modeling of dynamic protein ensembles and emphasize the crucial importance of integrating advances in machine learning, structural data, and physical principles to achieve these ambitious goals.
Affiliation(s)
- Li-E Zheng
- Department of Gynecology, The First Affiliated Hospital of Fujian Medical University, Fuzhou 350005, China
- Shrishti Barethiya
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Erik Nordquist
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Jianhan Chen
- Department of Chemistry, University of Massachusetts Amherst, Amherst, MA 01003, USA
|
29
|
Wodak SJ, Vajda S, Lensink MF, Kozakov D, Bates PA. Critical Assessment of Methods for Predicting the 3D Structure of Proteins and Protein Complexes. Annu Rev Biophys 2023; 52:183-206. [PMID: 36626764 PMCID: PMC10885158 DOI: 10.1146/annurev-biophys-102622-084607] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Indexed: 01/11/2023]
Abstract
Advances in a scientific discipline are often measured by small, incremental steps. In this review, we report on two intertwined disciplines in the protein structure prediction field, modeling of single chains and modeling of complexes, that have over decades emulated this pattern, as monitored by the community-wide blind prediction experiments CASP and CAPRI. However, over the past few years, dramatic advances were observed for the accurate prediction of single protein chains, driven by a surge of deep learning methodologies entering the prediction field. We review the main scientific developments that enabled these recent breakthroughs and feature the important role of blind prediction experiments in building up and nurturing the structure prediction field. We discuss how the new wave of artificial intelligence-based methods is impacting the fields of computational and experimental structural biology and highlight areas in which deep learning methods are likely to lead to future developments, provided that major challenges are overcome.
Affiliation(s)
- Shoshana J Wodak
- VIB-VUB Center for Structural Biology, Vrije Universiteit Brussel, Brussels, Belgium
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
- Marc F Lensink
- Univ. Lille, CNRS, UMR 8576-UGSF-Unité de Glycobiologie Structurale et Fonctionnelle, Lille, France
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, United Kingdom
|
30
|
Wu T, Guo Z, Cheng J. Atomic protein structure refinement using all-atom graph representations and SE(3)-equivariant graph transformer. Bioinformatics 2023; 39:btad298. [PMID: 37144951 PMCID: PMC10191610 DOI: 10.1093/bioinformatics/btad298] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/05/2022] [Revised: 03/18/2023] [Accepted: 04/27/2023] [Indexed: 05/06/2023] Open
Abstract
MOTIVATION State-of-the-art protein structure prediction methods such as AlphaFold are widely used to predict structures of uncharacterized proteins in biomedical research. There is a significant need to further improve the quality and nativeness of the predicted structures to enhance their usability. In this work, we develop ATOMRefine, a deep learning-based, end-to-end, all-atom protein structural model refinement method. It uses an SE(3)-equivariant graph transformer network to directly refine protein atomic coordinates in a predicted tertiary structure represented as a molecular graph. RESULTS The method is first trained and tested on the structural models in AlphaFoldDB whose experimental structures are known, and then blindly tested on 69 CASP14 regular targets and 7 CASP14 refinement targets. ATOMRefine improves the quality of both backbone atoms and all-atom conformation of the initial structural models generated by AlphaFold. It also performs better than two state-of-the-art refinement methods in multiple evaluation metrics, including an all-atom model quality score, the MolProbity score, which is based on the analysis of all-atom contacts, bond length, atom clashes, torsion angles, and side-chain rotamers. As ATOMRefine can refine a protein structure quickly, it provides a viable, fast solution for improving protein geometry and fixing structural errors of predicted structures through direct coordinate refinement. AVAILABILITY AND IMPLEMENTATION The source code of ATOMRefine is available in the GitHub repository (https://github.com/BioinfoMachineLearning/ATOMRefine). All the required data for training and testing are available at https://doi.org/10.5281/zenodo.6944368.
Affiliation(s)
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
|
31
|
Moazzeni A, Kheirandish M, Khamisipour G, Rahbarizadeh F. Directed targeting of B-cell maturation antigen-specific CAR T cells by bioinformatic approaches: From in-silico to in-vitro. Immunobiology 2023; 228:152376. [PMID: 37058845 DOI: 10.1016/j.imbio.2023.152376] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/12/2022] [Revised: 02/13/2023] [Accepted: 03/05/2023] [Indexed: 04/16/2023]
Abstract
AIMS Chimeric Antigen Receptor (CAR) T-cell therapy is a breakthrough in cancer immunotherapy. The primary step of successful CAR T cell therapy is designing a specific single-chain fragment variable (scFv). This study aims to verify the designed anti-BCMA (B cell maturation antigen) CAR using bioinformatic techniques, followed by experimental evaluation. MAIN METHODS Following the design of a second-generation anti-BCMA CAR, the protein structure, function prediction, physicochemical complementarity at the ligand-receptor interface, and binding sites of the anti-BCMA CAR construct were confirmed using different modeling and docking servers and software, including Expasy, I-TASSER, HDock, and PyMOL. To generate CAR T-cells, isolated T cells were transduced. Then, anti-BCMA CAR mRNA and its surface expression were confirmed by real-time PCR and flow cytometry, respectively. To evaluate the surface expression of anti-BCMA CAR, anti-(Fab')2 and anti-CD8 antibodies were employed. Finally, anti-BCMA CAR T cells were co-cultured with BCMA+/- cell lines to assess the expression of CD69 and CD107a as activation and cytotoxicity markers. KEY FINDINGS In-silico results confirmed suitable protein folding, correct orientation, and correct positioning of functional domains at the receptor-ligand binding site. The in-vitro results confirmed high expression of scFv (89 ± 1.15%) and CD8α (54 ± 2.88%). The expression of CD69 (91.97 ± 1.7%) and CD107a (92.05 ± 1.29%) was significantly increased, indicating appropriate activation and cytotoxicity. SIGNIFICANCE In-silico studies before experimental assessments are crucial for state-of-the-art CAR design. The high activation and cytotoxicity of the anti-BCMA CAR T-cells indicate that our CAR construct methodology would be applicable to defining the road map of CAR T cell therapy.
Affiliation(s)
- Ali Moazzeni
- Immunology Department, Blood Transfusion Research Center, High Institute for Research and Education in Transfusion Medicine (IBTO), Tehran, Iran
- Maryam Kheirandish
- Immunology Department, Blood Transfusion Research Center, High Institute for Research and Education in Transfusion Medicine (IBTO), Tehran, Iran
- Gholamreza Khamisipour
- Department of Hematology, Faculty of Allied Medicine, Bushehr University of Medical Sciences, Bushehr, Iran
- Fatemeh Rahbarizadeh
- Department of Medical Biotechnology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
|
32
|
Si Y, Yan C. Improved inter-protein contact prediction using dimensional hybrid residual networks and protein language models. Brief Bioinform 2023; 24:bbad039. [PMID: 36759333 DOI: 10.1093/bib/bbad039] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 08/27/2022] [Revised: 01/13/2023] [Accepted: 01/18/2023] [Indexed: 02/11/2023] Open
Abstract
Knowledge of the contacting residue pairs between interacting proteins is very useful for the structural characterization of protein-protein interactions (PPIs). However, accurately identifying the tens of contacting pairs among hundreds of thousands of inter-protein residue pairs is extremely challenging, and the performance of state-of-the-art inter-protein contact prediction methods is still quite limited. In this study, we developed a deep learning method for inter-protein contact prediction, which is referred to as DRN-1D2D_Inter. Specifically, we employed pretrained protein language models to generate structural information-enriched input features for residual networks formed by dimensional hybrid residual blocks to perform inter-protein contact prediction. Extensively benchmarking DRN-1D2D_Inter on multiple datasets, including both heteromeric PPIs and homomeric PPIs, we show that DRN-1D2D_Inter consistently and significantly outperformed two state-of-the-art inter-protein contact prediction methods, GLINTER and DeepHomo, even though both of those methods leveraged the native structures of the interacting proteins in the prediction, whereas DRN-1D2D_Inter made its predictions purely from sequences. We further show that applying the predicted contacts as constraints for protein-protein docking can significantly improve its performance for protein complex structure prediction.
Affiliation(s)
- Yunda Si
- School of Physics, Huazhong University of Science and Technology, China
- Chengfei Yan
- School of Physics, Huazhong University of Science and Technology, China
33
Rosenkranz AA, Slastnikova TA. Prospects of Using Protein Engineering for Selective Drug Delivery into a Specific Compartment of Target Cells. Pharmaceutics 2023; 15:987. [PMID: 36986848 PMCID: PMC10055131 DOI: 10.3390/pharmaceutics15030987] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/18/2023] [Revised: 03/13/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023]
Abstract
A large number of proteins are successfully used to treat various diseases. These include natural polypeptide hormones, their synthetic analogues, antibodies, antibody mimetics, enzymes, and other drugs based on them. Many of them are demanded in clinical settings and commercially successful, mainly for cancer treatment. The targets for most of the aforementioned drugs are located at the cell surface. Meanwhile, the vast majority of therapeutic targets, which are usually regulatory macromolecules, are located inside the cell. Traditional low molecular weight drugs freely penetrate all cells, causing side effects in non-target cells. In addition, it is often difficult to elaborate a small molecule that can specifically affect protein interactions. Modern technologies make it possible to obtain proteins capable of interacting with almost any target. However, proteins, like other macromolecules, cannot, as a rule, freely penetrate into the desired cellular compartment. Recent studies allow us to design multifunctional proteins that solve these problems. This review considers the scope of application of such artificial constructs for the targeted delivery of both protein-based and traditional low molecular weight drugs, the obstacles met on the way of their transport to the specified intracellular compartment of the target cells after their systemic bloodstream administration, and the means to overcome those difficulties.
Affiliation(s)
- Andrey A Rosenkranz
- Laboratory of Molecular Genetics of Intracellular Transport, Institute of Gene Biology of Russian Academy of Sciences, 34/5 Vavilov St., 119334 Moscow, Russia
- Department of Biophysics, Faculty of Biology, Lomonosov Moscow State University, 1-12 Leninskie Gory St., 119234 Moscow, Russia
- Tatiana A Slastnikova
- Laboratory of Molecular Genetics of Intracellular Transport, Institute of Gene Biology of Russian Academy of Sciences, 34/5 Vavilov St., 119334 Moscow, Russia
34
Yang Z, Zeng X, Zhao Y, Chen R. AlphaFold2 and its applications in the fields of biology and medicine. Signal Transduct Target Ther 2023; 8:115. [PMID: 36918529 PMCID: PMC10011802 DOI: 10.1038/s41392-023-01381-z] [Citation(s) in RCA: 115] [Impact Index Per Article: 115.0] [Received: 10/29/2022] [Revised: 12/27/2022] [Accepted: 02/16/2023] [Indexed: 03/16/2023]
Abstract
AlphaFold2 (AF2) is an artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy. Protein structure prediction is one of the most challenging problems in computational biology and chemistry, and has puzzled scientists for 50 years. The advent of AF2 represents unprecedented progress in protein structure prediction and has attracted much attention. The subsequent release of structures of more than 200 million proteins predicted by AF2 further aroused great enthusiasm in the science community, especially in the fields of biology and medicine. AF2 is thought to have a significant impact on structural biology and research areas that need protein structure information, such as drug discovery, protein design, prediction of protein function, etc. Although AF2 was developed only recently, there are already quite a few application studies of AF2 in the fields of biology and medicine, with many of them having preliminarily proved the potential of AF2. To better understand AF2 and promote its applications, we will in this article summarize the principle and system architecture of AF2 as well as the recipe of its success, and particularly focus on reviewing its applications in the fields of biology and medicine. Limitations of current AF2 prediction will also be discussed.
Affiliation(s)
- Zhenyu Yang
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Xiaoxi Zeng
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Yi Zhao
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Runsheng Chen
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
- Key Laboratory of RNA Biology, Center for Big Data Research in Health, Institute of Biophysics, Chinese Academy of Sciences, Beijing, 100101, China
- Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China
35
Curti M, Maffeis V, Teixeira Alves Duarte LG, Shareef S, Hallado LX, Curutchet C, Romero E. Engineering excitonically coupled dimers in an artificial protein for light harvesting via computational modeling. Protein Sci 2023; 32:e4579. [PMID: 36715022 PMCID: PMC9951196 DOI: 10.1002/pro.4579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 01/23/2023] [Accepted: 01/25/2023] [Indexed: 01/31/2023]
Abstract
In photosynthesis, pigment-protein complexes achieve outstanding photoinduced charge separation efficiencies through a set of strategies in which excited states delocalization over multiple pigments ("excitons") and charge-transfer states play key roles. These concepts, and their implementation in bioinspired artificial systems, are attracting increasing attention due to the vast potential that could be tapped by realizing efficient photochemical reactions. In particular, de novo designed proteins provide a diverse structural toolbox that can be used to manipulate the geometric and electronic properties of bound chromophore molecules. However, achieving excitonic and charge-transfer states requires closely spaced chromophores, a non-trivial aspect since a strong binding with the protein matrix needs to be maintained. Here, we show how a general-purpose artificial protein can be optimized via molecular dynamics simulations to improve its binding capacity of a chlorophyll derivative, achieving complexes in which chromophores form two closely spaced and strongly interacting dimers. Based on spectroscopy results and computational modeling, we demonstrate each dimer is excitonically coupled, and propose they display signatures of charge-transfer state mixing. This work could open new avenues for the rational design of chromophore-protein complexes with advanced functionalities.
Affiliation(s)
- Mariano Curti
- Institute of Chemical Research of Catalonia (ICIQ), Barcelona Institute of Science and Technology (BIST), Tarragona, Spain
- Valentin Maffeis
- Institute of Chemical Research of Catalonia (ICIQ), Barcelona Institute of Science and Technology (BIST), Tarragona, Spain
- Laboratoire de Chimie, UMR 5182, ENS Lyon, CNRS, Université Lyon 1, Lyon, France
- Saeed Shareef
- Institute of Chemical Research of Catalonia (ICIQ), Barcelona Institute of Science and Technology (BIST), Tarragona, Spain
- Departament de Química Física i Inorgànica, Universitat Rovira i Virgili, Tarragona, Spain
- Luisa Xiomara Hallado
- Institute of Chemical Research of Catalonia (ICIQ), Barcelona Institute of Science and Technology (BIST), Tarragona, Spain
- Departament de Química Física i Inorgànica, Universitat Rovira i Virgili, Tarragona, Spain
- Carles Curutchet
- Departament de Farmàcia i Tecnologia Farmacèutica i Fisicoquímica, Facultat de Farmàcia i Ciències de l'Alimentació, Universitat de Barcelona (UB), Barcelona, Spain
- Institut de Química Teòrica i Computacional (IQTCUB), Universitat de Barcelona (UB), Barcelona, Spain
- Elisabet Romero
- Institute of Chemical Research of Catalonia (ICIQ), Barcelona Institute of Science and Technology (BIST), Tarragona, Spain
36
Bertoline LMF, Lima AN, Krieger JE, Teixeira SK. Before and after AlphaFold2: An overview of protein structure prediction. Front Bioinform 2023; 3:1120370. [PMID: 36926275 PMCID: PMC10011655 DOI: 10.3389/fbinf.2023.1120370] [Citation(s) in RCA: 59] [Impact Index Per Article: 59.0] [Received: 12/09/2022] [Accepted: 02/17/2023] [Indexed: 03/08/2023]
Abstract
Three-dimensional protein structure is directly correlated with its function and its determination is critical to understanding biological processes and addressing human health and life science problems in general. Although new protein structures are experimentally obtained over time, there is still a large difference between the number of protein sequences placed in Uniprot and those with resolved tertiary structure. In this context, studies have emerged to predict protein structures by methods based on a template or free modeling. In the last years, different methods have been combined to overcome their individual limitations, until the emergence of AlphaFold2, which demonstrated that predicting protein structure with high accuracy at unprecedented scale is possible. Despite its current impact in the field, AlphaFold2 has limitations. Recently, new methods based on protein language models have promised to revolutionize the protein structural biology allowing the discovery of protein structure and function only from evolutionary patterns present on protein sequence. Even though these methods do not reach AlphaFold2 accuracy, they already covered some of its limitations, being able to predict with high accuracy more than 200 million proteins from metagenomic databases. In this mini-review, we provide an overview of the breakthroughs in protein structure prediction before and after AlphaFold2 emergence.
37
Non-synonymous variation and protein structure of candidate genes associated with selection in farm and wild populations of turbot (Scophthalmus maximus). Sci Rep 2023; 13:3019. [PMID: 36810752 PMCID: PMC9944912 DOI: 10.1038/s41598-023-29826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 02/10/2023] [Indexed: 02/24/2023]
Abstract
Non-synonymous variation (NSV) of protein coding genes represents raw material for selection to improve adaptation to the diverse environmental scenarios in wild and livestock populations. Many aquatic species face variations in temperature, salinity and biological factors throughout their distribution range that is reflected by the presence of allelic clines or local adaptation. The turbot (Scophthalmus maximus) is a flatfish of great commercial value with a flourishing aquaculture which has promoted the development of genomic resources. In this study, we developed the first atlas of NSVs in the turbot genome by resequencing 10 individuals from the Northeast Atlantic Ocean. More than 50,000 NSVs were detected in the ~21,500 coding genes of the turbot genome, and we selected 18 NSVs to be genotyped using a single MassARRAY multiplex on 13 wild populations and three turbot farms. We detected signals of divergent selection on several genes related to growth, circadian rhythms, osmoregulation and oxygen binding in the different scenarios evaluated. Furthermore, we explored the impact of NSVs identified on the 3D structure and functional relationship of the correspondent proteins. In summary, our study provides a strategy to identify NSVs in species with consistently annotated and assembled genomes to ascertain their role in adaptation.
38
Guzzi PH, di Paola L, Puccio B, Lomoio U, Giuliani A, Veltri P. Computational analysis of the sequence-structure relation in SARS-CoV-2 spike protein using protein contact networks. Sci Rep 2023; 13:2837. [PMID: 36808182 PMCID: PMC9936485 DOI: 10.1038/s41598-023-30052-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/01/2022] [Accepted: 02/15/2023] [Indexed: 02/19/2023]
Abstract
The structure of proteins impacts directly on the function they perform. Mutations in the primary sequence can provoke structural changes with consequent modification of functional properties. SARS-CoV-2 proteins have been extensively studied during the pandemic. This wide dataset, related to sequence and structure, has enabled joint sequence-structure analysis. In this work, we focus on the SARS-CoV-2 S (Spike) protein and the relations between sequence mutations and structure variations, in order to shed light on the structural changes stemming from the position of mutated amino acid residues in three different SARS-CoV-2 strains. We propose the use of protein contact network (PCN) formalism to: (i) obtain a global metric space and compare various molecular entities, (ii) give a structural explanation of the observed phenotype, and (iii) provide context dependent descriptors of single mutations. PCNs have been used to compare sequence and structure of the Alpha, Delta, and Omicron SARS-CoV-2 variants, and we found that Omicron has a unique mutational pattern leading to different structural consequences from mutations of other strains. The non-random distribution of changes in network centrality along the chain has shed light on the structural (and functional) consequences of mutations.
Affiliation(s)
- Pietro Hiram Guzzi
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, Catanzaro, Italy
- Luisa di Paola
- Unit of Chemical-Physics Fundamentals in Chemical Engineering, Department of Engineering, Universita Campus Bio-Medico di Roma, via Alvaro del Portillo 21, 00128 Rome, Italy
- Barbara Puccio
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, Catanzaro, Italy
- Ugo Lomoio
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, Catanzaro, Italy
- Alessandro Giuliani
- Environment and Health Department, Istituto Superiore di Sanita, Rome, Italy
- Pierangelo Veltri
- Department of Surgical and Medical Sciences, Magna Graecia University of Catanzaro, Catanzaro, Italy
- Department of Computer, Modeling, Electronics and System Engineering, University of Calabria, Rende, Italy
39
Pandya N, Kumar A. An immunoinformatics analysis: design of a multi-epitope vaccine against Cryptosporidium hominis by employing heat shock protein triggers the innate and adaptive immune responses. J Biomol Struct Dyn 2023; 41:13563-13579. [PMID: 36764824 DOI: 10.1080/07391102.2023.2175373] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/09/2022] [Accepted: 01/28/2023] [Indexed: 02/12/2023]
Abstract
Cryptosporidium hominis, an anthropologically transferred species in the Cryptosporidium genus, represents many clinical studies in several countries. Its growth in the recent decade is primarily owing to epidemiologic studies. This parasite has complicated life cycles that require differentiation through a variety of phases of development and passage across two or more hosts throughout their lifetimes. As they move from host to host and environment to environment, pathogenic organisms are continually exposed to unexpected changes in the circumstances under which they develop. Heat shock proteins (HSPs) are targets of the host immune response; they are involved in the progression of diseases and play a significant part in this process. It has been discovered that the immunodominant immunogenic antigens in parasite infections are HSPs. In this study, we have generated a multi-epitope vaccine against Cryptosporidium hominis (C. hominis) by using heat shock proteins. The epitopes that were selected had a substantial binding affinity for the B- and T-cell reference set of alleles, a high antigenicity score, a nature that was not allergic, a high solubility, non-toxicity and good binders. The epitopes were incorporated into a chimeric vaccine by using appropriate linkers. In order to increase the immunogenicity of the connected epitopes and effectively activate both innate and adaptive immunity, an adjuvant was attached to the epitopes. We have also analyzed the physiochemical characteristics of the vaccine, which were satisfactory and then led to the development of a 3D model. In addition, the binding confirmation of the vaccine to the TLR-4 innate immune receptor was also determined using molecular docking and molecular dynamics (MD) simulation. The results of this simulation show that the vaccine has a strong binding affinity for TLR4, which indicates that the vaccine is highly effective.
In general, the vaccine that has been described here has a good potential for inducing protective and targeted immunogenicity; however, this hypothesis is contingent upon more experimental testing. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Nirali Pandya
- Department of Chemistry, National University of Singapore, Singapore, Singapore
- Amit Kumar
- Department of Biosciences and Biomedical Engineering, Indian Institute of Technology Indore, Indore, Madhya Pradesh, India
40
Nallasamy V, Seshiah M. Energy Profile Bayes and Thompson Optimized Convolutional Neural Network protein structure prediction. Neural Comput Appl 2023; 35:1983-2006. [PMID: 36245797 PMCID: PMC9542649 DOI: 10.1007/s00521-022-07868-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2021] [Accepted: 09/21/2022] [Indexed: 01/12/2023]
Abstract
In living organisms, proteins are considered as the executants of biological functions. Owing to its pivotal role played in protein folding patterns, comprehension of protein structure is a challenging issue. Moreover, owing to numerous protein sequence exploration in protein data banks and complication of protein structures, experimental methods are found to be inadequate for protein structural class prediction. Hence, it is very much advantageous to design a reliable computational method to predict protein structural classes from protein sequences. In the recent few years there has been an elevated interest in using deep learning to assist protein structure prediction as protein structure prediction models can be utilized to screen a large number of novel sequences. In this regard, we propose a model employing Energy Profile for atom pairs in conjunction with the Legion-Class Bayes function called Energy Profile Legion-Class Bayes Protein Structure Identification model. Followed by this, we use a Thompson Optimized convolutional neural network to extract features between amino acids and then the Thompson Optimized SoftMax function is employed to extract associations between protein sequences for predicting secondary protein structure. The proposed Energy Profile Bayes and Thompson Optimized Convolutional Neural Network (EPB-OCNN) method tested distinct unique protein data and was compared to the state-of-the-art methods, the Template-Based Modeling, Protein Design using Deep Graph Neural Networks, a deep learning-based S-glutathionylation sites prediction tool called a Computational Framework, the Deep Learning and a distance-based protein structure prediction using deep learning. The results obtained when applied with the Biopython tool with respect to protein structure prediction time, protein structure prediction accuracy, specificity, recall, F-measure, and precision, respectively, are measured. 
The proposed EPB-OCNN method outperformed the state-of-the-art methods, thereby corroborating the objective.
Affiliation(s)
- Varanavasi Nallasamy
- Cognizant Technology Solutions Pvt. Ltd, CHIL SEZ IT Park, Keeranatham, Saravanam Patti, Coimbatore, Tamil Nadu 641035 India
- Malarvizhi Seshiah
- Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Namakkal, Tamil Nadu India
41
Miller NL, Clark T, Raman R, Sasisekharan R. Learned features of antibody-antigen binding affinity. Front Mol Biosci 2023; 10:1112738. [PMID: 36895805 PMCID: PMC9989197 DOI: 10.3389/fmolb.2023.1112738] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/30/2022] [Accepted: 01/18/2023] [Indexed: 02/23/2023]
Abstract
Defining predictors of antigen-binding affinity of antibodies is valuable for engineering therapeutic antibodies with high binding affinity to their targets. However, this task is challenging owing to the huge diversity in the conformations of the complementarity determining regions of antibodies and the mode of engagement between antibody and antigen. In this study, we used the structural antibody database (SAbDab) to identify features that can discriminate high- and low-binding affinity across a 5-log scale. First, we abstracted features based on previously learned representations of protein-protein interactions to derive 'complex' feature sets, which include energetic, statistical, network-based, and machine-learned features. Second, we contrasted these complex feature sets with additional 'simple' feature sets based on counts of contacts between antibody and antigen. By investigating the predictive potential of 700 features contained in the eight complex and simple feature sets, we observed that simple feature sets perform comparably to complex feature sets in classification of binding affinity. Moreover, combining features from all eight feature-sets provided the best classification performance (median cross-validation AUROC and F1-score of 0.72). Of note, classification performance is substantially improved when several sources of data leakage (e.g., homologous antibodies) are not removed from the dataset, emphasizing a potential pitfall in this task. We additionally observe a classification performance plateau across diverse featurization approaches, highlighting the need for additional affinity-labeled antibody-antigen structural data. The findings from our present study set the stage for future studies aimed at multiple-log enhancement of antibody affinity through feature-guided engineering.
Affiliation(s)
- Nathaniel L Miller
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States
- Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Thomas Clark
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States
- Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Rahul Raman
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States
- Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States
- Ram Sasisekharan
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, United States
- Koch Institute of Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, United States
42
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
Affiliation(s)
- Emily N. Kennedy
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
- Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
43
Protein structure prediction in the deep learning era. Curr Opin Struct Biol 2022; 77:102495. [PMID: 36371845 DOI: 10.1016/j.sbi.2022.102495] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/11/2022] [Revised: 10/03/2022] [Accepted: 10/04/2022] [Indexed: 11/11/2022]
Abstract
Significant advances have been achieved in protein structure prediction, especially with the recent development of the AlphaFold2 and the RoseTTAFold systems. This article reviews the progress in deep learning-based protein structure prediction methods in the past two years. First, we divide the representative methods into two categories: the two-step approach and the end-to-end approach. Then, we show that the two-step approach is possible to achieve similar accuracy to the state-of-the-art end-to-end approach AlphaFold2. Compared to the end-to-end approach, the two-step approach requires fewer computing resources. We conclude that it is valuable to keep developing both approaches. Finally, a few outstanding challenges in function-orientated protein structure prediction are pointed out for future development.
44
Abstract
A key goal of synthetic biology is to enable designed modification of peptides and proteins, both in vivo and in vitro. N- and C-Terminal modification enzymes are crucial in this regard, but there are a few enzymatic options to protect peptide termini. AgeMTPT protects the N-terminus of short peptides with isoprene and the C-terminus as a methyl ester, but its substrate scope is unknown, limiting its application. Here, we investigate the substrate selectivity of the prenyltransferase domain, revealing a requirement for N-terminal aromatic amino acids, but with tolerance for diverse uncharged amino acids in the remaining positions. To demonstrate the potential of the enzyme, substrate selectivity data were used in the enzymatic modification of leu-enkephalin at the critical N-terminal residue. AgeMTPT active site mutagenesis led to an enzyme with expanded substrate scope, including the reverse geranylation of the N-termini of peptides. These data reveal potential applications of enzymatic peptide protection in synthetic biology.
Affiliation(s)
- Ying Cong
- Department of Medicinal Chemistry, University of Utah, Salt Lake City, UT, 84112, USA
- Paul D. Scesa
- Department of Medicinal Chemistry, University of Utah, Salt Lake City, UT, 84112, USA
- Eric W. Schmidt
- Department of Medicinal Chemistry, University of Utah, Salt Lake City, UT, 84112, USA
45
Cui L, Cui A, Li Q, Yang L, Liu H, Shao W, Feng Y. Molecular Evolution of an Aminotransferase Based on Substrate–Enzyme Binding Energy Analysis for Efficient Valienamine Synthesis. ACS Catal 2022. [DOI: 10.1021/acscatal.2c03784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Li Cui
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Anqi Cui
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Qitong Li
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Lezhou Yang
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Hao Liu
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Wenguang Shao
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
- Yan Feng
- State Key Laboratory of Microbial Metabolism, School of Life Science & Biotechnology, and Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai 200240, China
46
Azzaz F, Yahi N, Chahinian H, Fantini J. The Epigenetic Dimension of Protein Structure Is an Intrinsic Weakness of the AlphaFold Program. Biomolecules 2022; 12:biom12101527. [PMID: 36291736 PMCID: PMC9599222 DOI: 10.3390/biom12101527] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 09/22/2022] [Revised: 10/12/2022] [Accepted: 10/16/2022] [Indexed: 12/02/2022] Open
Abstract
One of the most important lessons we have learned from sequencing the human genome is that not all proteins have a 3D structure. In fact, a large part of the human proteome is made up of intrinsically disordered proteins (IDPs) which can adopt multiple structures, and therefore, multiple functions, depending on the ligands with which they interact. Under these conditions, one can wonder about the value of algorithms developed for predicting the structure of proteins, in particular AlphaFold, an AI which claims to have solved the problem of protein structure. In a recent study, we highlighted a particular weakness of AlphaFold for membrane proteins. Based on this observation, we have proposed a paradigm, referred to as “Epigenetic Dimension of Protein Structure” (EDPS), which takes into account all environmental parameters that control the structure of a protein beyond the amino acid sequence (hence “epigenetic”). In this new study, we compare the reliability of the AlphaFold and Robetta algorithms’ predictions for a new set of membrane proteins involved in human pathologies. We found that Robetta was generally more accurate than AlphaFold for ascribing a membrane-compatible topology. Raft lipids (e.g., gangliosides), which control the structural dynamics of membrane protein structure through chaperone effects, were identified as major actors of the EDPS paradigm. We conclude that the epigenetic dimension of a protein structure is an intrinsic weakness of AI-based protein structure prediction, especially AlphaFold, which warrants further development.
|
47
|
Prabhu GRD, Yang TH, Shiu RT, Witek HA, Urban PL. Scanning pH-metry for Observing Reversibility in Protein Folding. Biochemistry 2022; 61:2377-2389. [PMID: 36251331 DOI: 10.1021/acs.biochem.2c00453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
One of the main factors affecting protein structure in solution is pH. Traditionally, to study pH-dependent conformational changes in proteins, the concentration of the H+ ions is adjusted manually, complicating real-time analyses, hampering dynamic pH regulation, and consequently leading to a limited number of tested pH levels. Here, we present a programmable device, a scanning pH-meter, that can automatically generate different types of pH ramps and waveforms in a solution. A feedback loop algorithm calculates the required flow rates of the acid/base titrants, allowing one, for example, to generate periodic pH sine waveforms to study the reversibility of protein folding by fluorescence spectroscopy. Interestingly, for some proteins, the fluorescence intensity profiles recorded in such a periodically oscillating pH environment display hysteretic behavior indicating an asymmetry in the sequence of the protein unfolding/refolding events, which can most likely be attributed to their distinct kinetics. Another useful application of the scanning pH-meter concerns coupling it with an electrospray ionization mass spectrometer to observe pH-induced structural changes in proteins as revealed by their varying charge-state distributions. We anticipate a broad range of applications of the scanning pH-meter developed here, including protein folding studies, determination of the optimum pH for achieving maximum fluorescence intensity, and characterization of fluorescent dyes and other synthetic materials.
Affiliation(s)
- Gurpur Rakesh D Prabhu
  - Department of Chemistry, National Tsing Hua University, 101, Sec 2, Kuang-Fu Road, Hsinchu 300044, Taiwan
  - Department of Applied Chemistry, National Yang Ming Chiao Tung University, 1001 University Road, Hsinchu 300093, Taiwan
- Tzu-Hsin Yang
  - Department of Chemistry, National Tsing Hua University, 101, Sec 2, Kuang-Fu Road, Hsinchu 300044, Taiwan
- Ruei-Tzung Shiu
  - Department of Chemistry, National Tsing Hua University, 101, Sec 2, Kuang-Fu Road, Hsinchu 300044, Taiwan
- Henryk A Witek
  - Department of Applied Chemistry, National Yang Ming Chiao Tung University, 1001 University Road, Hsinchu 300093, Taiwan
  - Center for Emergent Functional Matter Science, National Yang Ming Chiao Tung University, 1001 University Road, Hsinchu 300093, Taiwan
- Pawel L Urban
  - Department of Chemistry, National Tsing Hua University, 101, Sec 2, Kuang-Fu Road, Hsinchu 300044, Taiwan
  - Frontier Research Center on Fundamental and Applied Sciences of Matters, National Tsing Hua University, 101, Sec 2, Kuang-Fu Road, Hsinchu 300044, Taiwan
|
48
|
Peng CX, Zhou XG, Xia YH, Liu J, Hou MH, Zhang GJ. Structural analogue-based protein structure domain assembly assisted by deep learning. Bioinformatics 2022; 38:4513-4521. [PMID: 35962986 DOI: 10.1093/bioinformatics/btac553] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/06/2022] [Revised: 07/27/2022] [Accepted: 08/08/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION With the breakthrough of AlphaFold2, the protein structure prediction problem has made remarkable progress through deep learning end-to-end techniques, in which correct folds could be built for nearly all single-domain proteins. However, full-chain modelling appears to be less accurate on average than modelling of the constituent domains and places higher demands on computing hardware, indicating that the performance of full-chain modelling still needs to be improved. In this study, we investigate whether the accuracy of the full-chain model can be further improved by domain assembly assisted by deep learning. RESULTS In this article, we developed a structural analogue-based protein structure domain assembly method assisted by deep learning, named SADA. In SADA, a multi-domain protein structure database was constructed for the full-chain analogue detection using individual domain models. Starting from the initial model constructed from the analogue, the domain assembly simulation was performed to generate the full-chain model through a two-stage differential evolution algorithm guided by the energy function with an inter-residue distance potential predicted by deep learning. SADA was compared with the state-of-the-art domain assembly methods on 356 benchmark proteins, and the average TM-score of SADA models is 8.1% and 27.0% higher than that of DEMO and AIDA, respectively. We also assembled 293 human multi-domain proteins, where the average TM-score of the full-chain model after the assembly by SADA is 1.1% higher than that of the model by AlphaFold2. To conclude, we find that domains often interact in a similar way in their quaternary orientations if they have similar tertiary structures. Furthermore, homologous templates and structural analogues are complementary for multi-domain protein full-chain modelling. AVAILABILITY AND IMPLEMENTATION http://zhanglab-bioinf.com/SADA.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chun-Xiang Peng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yu-Hao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Ming-Hua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Gui-Jun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
49
|
Characterization of Treponema denticola Major Surface Protein (Msp) by Deletion Analysis and Advanced Molecular Modeling. J Bacteriol 2022; 204:e0022822. [PMID: 35913147 PMCID: PMC9487533 DOI: 10.1128/jb.00228-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023] Open
Abstract
Treponema denticola, a keystone pathogen in periodontitis, is a model organism for studying Treponema physiology and host-microbe interactions. Its major surface protein Msp forms an oligomeric outer membrane complex that binds fibronectin, has cytotoxic pore-forming activity, and disrupts several intracellular processes in host cells. T. denticola msp is an ortholog of the Treponema pallidum tprA to -K gene family that includes tprK, whose remarkable in vivo hypervariability is proposed to contribute to T. pallidum immune evasion. We recently identified the primary Msp surface-exposed epitope and proposed a model of the Msp protein as a β-barrel protein similar to Gram-negative bacterial porins. Here, we report fine-scale Msp mutagenesis demonstrating that both the N and C termini as well as the centrally located Msp surface epitope are required for native Msp oligomer expression. Removal of as few as three C-terminal amino acids abrogated Msp detection on the T. denticola cell surface, and deletion of four residues resulted in complete loss of detectable Msp. Substitution of a FLAG tag for either residues 6 to 13 of mature Msp or an 8-residue portion of the central Msp surface epitope resulted in expression of full-length Msp but absence of the oligomer, suggesting roles for both domains in oligomer formation. Consistent with previously reported Msp N-glycosylation, proteinase K treatment of intact cells released a 25 kDa polypeptide containing the Msp surface epitope into culture supernatants. Molecular modeling of Msp using novel metagenome-derived multiple sequence alignment (MSA) algorithms supports the hypothesis that Msp is a large-diameter, trimeric outer membrane porin-like protein whose potential transport substrate remains to be identified.
IMPORTANCE The Treponema denticola gene encoding its major surface protein (Msp) is an ortholog of the T. pallidum tprA to -K gene family that includes tprK, whose remarkable in vivo hypervariability is proposed to contribute to T. pallidum immune evasion. Using a combined strategy of fine-scale mutagenesis and advanced predictive molecular modeling, we characterized the Msp protein and present a high-confidence model of its structure as an oligomer embedded in the outer membrane. This work adds to knowledge of Msp-like proteins in oral treponemes and may contribute to understanding the evolutionary and potential functional relationships between T. denticola Msp and the orthologous T. pallidum Tpr proteins.
|
50
|
Mahmoud NA, Elshafei AM, Almofti YA. A novel strategy for developing vaccine candidate against Jaagsiekte sheep retrovirus from the envelope and gag proteins: an in-silico approach. BMC Vet Res 2022; 18:343. [PMID: 36085036 PMCID: PMC9463060 DOI: 10.1186/s12917-022-03431-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/09/2022] [Accepted: 08/29/2022] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Ovine pulmonary adenocarcinoma (OPA) is a contagious lung cancer of sheep caused by the Jaagsiekte sheep retrovirus (JSRV). OPA typically has a serious economic impact worldwide. A vaccine has yet to be developed, even though the disease and its complications have spread globally. This study aimed to construct an effective multi-epitope vaccine against JSRV that elicits B and T lymphocyte responses, using immunoinformatics tools. RESULTS The designed vaccine was composed of 499 amino acids. Before the vaccine was computationally validated, all critical parameters were taken into consideration, including antigenicity, allergenicity, toxicity, and stability. The physicochemical analysis of the vaccine gave an isoelectric point of 9.88. With an Instability Index (II) of 28.28, the vaccine was classified as stable. The vaccine scored 56.51 on the aliphatic index and -0.731 on the GRAVY, indicating that the vaccine was hydrophilic. The RaptorX server was used to predict the vaccine's tertiary structure, the GalaxyWEB server refined the structure, and the Ramachandran plot and the ProSA-web server validated the vaccine's tertiary structure. The Protein-sol and SOLPro servers indicated that the vaccine was soluble. Moreover, the highly mobile regions in the vaccine's structure were reduced and the vaccine's stability was improved by disulfide engineering. Also, the vaccine construct was docked with an ovine MHC-1 allele and showed efficient binding energy. Immune simulation showed high levels of immunoglobulins, T lymphocytes, and IFN-γ secretion. The molecular dynamics simulation supported the stability of the constructed vaccine. Finally, the vaccine was back-transcribed into a DNA sequence and cloned into a pET-30a(+) vector to affirm the potency of translation and microbial expression. CONCLUSION A novel multi-epitope vaccine construct against JSRV was formed from B and T lymphocyte epitopes and showed potential for protection. This study might help in controlling and eradicating OPA.
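As an illustration of the two sequence-derived indices quoted in the abstract above (aliphatic index and GRAVY), the following is a minimal Python sketch that computes both for an arbitrary sequence. It assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index (Ala + 2.9 × Val + 3.9 × (Ile + Leu), in mole percent). The abstract does not state which tool produced its values, and the function names and example sequence below are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Negative values indicate an overall hydrophilic sequence.
    Raises KeyError on non-standard residue letters.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index (Ikai): relative volume occupied by aliphatic side chains."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    example = "MKKDEALVIL"  # hypothetical sequence, not the vaccine construct
    print(f"GRAVY: {gravy(example):.3f}")
    print(f"Aliphatic index: {aliphatic_index(example):.2f}")
```

Under these assumptions, a negative GRAVY such as the reported -0.731 corresponds to a sequence whose residues are, on average, hydrophilic.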
Affiliation(s)
- Nuha Amin Mahmoud
  - Department of Biochemistry, Genetics and Molecular Biology/ Faculty of Medicine and Surgery, National University, Khartoum, Sudan
- Abdelmajeed M Elshafei
  - Department of Biochemistry, Genetics and Molecular Biology/ Faculty of Medicine and Surgery, National University, Khartoum, Sudan
- Yassir A Almofti
  - Department of Biochemistry, Genetics and Molecular Biology/ Faculty of Medicine and Surgery, National University, Khartoum, Sudan
  - Department of Molecular Biology and Bioinformatics, College of Veterinary Medicine, University of Bahri, Khartoum, Sudan
|