1
Kamal MM, Mia MS, Faruque MO, Rabby MG, Islam MN, Talukder MEK, Wani TA, Rahman MA, Hasan MM. In silico functional, structural and pathogenicity analysis of missense single nucleotide polymorphisms in human MCM6 gene. Sci Rep 2024; 14:11607. [PMID: 38773180; PMCID: PMC11109216; DOI: 10.1038/s41598-024-62299-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 05/15/2024] [Indexed: 05/23/2024] Open
Abstract
Single nucleotide polymorphisms (SNPs) are one of the most common determinants and potential biomarkers of human disease pathogenesis. SNPs could alter amino acid residues, leading to the loss of structural and functional integrity of the encoded protein. In humans, members of the minichromosome maintenance (MCM) family play a vital role in cell proliferation and have a significant impact on tumorigenesis. Among the MCM members, the molecular mechanism of how missense SNPs of minichromosome maintenance complex component 6 (MCM6) contribute to DNA replication and tumor pathogenesis is underexplored and needs to be elucidated. Hence, a series of sequence and structure-based computational tools were utilized to determine how mutations affect the corresponding MCM6 protein. From the dbSNP database, among 15,009 SNPs in the MCM6 gene, 642 missense SNPs (4.28%), 291 synonymous SNPs (1.94%), and 12,500 intron SNPs (83.28%) were observed. Out of the 642 missense SNPs, 33 were found to be deleterious during the SIFT analysis. Among these, 11 missense SNPs (I123S, R207C, R222C, L449F, V456M, D463G, H556Y, R602H, R633W, R658C, and P815T) were found as deleterious, probably damaging, affective and disease-associated. Then, I123S, R207C, R222C, V456M, D463G, R602H, R633W, and R658C missense SNPs were found to be highly harmful. Six missense SNPs (I123S, R207C, V456M, D463G, R602H, and R633W) had the potential to destabilize the corresponding protein as predicted by DynaMut2. Interestingly, five high-risk mutations (I123S, V456M, D463G, R602H, and R633W) were distributed in two domains (PF00493 and PF14551). During molecular dynamics simulations analysis, consistent fluctuation in RMSD and RMSF values, high Rg and hydrogen bonds in mutant proteins compared to wild-type revealed that these mutations might alter the protein structure and stability of the corresponding protein. 
Hence, the results from the analyses guide the exploration of the mechanism by which these missense SNPs of the MCM6 gene alter the structural integrity and functional properties of the protein, which could guide the identification of ways to minimize the harmful effects of these mutations in humans.
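The category percentages quoted in this abstract follow directly from the reported counts. The check below is a minimal sketch that uses only the numbers stated above; the variable and function names are illustrative and do not come from the paper.

```python
# Counts as reported in the abstract (dbSNP records for the MCM6 gene).
TOTAL_SNPS = 15_009
snp_counts = {"missense": 642, "synonymous": 291, "intron": 12_500}


def percent_of_total(count: int, total: int = TOTAL_SNPS) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Percentage of all MCM6 SNPs that falls into each category.
percentages = {kind: percent_of_total(n) for kind, n in snp_counts.items()}

for kind, pct in percentages.items():
    print(f"{kind}: {pct}%")
```

The three categories listed do not sum to the total, since the abstract reports only a subset of the dbSNP variant classes.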
Affiliation(s)
- Md Mostafa Kamal
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Sohel Mia
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Omar Faruque
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Golam Rabby
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Numan Islam
  - Department of Food Engineering, North Pacific International University of Bangladesh, Dhaka, Bangladesh
- Tanveer A Wani
  - Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, 11451, Riyadh, Saudi Arabia
- M Atikur Rahman
  - Department of Biological Sciences, Alabama State University, 915 S Jackson St, Montgomery, AL, 36104, USA
- Md Mahmudul Hasan
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
2
Nazir A, Shad M, Rehman HM, Azim N, Sajjad M. Application of SUMO fusion technology for the enhancement of stability and activity of lysophospholipase from Pyrococcus abyssi. World J Microbiol Biotechnol 2024; 40:183. [PMID: 38722449; DOI: 10.1007/s11274-024-03998-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 04/21/2024] [Indexed: 05/18/2024]
Abstract
Heterologous production of proteins in Escherichia coli poses several challenges, including soluble production of target proteins, high levels of expression, and purification. Fusion tags can serve as important tools to overcome these challenges. SUMO (small ubiquitin-related modifier) is one such tag whose fusion to a native protein sequence can enhance its solubility and stability. In the current research, a simple, efficient and cost-effective method is discussed for the construction of the pET28a-SUMO vector. In order to improve the stability and activity of lysophospholipase from Pyrococcus abyssi (Pa-LPL), a 6xHis-SUMO tag was fused to the N-terminus of Pa-LPL by using the pET28a-SUMO vector. The recombinant SUMO-fused enzyme (6H-S-PaLPL) works optimally at 35 °C and pH 6.5 with remarkable thermostability at 35-95 °C. Thermo-inactivation kinetics of 6H-S-PaLPL were also studied at 35-95 °C, with a first-order rate constant (kIN) of 5.58 × 10⁻² h⁻¹ and a half-life of 12 ± 0 h at 95 °C. Km and Vmax for the hydrolysis of 4-nitrophenyl butyrate were calculated to be 2 ± 0.015 mM and 3882 ± 22.368 U/mg, respectively. A 2.4-fold increase in the Vmax of Pa-LPL was observed after fusion of the 6xHis-SUMO tag to its N-terminus. This is the first report on the use of a SUMO fusion tag to enhance the overall stability and activity of Pa-LPL. Fusion of the 6xHis-SUMO tag not only aided the purification process but also played a crucial role in increasing the thermostability and activity of the enzyme. The SUMO-fused enzyme thus generated can serve as an important candidate for degumming of vegetable oils at industrial scale.
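The reported half-life can be cross-checked against the reported rate constant. The sketch below assumes the standard first-order decay relation (half-life = ln 2 / k) and the standard Michaelis-Menten rate law; the function names are illustrative, and only the numeric constants are taken from the abstract.

```python
import math

# Values quoted in the abstract.
K_INACT_PER_H = 5.58e-2   # first-order inactivation rate constant at 95 °C, h^-1
KM_MM = 2.0               # Km for 4-nitrophenyl butyrate, mM
VMAX_U_PER_MG = 3882.0    # Vmax, U/mg


def half_life(k: float) -> float:
    """Half-life of a first-order process: t_1/2 = ln(2) / k."""
    return math.log(2) / k


def residual_activity(k: float, t_hours: float) -> float:
    """Fraction of activity remaining after t_hours of first-order decay."""
    return math.exp(-k * t_hours)


def michaelis_menten_rate(s_mm: float, km: float = KM_MM,
                          vmax: float = VMAX_U_PER_MG) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s_mm / (km + s_mm)


t_half = half_life(K_INACT_PER_H)
print(f"half-life at 95 °C: {t_half:.2f} h")
print(f"rate at [S] = Km: {michaelis_menten_rate(KM_MM):.0f} U/mg")
```

Under these assumptions the rate constant corresponds to a half-life of roughly 12.4 h, which agrees with the rounded 12 h stated in the abstract.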
Affiliation(s)
- Arshia Nazir
  - School of Biological Sciences, University of the Punjab, Lahore, Pakistan
- Mohsin Shad
  - School of Biological Sciences, University of the Punjab, Lahore, Pakistan
- Naseema Azim
  - School of Biological Sciences, University of the Punjab, Lahore, Pakistan
- Muhammad Sajjad
  - School of Biological Sciences, University of the Punjab, Lahore, Pakistan
3
Shibata M, Lin X, Onuchic JN, Yura K, Cheng RR. Residue coevolution and mutational landscape for OmpR and NarL response regulator subfamilies. Biophys J 2024; 123:681-692. [PMID: 38291753; PMCID: PMC10995415; DOI: 10.1016/j.bpj.2024.01.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 12/31/2023] [Accepted: 01/24/2024] [Indexed: 02/01/2024] Open
Abstract
DNA-binding response regulators (DBRRs) are a broad class of proteins that operate in tandem with their partner kinase proteins to form two-component signal transduction systems in bacteria. Typical DBRRs are composed of two domains where the conserved N-terminal domain accepts transduced signals and the evolutionarily diverse C-terminal domain binds to DNA. These domains are assumed to be functionally independent, and hence recombination of the two domains should yield novel DBRRs of arbitrary input/output response, which can be used as biosensors. This idea has been proved to be successful in some cases; yet, the error rate is not trivial. Improvement of the success rate of this technique requires a deeper understanding of the linker-domain and inter-domain residue interactions, which have not yet been thoroughly examined. Here, we studied residue coevolution of DBRRs of the two main subfamilies (OmpR and NarL) using large collections of bacterial amino acid sequences to extensively investigate the evolutionary signatures of linker-domain and inter-domain residue interactions. Coevolutionary analysis uncovered evolutionarily selected linker-domain and inter-domain residue interactions of known experimental structures, as well as previously unknown inter-domain residue interactions. We examined the possibility of these inter-domain residue interactions as contacts that stabilize an inactive conformation of the DBRR where DNA binding is inhibited for both subfamilies. The newly gained insights on linker-domain/inter-domain residue interactions and shared inactivation mechanisms improve the understanding of the functional mechanism of DBRRs, providing clues to efficiently create functional DBRR-based biosensors. Additionally, we show the feasibility of applying coevolutionary landscape models to predict the functionality of domain-swapped DBRR proteins. 
The presented result demonstrates that sequence information can be used to filter out bioengineered DBRR proteins that are predicted to be nonfunctional due to a high negative predictive value.
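The closing claim rests on negative predictive value (NPV): the fraction of designs predicted nonfunctional that really are nonfunctional. The sketch below shows the standard definition only. The counts are hypothetical and are not data from the paper, and the function name is illustrative.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN): share of negative predictions that are correct."""
    predicted_negative = true_negatives + false_negatives
    if predicted_negative == 0:
        raise ValueError("NPV is undefined when nothing is predicted negative")
    return true_negatives / predicted_negative


# Hypothetical screen: 50 domain-swapped designs predicted nonfunctional,
# of which 45 are truly nonfunctional and 5 turn out to be functional.
example_npv = negative_predictive_value(true_negatives=45, false_negatives=5)
print(f"NPV = {example_npv:.2f}")
```

A high NPV means that discarding the predicted-nonfunctional designs rarely throws away a working protein, which is what makes the model useful as a pre-filter.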
Affiliation(s)
- Mayu Shibata
  - Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Theoretical Biological Physics, Rice University, Houston, Texas
- Xingcheng Lin
  - Department of Physics, North Carolina State University, Raleigh, North Carolina; Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
- José N Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Physics and Astronomy, Chemistry, and Biosciences, Rice University, Houston, Texas
- Kei Yura
  - Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Interdisciplinary AI and Data Science, Ochanomizu University, Bunkyo, Tokyo, Japan; Graduate School of Advanced Science and Engineering, Waseda University, Shinjuku, Tokyo, Japan
- Ryan R Cheng
  - Department of Chemistry, University of Kentucky, Lexington, Kentucky
4
Versini R, Sritharan S, Aykac Fas B, Tubiana T, Aimeur SZ, Henri J, Erard M, Nüsse O, Andreani J, Baaden M, Fuchs P, Galochkina T, Chatzigoulas A, Cournia Z, Santuz H, Sacquin-Mora S, Taly A. A Perspective on the Prospective Use of AI in Protein Structure Prediction. J Chem Inf Model 2024; 64:26-41. [PMID: 38124369; DOI: 10.1021/acs.jcim.3c01361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
AlphaFold2 (AF2) and RoseTTaFold (RF) have revolutionized structural biology, serving as highly reliable and effective methods for predicting protein structures. This article explores their impact and limitations, focusing on their integration into experimental pipelines and their application in diverse protein classes, including membrane proteins, intrinsically disordered proteins (IDPs), and oligomers. In experimental pipelines, AF2 models help X-ray crystallography in resolving the phase problem, while complementarity with mass spectrometry and NMR data enhances structure determination and protein flexibility prediction. Predicting the structure of membrane proteins remains challenging for both AF2 and RF due to difficulties in capturing conformational ensembles and interactions with the membrane. Improvements in incorporating membrane-specific features and predicting the structural effect of mutations are crucial. For intrinsically disordered proteins, AF2's confidence score (pLDDT) serves as a competitive disorder predictor, but integrative approaches including molecular dynamics (MD) simulations or hydrophobic cluster analyses are advocated for accurate dynamics representation. AF2 and RF show promising results for oligomeric models, outperforming traditional docking methods, with AlphaFold-Multimer showing improved performance. However, some caveats remain in particular for membrane proteins. Real-life examples demonstrate AF2's predictive capabilities in unknown protein structures, but models should be evaluated for their agreement with experimental data. Furthermore, AF2 models can be used complementarily with MD simulations. In this Perspective, we propose a "wish list" for improving deep-learning-based protein folding prediction models, including using experimental data as constraints and modifying models with binding partners or post-translational modifications. 
Additionally, a meta-tool for ranking and suggesting composite models is suggested, driving future advancements in this rapidly evolving field.
Affiliation(s)
- Raphaelle Versini
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Sujith Sritharan
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Burcu Aykac Fas
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Thibault Tubiana
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Sana Zineb Aimeur
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Julien Henri
  - Sorbonne Université, CNRS, Laboratoire de Biologie, Computationnelle et Quantitative UMR 7238, Institut de Biologie Paris-Seine, 4 Place Jussieu, F-75005 Paris, France
- Marie Erard
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Oliver Nüsse
  - Université Paris-Saclay, CNRS, Institut de Chimie Physique, 91405 Orsay, France
- Jessica Andreani
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Marc Baaden
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Patrick Fuchs
  - Sorbonne Université, École Normale Supérieure, PSL University, CNRS, Laboratoire des Biomolécules, LBM, 75005 Paris, France
  - Université de Paris, UFR Sciences du Vivant, 75013 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Alexios Chatzigoulas
  - Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Zoe Cournia
  - Biomedical Research Foundation, Academy of Athens, 11527 Athens, Greece
  - Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15784 Athens, Greece
- Hubert Santuz
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Sophie Sacquin-Mora
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
- Antoine Taly
  - Laboratoire de Biochimie Théorique, CNRS (UPR9080), Université Paris Cité, F-75005 Paris, France
5
Vander Meersche Y, Cretin G, Gheeraert A, Gelly JC, Galochkina T. ATLAS: protein flexibility description from atomistic molecular dynamics simulations. Nucleic Acids Res 2024; 52:D384-D392. [PMID: 37986215; PMCID: PMC10767941; DOI: 10.1093/nar/gkad1084] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/11/2023] [Revised: 10/15/2023] [Accepted: 10/30/2023] [Indexed: 11/22/2023] Open
Abstract
Dynamical behaviour is one of the most crucial protein characteristics. Despite the advances in the field of protein structure resolution and prediction, analysis and prediction of protein dynamic properties remains a major challenge, mostly due to the low accessibility of data and its diversity and heterogeneity. To address this issue, we present ATLAS, a database of standardised all-atom molecular dynamics simulations, accompanied by their analysis in the form of interactive diagrams and trajectory visualisation. ATLAS offers a large-scale view and valuable insights on protein dynamics for a large and representative set of proteins, by combining data obtained through molecular dynamics simulations with information extracted from experimental structures. Users can easily analyse dynamic properties of functional protein regions, such as domain limits (hinge positions) and residues involved in interaction with other biological molecules. Additionally, the database enables exploration of proteins with uncommon dynamic properties conditioned by their environment such as chameleon subsequences and Dual Personality Fragments. The ATLAS database is freely available at https://www.dsimb.inserm.fr/ATLAS.
Affiliation(s)
- Yann Vander Meersche
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Gabriel Cretin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Aria Gheeraert
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Jean-Christophe Gelly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75014 Paris, France
6
Chakraborty A, Hussain A, Sabnam N. Uncovering the structural stability of Magnaporthe oryzae effectors: a secretome-wide in silico analysis. J Biomol Struct Dyn 2023:1-22. [PMID: 38109060; DOI: 10.1080/07391102.2023.2292795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2023] [Accepted: 11/23/2023] [Indexed: 12/19/2023]
Abstract
Rice blast, caused by the ascomycete fungus Magnaporthe oryzae, is a deadly disease and a major threat to global food security. The pathogen secretes small proteinaceous effectors, virulence factors, inside the host to manipulate and perturb the host immune system, allowing the pathogen to colonize and establish a successful infection. While the molecular functions of several effectors are characterized, very little is known about the structural stability of these effectors. We analyzed a total of 554 small secretory proteins (SSPs) from the M. oryzae secretome to decipher key features of intrinsic disorder (ID) and the structural dynamics of the selected putative effectors through thorough and systematic in silico studies. Our results suggest that out of the total SSPs, 66% were predicted as effector proteins, released either into the apoplast or cytoplasm of the host cell. Of these, 68% were found to be intrinsically disordered effector proteins (IDEPs). Among the six distinct classes of disordered effectors, we observed peculiar relationships between the localization of several effectors in the apoplast or cytoplasm and the degree of disorder. We determined the degree of structural disorder and its impact on protein foldability across all the putative small secretory effector proteins from the blast pathogen, further validated by molecular dynamics simulation studies. This study provides definite clues toward unraveling the mystery behind the importance of structural distortions in effectors and their impact on plant-pathogen interactions. The study of these dynamical segments may help identify new effectors as well. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Afzal Hussain
  - Department of Bioinformatics, Maulana Azad National Institute of Technology, Bhopal, India
- Nazmiara Sabnam
  - Department of Life Sciences, Presidency University, Kolkata, India
7
Li R, Makogon A, Galochkina T, Lemineur JF, Kanoufi F, Shkirskiy V. Unsupervised Analysis of Optical Imaging Data for the Discovery of Reactivity Patterns in Metal Alloy. Small Methods 2023; 7:e2300214. [PMID: 37382395; DOI: 10.1002/smtd.202300214] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/20/2023] [Revised: 06/08/2023] [Indexed: 06/30/2023]
Abstract
Operando wide-field optical microscopy imaging yields a wealth of information about the reactivity of metal interfaces, yet the data are often unstructured and challenging to process. In this study, the power of unsupervised machine learning (ML) algorithms is harnessed to analyze chemical reactivity images obtained dynamically by reflectivity microscopy in combination with ex situ scanning electron microscopy to identify and cluster the chemical reactivity of particles in Al alloy. The ML analysis uncovers three distinct clusters of reactivity from unlabeled datasets. A detailed examination of representative reactivity patterns confirms the chemical communication of generated OH⁻ fluxes within particles, as supported by statistical analysis of size distribution and finite element modelling (FEM). The ML procedures also reveal statistically significant patterns of reactivity under dynamic conditions, such as pH acidification. The results align well with a numerical model of chemical communication, underscoring the synergy between data-driven ML and physics-driven FEM approaches.
Affiliation(s)
- Rui Li
  - Université Paris Cité, ITODYS, CNRS, Paris, 75013, France
8
Patsch D, Eichenberger M, Voss M, Bornscheuer UT, Buller RM. LibGENiE - A bioinformatic pipeline for the design of information-enriched enzyme libraries. Comput Struct Biotechnol J 2023; 21:4488-4496. [PMID: 37736300; PMCID: PMC10510078; DOI: 10.1016/j.csbj.2023.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 09/13/2023] [Accepted: 09/13/2023] [Indexed: 09/23/2023] Open
Abstract
Enzymes are potent catalysts with high specificity and selectivity. To leverage nature's synthetic potential for industrial applications, various protein engineering techniques have emerged that allow the catalytic, biophysical, and molecular recognition properties of enzymes to be tailored. However, the many possible ways a protein can be altered force researchers to carefully balance the exhaustiveness of an enzyme screening campaign against the required resources. Consequently, the optimal engineering strategy is often defined on a case-by-case basis. Strikingly, while predicting mutations that lead to an improved target function is challenging, here we show that the prediction and exclusion of deleterious mutations is a much more straightforward task, as analyzed for an engineered carbonic acid anhydrase, a transaminase, a squalene-hopene cyclase and a Kemp eliminase. Combining such a pre-selection of allowed residues with advanced gene synthesis methods opens a path toward an efficient and generalizable library construction approach for protein engineering. To give researchers easy access to this methodology, we provide the website LibGENiE containing the bioinformatic tools for the library design workflow.
Affiliation(s)
- David Patsch
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, Institute of Chemistry and Biotechnology, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Strasse 4, 17487 Greifswald, Germany
- Michael Eichenberger
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, Institute of Chemistry and Biotechnology, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland
- Moritz Voss
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, Institute of Chemistry and Biotechnology, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland
- Uwe T. Bornscheuer
  - Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Felix-Hausdorff-Strasse 4, 17487 Greifswald, Germany
- Rebecca M. Buller
  - Zurich University of Applied Sciences, School of Life Sciences and Facility Management, Institute of Chemistry and Biotechnology, Einsiedlerstrasse 31, 8820 Wädenswil, Switzerland
9
Mardikoraem M, Woldring D. Protein Fitness Prediction Is Impacted by the Interplay of Language Models, Ensemble Learning, and Sampling Methods. Pharmaceutics 2023; 15:1337. [PMID: 37242577; PMCID: PMC10224321; DOI: 10.3390/pharmaceutics15051337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Revised: 04/19/2023] [Accepted: 04/21/2023] [Indexed: 05/28/2023] Open
Abstract
Advances in machine learning (ML) and the availability of protein sequences via high-throughput sequencing techniques have transformed the ability to design novel diagnostic and therapeutic proteins. ML allows protein engineers to capture complex trends hidden within protein sequences that would otherwise be difficult to identify in the context of the immense and rugged protein fitness landscape. Despite this potential, there persists a need for guidance during the training and evaluation of ML methods over sequencing data. Two key challenges for training discriminative models and evaluating their performance include handling severely imbalanced datasets (e.g., few high-fitness proteins among an abundance of non-functional proteins) and selecting appropriate protein sequence representations (numerical encodings). Here, we present a framework for applying ML over assay-labeled datasets to elucidate the capacity of sampling techniques and protein encoding methods to improve binding affinity and thermal stability prediction tasks. For protein sequence representations, we incorporate two widely used methods (One-Hot encoding and physicochemical encoding) and two language-based methods (next-token prediction, UniRep; masked-token prediction, ESM). Elaboration on performance is provided over protein fitness, protein size, and sampling techniques. In addition, an ensemble of protein representation methods is generated to discover the contribution of distinct representations and improve the final prediction score. We then implement multiple criteria decision analysis (MCDA; TOPSIS with entropy weighting), using multiple metrics well-suited for imbalanced data, to ensure statistical rigor in ranking our methods. Within the context of these datasets, the synthetic minority oversampling technique (SMOTE) outperformed undersampling while encoding sequences with One-Hot, UniRep, and ESM representations. Moreover, ensemble learning increased the predictive performance of the affinity-based dataset by 4% compared to the best single-encoding candidate (F1-score = 97%), while ESM alone was rigorous enough in stability prediction (F1-score = 92%).
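Two of the building blocks named in this abstract, One-Hot encoding and the F1-score, are simple enough to sketch directly. The code below is a generic illustration rather than the authors' pipeline: the 20-letter alphabet ordering, the example peptide, and the confusion counts are all assumptions chosen for the example.

```python
# Standard 20 amino acids; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Encode a protein sequence as one 20-dimensional indicator row per residue."""
    encoded = []
    for residue in sequence.upper():
        if residue not in AA_INDEX:
            raise ValueError(f"unsupported residue: {residue!r}")
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[residue]] = 1
        encoded.append(row)
    return encoded


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


encoded = one_hot_encode("MKV")  # hypothetical three-residue peptide
print(len(encoded), "residues x", len(encoded[0]), "channels")
print("F1 for hypothetical counts:", f1_score(tp=8, fp=2, fn=2))
```

F1 ignores true negatives, which is why it suits imbalanced datasets where non-functional variants dominate and plain accuracy would look deceptively high.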
Affiliation(s)
- Mehrsa Mardikoraem
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Daniel Woldring
  - Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI 48824, USA
  - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
10
Wang W, Su X, Liu D, Zhang H, Wang X, Zhou Y. Predicting DNA-binding protein and coronavirus protein flexibility using protein dihedral angle and sequence feature. Proteins 2023; 91:497-507. [PMID: 36321218; PMCID: PMC9877568; DOI: 10.1002/prot.26443] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/02/2022] [Revised: 09/07/2022] [Accepted: 10/20/2022] [Indexed: 11/07/2022]
Abstract
The flexibility of protein structure is related to various biological processes, such as molecular recognition, allosteric regulation, catalytic activity, and protein stability. At the molecular level, protein dynamics and flexibility are important factors for understanding protein function. DNA-binding proteins and coronavirus proteins are of great concern and are relatively unique proteins. However, exploring the flexibility of DNA-binding proteins and coronavirus proteins through experiments or calculations is a difficult process. Since protein dihedral rotational motion can be used to predict protein structural changes, it provides key information about protein local conformation. Therefore, this paper introduces a method to improve the accuracy of protein flexibility prediction, DihProFle (prediction of DNA-binding protein and coronavirus protein flexibility that incorporates calculated dihedral angle information). Based on protein dihedral angle information, protein evolution information, and amino acid physical and chemical properties, DihProFle predicts protein flexibility in two cases, DNA-binding proteins and coronavirus proteins, and assigns a flexibility class to each protein sequence position. In this study, compared with flexibility prediction using sequence evolution information and physicochemical properties of amino acids, the accuracy of flexibility prediction based on protein dihedral angle information, sequence evolution information and physicochemical properties of amino acids improved by 2.2% and 3.1% in the nonstrict and strict conditions, respectively. DihProFle also achieves better performance than previous methods for protein flexibility analysis. In addition, we further analyzed the correlation of amino acid properties and protein dihedral angles with residue flexibility. The results show that charged hydrophilic residues make up a higher proportion of the flexible region, and that the rigid region tends to fall within particular angular ranges of the protein dihedral angle (for example, residues whose ψ angle lies in the range of 91°-120° are more often flexible than rigid). Therefore, the results indicate that hydrophilic residues and protein dihedral angle information play an important role in protein flexibility.
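The dihedral angle feature used here can be computed from four consecutive atom positions. The sketch below uses the common projection-and-atan2 formulation of a torsion angle; it is a generic illustration, not the DihProFle implementation. The coordinates are made-up test points, and `in_psi_window` is a hypothetical helper that simply tests the 91°-120° range mentioned in the abstract.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle in degrees, in (-180, 180], about the p1-p2 bond.

    For a backbone psi angle the four points would be N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def in_psi_window(psi_deg, low=91.0, high=120.0):
    """True if psi lies in the 91-120 degree range quoted in the abstract."""
    return low <= psi_deg <= high


# Made-up coordinates: a planar cis arrangement and a planar trans arrangement.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(f"cis: {cis:.1f}, trans: {abs(trans):.1f}")
```

Using atan2 on the projected components avoids the loss of sign and the numerical instability near 0° and 180° that an arccos-based formula would have.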
Affiliation(s)
- Wei Wang
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China; Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province, Xinxiang, China
- Xili Su
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Dong Liu
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
- Hongjun Zhang
  - School of Computer Science and Technology, Anyang University, Anyang, China
- Xianfang Wang
  - College of Computer Science and Technology Engineering, Henan Institute of Technology, Xinxiang, China
- Yun Zhou
  - College of Computer and Information Engineering, Henan Normal University, Xinxiang, China
11
de Brevern AG. An agnostic analysis of the human AlphaFold2 proteome using local protein conformations. Biochimie 2023; 207:11-19. [PMID: 36417962; DOI: 10.1016/j.biochi.2022.11.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Revised: 10/14/2022] [Accepted: 11/17/2022] [Indexed: 11/21/2022]
Abstract
Knowledge of the 3D structure of proteins is a valuable asset for understanding their precise biological mechanisms. However, the cost of producing 3D structures and the experimental difficulties involved limit their availability. Proposing 3D structural models is consequently an appealing alternative. The release of the AlphaFold deep learning approach has revolutionized the field. The recent near-complete human proteome proposal makes it possible to analyse large amounts of data and evaluate the results of the approach in greater depth. The 3D human proteome was thus analysed in light of the classic secondary structures and many less-used protein local conformations (PolyProline II helices, types of γ-turns, β-turns and β-bulges, curvature of the helices, and a structural alphabet). Without questioning the global quality of the approach, this analysis highlights certain local conformations that may be poorly predicted and could therefore be better addressed.
Affiliation(s)
- Alexandre G de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Bioinformatics team, F-75014, Paris, France.
|
12
|
Vrignaud C, Mikdar M, Duval R, Reininger L, Damaraju VL, Sawyer M, Colin Y, Le Van Kim C, Gelly JC, Etchebest C, Peyrard T, Azouzi S. Molecular and structural characterization of a novel high-prevalence antigen of the Augustine blood group system. Transfusion 2023; 63:610-618. [PMID: 36744388 DOI: 10.1111/trf.17268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Revised: 12/19/2022] [Accepted: 12/27/2022] [Indexed: 02/07/2023]
Abstract
BACKGROUND An antibody directed against a high-prevalence red blood cell (RBC) antigen was detected in a 67-year-old female patient of North African ancestry with a history of a single pregnancy and blood transfusion. So far, the specificity of the proband's alloantibody remained unknown in our immunohematology reference laboratory. STUDY DESIGN AND METHODS Whole-exome sequencing (WES) was performed on the proband's DNA. The reactivity to the SLC29A1-encoded ENT1 adenosine transporter was investigated by flow cytometry analyses of ENT1-expressing HEK293 cells, and RBCs from Augustine-typed individuals. Erythrocyte protein expression level, nucleoside-binding capacity, and molecular structure of the proband's ENT1 variant were further explored by western blot, flow cytometry, and molecular dynamics calculations, respectively. RESULTS A missense variant was identified in the SLC29A1 gene, which encodes the Augustine blood group system. It arises from homozygosity for a rare c.242A > G missense mutation that results in a nonsynonymous p.Asn81Ser substitution within the large extracellular loop of ENT1. Flow cytometry analyses demonstrated that the proband's antibody was reactive against HEK-293 cells transfected with control but not proband's SLC29A1 cDNA. Consistent with this finding, proband's antibody was found to be reactive with At(a-) (AUG:-2), but not AUG:-1 (null phenotype) RBCs. Data from structural analysis further supported that the proband's p.Asn81Ser variation does not alter ENT1 binding of its specific inhibitor NBMPR. CONCLUSION Our study provides evidence for a novel high-prevalence antigen, AUG4 (also called ATAM after the proband's name) in the Augustine blood group system, encoded by the rare SLC29A1 variant allele AUG*04 (c.242A > G, p.Asn81Ser).
Affiliation(s)
- Romain Duval
- Université de Paris Cité, Inserm, BIGR, Paris, France
- Centre National de Référence pour les Groupes Sanguins, Établissement Français du Sang (EFS), Paris, France
- Luc Reininger
- Université de Paris Cité, Inserm, BIGR, Paris, France
- Vijaya L Damaraju
- Department of Oncology, University of Alberta, Edmonton, Alberta, Canada
- Yves Colin
- Université de Paris Cité, Inserm, BIGR, Paris, France
- Thierry Peyrard
- Université de Paris Cité, Inserm, BIGR, Paris, France
- Centre National de Référence pour les Groupes Sanguins, Établissement Français du Sang (EFS), Paris, France
- Slim Azouzi
- Université de Paris Cité, Inserm, BIGR, Paris, France
- Centre National de Référence pour les Groupes Sanguins, Établissement Français du Sang (EFS), Paris, France
|
13
|
The Conformation of the Intrinsically Disordered N-Terminal Region of Barrier-to-Autointegration Factor (BAF) is Regulated by pH and Phosphorylation. J Mol Biol 2023; 435:167888. [PMID: 36402223 DOI: 10.1016/j.jmb.2022.167888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 11/09/2022] [Accepted: 11/09/2022] [Indexed: 11/18/2022]
Abstract
Barrier-to-Autointegration Factor (BAF) is a highly conserved DNA binding protein important for genome integrity. Its localization and function are regulated through phosphorylation. Previously reported structures of BAF suggested that it is fully ordered, but our recent NMR analysis revealed that its N-terminal region is flexible in solution and that S4/T3 di-phosphorylation by VRK1 reduces this flexibility. Here, molecular dynamics (MD) simulation was used to unveil the conformational ensembles accessible to the N-terminal region of BAF either unphosphorylated, mono-phosphorylated on S4 or di-phosphorylated on S4/T3 (pBAF) and to reveal the interactions that contribute to define these ensembles. We show that the intrinsic flexibility observed in the N-terminal region of BAF is reduced by S4 phosphorylation and to a larger extent by S4/T3 di-phosphorylation. Thanks to the atomic description offered by MD supported by the NMR study of several BAF mutants, we identified the dynamic network of salt bridge interactions responsible for the conformational restriction involving pS4 and pT3 with residues located in helix α1 and α6. Using MD, we showed that the flexibility in the N-terminal region of BAF depends on the ionic strength and on the pH. We show that the presence of two negative charges of the phosphoryl groups is required for a substantial decrease in flexibility in pBAF. Using MD supported by NMR, we also showed that H7 deprotonation reduces the flexibility in the N-terminal region of BAF. Thus, the conformation of the intrinsically disordered N-terminal region of BAF is highly tunable, likely related to its diverse functions.
|
14
|
Ali S, Ali U, Qamar A, Zafar I, Yaqoob M, Ain QU, Rashid S, Sharma R, Nafidi HA, Bin Jardan YA, Bourhia M. Predicting the effects of rare genetic variants on oncogenic signaling pathways: A computational analysis of HRAS protein function. Front Chem 2023; 11:1173624. [PMID: 37153521 PMCID: PMC10160440 DOI: 10.3389/fchem.2023.1173624] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/24/2023] [Accepted: 04/10/2023] [Indexed: 05/09/2023] Open
Abstract
The HRAS gene plays a crucial role in regulating essential cellular processes, and its misregulation is linked to the development of various types of cancer. Nonsynonymous single nucleotide polymorphisms (nsSNPs) within the coding region of HRAS can cause detrimental mutations that disrupt wild-type protein function. In the current investigation, we employed in silico methods to predict the consequences of rare genetic variants for the functional properties of the HRAS protein. We identified a total of 50 nsSNPs, of which 23 were located in the exon region of the HRAS gene and were predicted to be harmful or deleterious. Of these 23, 10 nsSNPs ([G60V], [G60D], [R123P], [D38H], [I46T], [G115R], [R123G], [P110L], [A59L], and [G13R]) were identified as having the most deleterious effect based on the results of SIFT analysis and PolyPhen2 scores ranging from 0.53 to 69. The ΔΔG values, which range from -3.21 kcal/mol to 0.87 kcal/mol, represent the free energy change associated with protein stability upon mutation. Interestingly, three mutations (Y4C, T58I, and Y12E) were found to improve the structural stability of the protein. We performed molecular dynamics (MD) simulations to investigate the structural and dynamic effects of HRAS mutations. Our results showed that the stable model of HRAS had a significantly lower energy value of -18756 kJ/mol compared to the initial model of -108915 kJ/mol. The RMSD value for the wild-type complex was 4.40 Å, and the binding energies for the G60V, G60D, and D38H mutants were -107.09 kcal/mol, -109.42 kcal/mol, and -107.18 kcal/mol, respectively, compared with -105.85 kcal/mol for the wild-type HRAS protein. Our results support the potential functional significance of nsSNPs in augmenting HRAS expression and contributing to the activation of malignant oncogenic signalling pathways.
Affiliation(s)
- Sadaqat Ali
- Medical Department, DHQ Hospital Bhawalnagr, Punjab, Pakistan
- Adeem Qamar
- Department of Pathology, Sahiwal Medical College Sahiwal, Punjab, Pakistan
- Imran Zafar
- Department of Bioinformatics and Computational Biology, Virtual University of Pakistan, Punjab, Pakistan
- Muhammad Yaqoob
- Department of Life Sciences, ARID University-Barani Institute of Sciences Burewala Campus, Punjab, Pakistan
- Qurat ul Ain
- Department of Chemistry, Government College Women University, Faisalabad, Pakistan
- Summya Rashid
- Department of Bioinformatics and Computational Biology, Virtual University of Pakistan, Punjab, Pakistan
- Rohit Sharma
- Department of Rasa Shastra and Bhaishajya Kalpana, Faculty of Ayurveda, Institute of Medical Sciences, Banaras Hindu University, Varanasi, Uttar Pradesh, India
- *Correspondence: Mohammed Bourhia; Rohit Sharma
- Hiba-Allah Nafidi
- Department of Food Science, Faculty of Agricultural and Food Sciences, Laval University, Quebec City, QC, Canada
- Yousef A. Bin Jardan
- Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Mohammed Bourhia
- Laboratory of Chemistry and Biochemistry, Faculty of Medicine and Pharmacy, Ibn Zohr University, Agadir, Morocco
- *Correspondence: Mohammed Bourhia; Rohit Sharma
|
15
|
Turchetti B, Buzzini P, Baeza M. A genomic approach to analyze the cold adaptation of yeasts isolated from Italian Alps. Front Microbiol 2022; 13:1026102. [DOI: 10.3389/fmicb.2022.1026102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 10/07/2022] [Indexed: 11/11/2022] Open
Abstract
Microorganisms, including yeasts, are responsible for the mineralization of organic matter in cold regions, and their characterization is critical to elucidate the ecology of such environments on Earth. Strategies developed by yeasts to survive in cold environments have been increasingly studied in recent years and applied in different biotechnological applications, but knowledge of them is still limited. Microbial adaptations to cold include the synthesis of cryoprotective compounds, as well as the presence of a high number of genes encoding proteins/enzymes characterized by a reduced proline content and highly flexible and large catalytic active sites. This study is a comparative genomic study of the adaptations of yeasts isolated from the Italian Alps, considering their growth kinetics. The optimal temperature for growth (OTG), growth rate (Gr), and draft genome sizes varied considerably (OTG, 10°C–20°C; Gr, 0.071–0.0726; genomes, 20.7–21.5 Mbp; %GC, 50.9–61.5). A direct relationship was observed between calculated protein flexibilities and OTG, but not Gr. Putative genes encoding the cold stress response were found, as well as high numbers of genes encoding responses to general, oxidative, and osmotic stresses. The cold response genes found in the studied yeasts play roles in cell membrane adaptation, compatible solute accumulation, RNA structure changes, and protein folding, i.e., dihydrolipoamide dehydrogenase, glycogen synthase, omega-6 fatty acid, stearoyl-CoA desaturase, ATP-dependent RNA helicase, and elongation of very-long-chain fatty acids. Redundancy was found for several putative genes, and it was higher for P-loop containing nucleoside triphosphate hydrolase, alpha/beta hydrolase, armadillo repeat-containing proteins, and the major facilitator superfamily protein. Hundreds of thousands of small open reading frames (SmORFs) were found in all studied yeasts, especially in Phenoliferia glacialis. Gene clusters encoding the synthesis of secondary metabolites such as terpene, non-ribosomal peptide, and type III polyketide were predicted in four, three, and two studied yeasts, respectively.
|
16
|
Bon C, Cabantous S, Julien S, Guillet V, Chalut C, Rima J, Brison Y, Malaga W, Sanchez-Dafun A, Gavalda S, Quémard A, Marcoux J, Waldo GS, Guilhot C, Mourey L. Solution structure of the type I polyketide synthase Pks13 from Mycobacterium tuberculosis. BMC Biol 2022; 20:147. [PMID: 35729566 PMCID: PMC9210659 DOI: 10.1186/s12915-022-01337-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/02/2021] [Accepted: 05/25/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Type I polyketide synthases (PKSs) are multifunctional enzymes responsible for the biosynthesis of a group of diverse natural compounds with biotechnological and pharmaceutical interest called polyketides. The diversity of polyketides is impressive despite the limited set of catalytic domains used by PKSs for biosynthesis, leading to considerable interest in deciphering their structure-function relationships, which is challenging due to high intrinsic flexibility. Among nineteen polyketide synthases encoded by the genome of Mycobacterium tuberculosis, Pks13 is the condensase required for the final condensation step of two long acyl chains in the biosynthetic pathway of mycolic acids, essential components of the cell envelope of Corynebacterineae species. It has been validated as a promising druggable target and knowledge of its structure is essential to speed up drug discovery to fight against tuberculosis. RESULTS We report here a quasi-atomic model of Pks13 obtained using small-angle X-ray scattering of the entire protein and various molecular subspecies combined with known high-resolution structures of Pks13 domains or structural homologues. As a comparison, the low-resolution structures of two other mycobacterial polyketide synthases, Mas and PpsA from Mycobacterium bovis BCG, are also presented. This study highlights a monomeric and elongated state of the enzyme with the apo- and holo-forms being identical at the resolution probed. Catalytic domains are segregated into two parts, which correspond to the condensation reaction per se and to the release of the product, a pivot for the enzyme flexibility being at the interface. The two acyl carrier protein domains are found at opposite sides of the ketosynthase domain and display distinct characteristics in terms of flexibility. 
CONCLUSIONS The Pks13 model reported here provides the first structural information on the molecular mechanism of this complex enzyme and opens up new perspectives to develop inhibitors that target the interactions with its enzymatic partners or between catalytic domains within Pks13 itself.
Affiliation(s)
- Cécile Bon
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France.
- Stéphanie Cabantous
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France; Los Alamos National Laboratory, Bioscience Division B-N2, Los Alamos, NM, 87545, USA; Present address: Centre de Recherche en Cancérologie de Toulouse (CRCT), Inserm, Université de Toulouse, CNRS, UPS, Toulouse, France
- Sylviane Julien
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Valérie Guillet
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Christian Chalut
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Julie Rima
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Yoann Brison
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France; Present address: Toulouse White Biotechnology, 31400, Toulouse, France
- Wladimir Malaga
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Angelique Sanchez-Dafun
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Sabine Gavalda
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France; Present address: Carbios, Biopole Clermont Limagne, 63360, Saint-Beauzire, France
- Annaïk Quémard
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Julien Marcoux
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Geoffrey S Waldo
- Los Alamos National Laboratory, Bioscience Division B-N2, Los Alamos, NM, 87545, USA
- Christophe Guilhot
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France
- Lionel Mourey
- Institut de Pharmacologie et de Biologie Structurale, IPBS, Université de Toulouse, CNRS, UPS, Toulouse, France.
|
17
|
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods Mol Biol 2022; 2449:169-185. [PMID: 35507262 DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research on computational methods based on machine learning and knowledge-based potentials for predicting ΔG and ΔΔG upon variation, we now realize that all approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein wrong? Both machine learning and knowledge-based computational methods are rigorous, and the theory behind them is solid. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be considered with care. In the following, we show how to cope with the problem of understanding which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
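The idea of a consensus procedure over several ΔΔG predictors can be sketched as a simple majority vote. This is an illustrative sketch only and not the procedure used in the chapter: the threshold of 0.5 kcal/mol, the two-thirds agreement cutoff, the sign convention (negative ΔΔG taken as destabilizing), and the function names are all assumptions made for the example.

```python
from collections import Counter


def classify(ddg, threshold=0.5):
    """Label one predicted ddG value (kcal/mol); negative = destabilizing (assumed convention)."""
    if ddg <= -threshold:
        return "destabilizing"
    if ddg >= threshold:
        return "stabilizing"
    return "neutral"


def consensus_call(ddg_values, threshold=0.5, min_agreement=2 / 3):
    """Return the majority label if enough predictors agree, else 'uncertain'."""
    if not ddg_values:
        raise ValueError("at least one prediction is required")
    counts = Counter(classify(v, threshold) for v in ddg_values)
    label, n_votes = counts.most_common(1)[0]
    if n_votes / len(ddg_values) >= min_agreement:
        return label
    return "uncertain"


# Hypothetical predictions from three tools for the same variant.
print(consensus_call([-1.2, -0.8, -0.1]))
print(consensus_call([-1.0, 1.0, 0.0]))
```

Reporting "uncertain" when the predictors disagree, rather than forcing a label, follows the abstract's advice that instability predictions be treated with care.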
Affiliation(s)
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
|
18
|
VHH Structural Modelling Approaches: A Critical Review. Int J Mol Sci 2022; 23:ijms23073721. [PMID: 35409081 PMCID: PMC8998791 DOI: 10.3390/ijms23073721] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/25/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/20/2022] Open
Abstract
VHH, i.e., VH domains of camelid single-chain antibodies, are very promising therapeutic agents due to their significant physicochemical advantages compared to classical mammalian antibodies. The number of experimentally solved VHH structures has increased significantly recently, which is of great help because it makes it possible to work directly on 3D structures to humanise or improve them. Unfortunately, most VHHs do not have an experimentally solved 3D structure, so alternative ways of obtaining structural information are needed. Methods that predict structure from the primary amino acid sequence are essential to bypass this limitation. This review presents the most extensive overview of structure prediction methods applied to the 3D modelling of a given VHH sequence (a total of 21). Besides the historical overview, it aims to show how modelling software has shaped the structural prediction of VHHs. A brief explanation of each methodology is supplied, and pertinent examples of their usage are provided. Finally, we present a structure prediction case study of a recently solved VHH structure. According to some recent studies and the present analysis, AlphaFold 2 and NanoNet appear to be the best tools to predict a structural model of a VHH from its sequence.
|
19
|
Baeza M, Zúñiga S, Peragallo V, Gutierrez F, Barahona S, Alcaino J, Cifuentes V. Response to Cold: A Comparative Transcriptomic Analysis in Eight Cold-Adapted Yeasts. Front Microbiol 2022; 13:828536. [PMID: 35283858 PMCID: PMC8905146 DOI: 10.3389/fmicb.2022.828536] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/03/2021] [Accepted: 01/19/2022] [Indexed: 02/03/2023] Open
Abstract
Microorganisms have evolved to colonize all biospheres, including extremely cold environments, where they face several stressors, mainly low or freezing temperatures. In general terms, the strategies developed by cold-adapted microorganisms include the synthesis of cryoprotectant and stress-protectant molecules, cold-active proteins, especially enzymes, and the regulation of membrane fluidity. The strategy can differ among microorganisms and depends on the characteristics of the cold environment the microorganism inhabits, such as seasonal temperature changes. Microorganisms can develop strategies to grow efficiently at low temperatures, or to tolerate them and grow under favorable conditions. These differences can be found among microorganisms of the same kind and from the same cold habitat. In this work, eight cold-adapted yeasts isolated from King George Island, subAntarctic region, which differ in their growth properties, were studied for their response to low temperatures at the transcriptomic level. Sixteen ORFeomes were assembled and used for gene prediction and functional annotation, and for determining gene expression changes, the protein flexibilities of translated genes, and codon usage bias. Putative genes related to the response to all main kinds of stress were found. The total number of differentially expressed genes was related to the temperature variation that each yeast faced. Multiple comparative analyses among yeasts, based on gene expression changes, protein flexibility by cellular function, and codon usage bias, reveal significant differences in the response to cold among the studied Antarctic yeasts. The way a yeast responds to temperature change appears to be more related to its optimal temperature for growth (OTG) than to its growth velocity. Yeasts with higher OTG prepare to downregulate their metabolism to enter the dormancy stage. In comparison, yeasts with lower OTG make minor adjustments to adapt their metabolism and maintain their growth at lower temperatures.
Affiliation(s)
- Marcelo Baeza
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile; Centro de Biotecnología, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Sergio Zúñiga
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Vicente Peragallo
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Fernando Gutierrez
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Salvador Barahona
- Centro de Biotecnología, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Jennifer Alcaino
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile; Centro de Biotecnología, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
- Víctor Cifuentes
- Departamento de Ciencias Ecológicas, Facultad de Ciencias, Universidad de Chile, Santiago, Chile; Centro de Biotecnología, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
|
20
|
An in-frame deletion mutation in the degron tail of auxin coreceptor IAA2 confers resistance to the herbicide 2,4-D in Sisymbrium orientale. Proc Natl Acad Sci U S A 2022; 119:2105819119. [PMID: 35217601 PMCID: PMC8892348 DOI: 10.1073/pnas.2105819119] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Accepted: 12/09/2021] [Indexed: 12/13/2022] Open
Abstract
Synthetic auxin herbicides intersect basic plant developmental biology and applied weed management. We investigated resistance to 2,4-D in the Australian weed Sisymbrium orientale (Indian hedge mustard). We identified a mechanism involving an in-frame 27-bp deletion in the degron tail of auxin coreceptor IAA2, one member of the gene family of Aux/IAA auxin co-receptors. We show that this deletion in IAA2 is a gain-of-function mutation that confers synthetic auxin resistance. This field-evolved mechanism of resistance to synthetic auxin herbicides confirms previous biochemical studies showing the role of the Aux/IAA degron tail in regulating Aux/IAA protein degradation upon auxin perception. The deletion mutation could be generated in crops using gene-editing approaches for cross-resistance to multiple synthetic auxin herbicides. The natural auxin indole-3-acetic acid (IAA) is a key regulator of many aspects of plant growth and development. Synthetic auxin herbicides such as 2,4-D mimic the effects of IAA by inducing strong auxinic-signaling responses in plants. To determine the mechanism of 2,4-D resistance in a Sisymbrium orientale (Indian hedge mustard) weed population, we performed a transcriptome analysis of 2,4-D-resistant (R) and -susceptible (S) genotypes that revealed an in-frame 27-nucleotide deletion removing nine amino acids in the degron tail (DT) of the auxin coreceptor Aux/IAA2 (SoIAA2). The deletion allele cosegregated with 2,4-D resistance in recombinant inbred lines. Further, this deletion was also detected in several 2,4-D-resistant field populations of this species. Arabidopsis transgenic lines expressing the SoIAA2 mutant allele were resistant to 2,4-D and dicamba. The IAA2-DT deletion reduced binding to TIR1 in vitro with both natural and synthetic auxins, causing reduced association and increased dissociation rates. 
This mechanism of synthetic auxin herbicide resistance assigns an in planta function to the DT region of this Aux/IAA coreceptor for its role in synthetic auxin binding kinetics and reveals a potential biotechnological approach to produce synthetic auxin-resistant crops using gene-editing.
|
21
|
Zan ZY, Ge XF, Huang RR, Liu WZ. Pseudonocardia humida sp. nov., an Actinomycete Isolated from Mangrove Soil Showing Distinct Distribution Pattern of Biosynthetic Gene Clusters. Curr Microbiol 2022; 79:87. [PMID: 35129703 DOI: 10.1007/s00284-022-02784-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Accepted: 01/24/2022] [Indexed: 11/03/2022]
Abstract
A novel actinomycete strain, designated S2-4T, was isolated from a mangrove soil sample, and a polyphasic approach was employed to determine its taxonomic position. Phylogenetic analysis based on 16S rRNA gene indicated that strain S2-4T formed a unique clade next to that harboring Pseudonocardia dioxanivorans CB1190T, which shared the highest sequence similarity (98.20%) with the new isolate. Phylogenetic analysis based on core genes of genomic sequences displayed a different scenario, exhibiting closer phylogenetic relationship of strain S2-4T with several species with 16S rRNA gene sequence similarities ranging from 96.95 to 98.06%, which was confirmed by the phylogenetic tree reconstructed based on genomic sequences. Further, substantial differences between the genotypic properties of strain S2-4T and its closest neighbors, including digital DNA-DNA hybridization, average nucleotide identity, and distribution patterns of biosynthetic gene clusters (BGC), indicated the taxonomic position of strain S2-4T as a novel species of the genus Pseudonocardia. Accordingly, strain S2-4T was observed to show a different distribution pattern of a predicted BGC encoding ectoine by comparative genomic analysis, which could be strongly linked to its unique habitat distinct from where its close neighbors were isolated. The major cellular fatty acids were iso-C15:0, C21:0, and iso-C16:0. The predominant menaquinone was MK-8(H4). The polar lipids were composed of diphosphatidylglycerol, phosphatidylethanolamine, phosphatidylglycerol, phosphatidyl-N-monomethylethanolamine, phosphatidylcholine, phosphatidylinositol, phosphatidylinositol mannosides, and two unidentified glycolipids. Here, we propose a novel species of the genus Pseudonocardia: Pseudonocardia humida sp. nov. with the type strain S2-4T (= JCM 34291T = CGMCC 4.7706T).
Affiliation(s)
- Zhen-Yu Zan
- School of Food and Pharmaceutical Engineering, Nanjing Normal University, Xuelin Road No. 2, Nanjing, 210023, People's Republic of China
- Xian-Feng Ge
- School of Food and Pharmaceutical Engineering, Nanjing Normal University, Xuelin Road No. 2, Nanjing, 210023, People's Republic of China
- Rui-Rui Huang
- School of Food and Pharmaceutical Engineering, Nanjing Normal University, Xuelin Road No. 2, Nanjing, 210023, People's Republic of China
- Wen-Zheng Liu
- School of Food and Pharmaceutical Engineering, Nanjing Normal University, Xuelin Road No. 2, Nanjing, 210023, People's Republic of China.
|
22
|
Carugo O. Uses and Abuses of the Atomic Displacement Parameters in Structural Biology. Methods Mol Biol 2022; 2449:281-298. [PMID: 35507268 DOI: 10.1007/978-1-0716-2095-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
B-factors determined with X-ray crystallographic analyses are commonly used to estimate the flexibility degree of atoms, residues, and molecular moieties in biological macromolecules. In this chapter, the most recent studies and applications of B-factors in protein engineering and structural biology are briefly summarized. Particular emphasis is given to the limitations in using B-factors, in order to prevent inappropriate applications. It is eventually predicted that future applications will involve anisotropically refined B-factors, deep learning, and data produced by cryo-EM.
|
23
|
Manfredi M, Savojardo C, Martelli PL, Casadio R. DeepREx-WS: A web server for characterising protein-solvent interaction starting from sequence. Comput Struct Biotechnol J 2021; 19:5791-5799. [PMID: 34765094 PMCID: PMC8566768 DOI: 10.1016/j.csbj.2021.10.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/05/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 11/23/2022] Open
Abstract
Protein–solvent interaction provides important features for protein surface engineering when the structure is absent or partially solved. Presently, we can integrate the notion of solvent exposed/buried residues with that of their flexibility and intrinsic disorder to highlight regions where mutations may increase or decrease protein stability in order to modify proteins for biotechnological reasons, while preserving their functional integrity. Here we describe a web server, which provides the unique possibility of integrating knowledge of solvent and non-solvent exposure with that of residue conservation, flexibility and disorder of a protein sequence, for a better understanding of which regions are relevant for protein integrity. The core of the web server is DeepREx, a novel deep learning-based tool that classifies each residue in the sequence as buried or exposed. DeepREx is trained on a high-quality, non-redundant dataset derived from the Protein Data Bank comprising 2332 monomeric protein chains and benchmarked on a blind test set including 200 protein sequences unrelated to the training set. Results show that DeepREx performs at the state-of-the-art in the field. In turn, the Web Server, DeepREx-WS, supplements the predictions of DeepREx with features that allow a better characterisation of exposed and buried regions: i) residue conservation derived from multiple sequence alignment; ii) local sequence hydrophobicity; iii) residue flexibility computed with MEDUSA; iv) a predictor of secondary structure; v) the presence of disordered regions as derived from MobiDB-Lite3.0. The web server allows browsing, selecting and intersecting the different features. We demonstrate a possible application of the DeepREx-WS for assisting the identification of residues to be varied in protein surface engineering processes.
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Corresponding author.
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
|
24
|
Discovering the Ultimate Limits of Protein Secondary Structure Prediction. Biomolecules 2021; 11:biom11111627. [PMID: 34827624 PMCID: PMC8615938 DOI: 10.3390/biom11111627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/21/2021] [Revised: 10/25/2021] [Accepted: 10/28/2021] [Indexed: 12/29/2022] Open
Abstract
Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. There have been ~300 algorithms published in the past seven decades with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; after that, it has long stayed at 81–86%. In the 1990s, the theoretical limit of three-state SSP accuracy had been estimated to be 88%. Thus, SSP is now generally considered not challenging or too challenging to improve. However, we found that the limit of three-state SSP might be underestimated. Besides, there is still much room for improving segment-based and eight-state SSPs, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4–5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84–87%. Several proposals for improving future SSP algorithms are made based on our results. We hope that these findings will help move forward the development of SSP and all its applications.
|
25
|
Cretin G, Galochkina T, de Brevern AG, Gelly JC. PYTHIA: Deep Learning Approach for Local Protein Conformation Prediction. Int J Mol Sci 2021; 22:ijms22168831. [PMID: 34445537 PMCID: PMC8396346 DOI: 10.3390/ijms22168831] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/06/2021] [Revised: 08/09/2021] [Accepted: 08/10/2021] [Indexed: 02/07/2023] Open
Abstract
Protein Blocks (PBs) are a widely used structural alphabet describing local protein backbone conformation in terms of 16 possible conformational states, adopted by five consecutive amino acids. The representation of complex protein 3D structures as 1D PB sequences was previously successfully applied to protein structure alignment and protein structure prediction. In the current study, we present a new model, PYTHIA (predicting any conformation at high accuracy), for the prediction of the protein local conformations in terms of PBs directly from the amino acid sequence. PYTHIA is based on a deep residual inception-inside-inception neural network with convolutional block attention modules, predicting 1 of 16 PB classes from evolutionary information combined with physicochemical properties of individual amino acids. PYTHIA clearly outperforms the LOCUSTRA reference method for all PB classes and demonstrates great performance for PB prediction on particularly challenging proteins from the CASP14 free modelling category.
Affiliation(s)
- Gabriel Cretin
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Tatiana Galochkina
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Alexandre G. de Brevern
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Correspondence:
|
26
|
Casadio R, Lenhard B, Sternberg MJE. Computational Resources for Molecular Biology 2021. J Mol Biol 2021; 433:166962. [PMID: 33774035 DOI: 10.1016/j.jmb.2021.166962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Rita Casadio
- Biocomputing Group, FABIT-University of Bologna, Italy
- Boris Lenhard
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK; Computational Regulatory Genomics, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
- Michael J E Sternberg
- Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
|