1. Li C, Luo Y, Xie Y, Zhang Z, Liu Y, Zou L, Xiao F. Structural and functional prediction, evaluation, and validation in the post-sequencing era. Comput Struct Biotechnol J 2024; 23:446-451. [PMID: 38223342 PMCID: PMC10787220 DOI: 10.1016/j.csbj.2023.12.031] [Received: 10/20/2023] [Revised: 12/20/2023] [Accepted: 12/22/2023] [Indexed: 01/16/2024]
Abstract
The surge of genome sequencing data has revealed a substantial number of genetic variants of uncertain significance (VUS). Deciphering the VUS discovered by sequencing poses a major challenge in the post-sequencing era. Although experimental assays have made progress in classifying VUS, only a tiny fraction of human genes have been explored experimentally. There is therefore an urgent need for state-of-the-art in silico functional predictors of VUS. Artificial intelligence (AI) is an invaluable tool for identifying VUS with high efficiency and accuracy. A growing number of studies indicate that AI has markedly accelerated the interpretation of VUS, and our group has already used AI to develop protein structure-based prediction models. In this review, we provide an overview of previous research on AI-based prediction of missense variants and discuss the challenges and opportunities for protein structure-based variant prediction in the post-sequencing era.
Affiliation(s)
- Chang Li
- Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Yixuan Luo
- Beijing Normal University, Beijing, China
- Yibo Xie
- Information Center, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Zaifeng Zhang
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Ye Liu
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Lihui Zou
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Fei Xiao
- Clinical Biobank, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- The Key Laboratory of Geriatrics, Beijing Institute of Geriatrics, Beijing Hospital, National Center of Gerontology, National Health Commission, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing, China
- Beijing Normal University, Beijing, China
2. Manav N, Jit BP, Kataria B, Sharma A. Cellular and epigenetic perspective of protein stability and its implications in the biological system. Epigenomics 2024:1-22. [PMID: 38884355 DOI: 10.1080/17501911.2024.2351788] [Received: 11/29/2023] [Accepted: 04/30/2024] [Indexed: 06/18/2024]
Abstract
Protein stability is a fundamental prerequisite in both experimental and therapeutic applications. Recent advances in high-throughput experimental techniques and functional ontology approaches have shown that impaired protein structure and stability are intricately associated with the cause and cure of several diseases. It is therefore paramount to understand in depth the physical and molecular factors governing protein stability. In this review article, we comprehensively examine the evolution of protein stability, its emergence over time, its relationship with organizational aspects, and the experimental methods used to study it. We also emphasize the role of epigenetics and its interplay with post-translational modifications (PTMs) in regulating protein stability.
Affiliation(s)
- Nisha Manav
- Department of Biochemistry, All India Institute of Medical Sciences New Delhi, Ansari Nagar, 110029, India
- Bimal Prasad Jit
- Department of Biochemistry, All India Institute of Medical Sciences New Delhi, Ansari Nagar, 110029, India
- Babita Kataria
- Department of Medical Oncology, National Cancer Institute, All India Institute of Medical Sciences, Jhajjar, 124105, India
- Ashok Sharma
- Department of Biochemistry, All India Institute of Medical Sciences New Delhi, Ansari Nagar, 110029, India
- Department of Biochemistry, National Cancer Institute, All India Institute of Medical Sciences, Jhajjar, 124105, India
3. Qiu Y, Huang T, Cai YD. Review of predicting protein stability changes upon variations. Proteomics 2024; 24:e2300371. [PMID: 38643379 DOI: 10.1002/pmic.202300371] [Received: 01/30/2024] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/22/2024]
Abstract
Predicting changes in protein stability caused by variants is of great importance, and improving the thermal stability of proteins matters for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases of protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Publicly available databases for protein stability prediction are introduced, and state-of-the-art computational approaches for predicting stability changes due to variants are reviewed, detailing each method's feature types, base algorithm, and prediction results. Experimental approaches for verifying the predictions of computational methods are also introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models for future research.
Affiliation(s)
- Yiling Qiu
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
4. Li G, Yao S, Fan L. ProSTAGE: Predicting Effects of Mutations on Protein Stability by Using Protein Embeddings and Graph Convolutional Networks. J Chem Inf Model 2024; 64:340-347. [PMID: 38166383 PMCID: PMC10806799 DOI: 10.1021/acs.jcim.3c01697] [Received: 10/21/2023] [Revised: 12/11/2023] [Accepted: 12/12/2023] [Indexed: 01/04/2024]
Abstract
Protein thermodynamic stability is essential for clarifying the relationships among structure, function, and interaction. A faster and more accurate method for predicting the impact of mutations on protein stability is therefore helpful for protein design and for understanding phenotypic variation. Recent studies have shown that protein embeddings are particularly powerful for modeling context-dependent sequence information, as in subcellular localization, variant effect, and secondary structure prediction. Herein, we introduce ProSTAGE, a deep learning method that fuses structure and sequence embeddings to predict protein stability changes upon single point mutations. Our model combines graph-based techniques and language models to predict stability changes. Moreover, ProSTAGE is trained on a larger data set, almost twice the size of the widely used S2648 data set. It consistently outperforms existing state-of-the-art methods when benchmarked on several independent data sets. Using protein embeddings as the prediction input yields better results than previous approaches, which shows the potential of protein language models for predicting the effects of mutations on proteins. ProSTAGE is implemented as a user-friendly web server.
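The abstract does not spell out ProSTAGE's architecture. As a rough illustration of the graph-convolution idea it builds on, the sketch below applies one symmetric-normalized graph-convolution layer to a toy residue contact graph whose node features stand in for per-residue language-model embeddings. The 8 Å contact cutoff, the feature sizes, the random weights, and the function names are assumptions for illustration, not ProSTAGE's actual design.

```python
import numpy as np

def contact_adjacency(coords, cutoff=8.0):
    """Binary residue contact map (assumed scheme): residues i != j are linked
    if their C-alpha coordinates lie within `cutoff` angstroms."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added inside the layer instead
    return adj

def gcn_layer(adj, features, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a_hat.sum(axis=1)                     # every degree >= 1 thanks to self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return np.maximum(norm_adj @ features @ weight, 0.0)

rng = np.random.default_rng(0)
coords = rng.normal(scale=5.0, size=(10, 3))  # toy C-alpha coordinates for 10 residues
embeddings = rng.normal(size=(10, 16))        # stand-in for language-model embeddings
weight = rng.normal(size=(16, 8))             # random here; learned in a real model

adj = contact_adjacency(coords)
hidden = gcn_layer(adj, embeddings, weight)
print(hidden.shape)  # (10, 8)
```

In a stability predictor, such per-residue representations would typically be computed for both the wild-type and mutant proteins, pooled, and passed to a regression head that outputs ΔΔG; that part is omitted here.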
Affiliation(s)
- Gen Li
- Production and R&D Center I of LSS, GenScript (Shanghai) Biotech Co., Ltd., Shanghai 200131, China
- Sijie Yao
- Production and R&D Center I of LSS, GenScript (Shanghai) Biotech Co., Ltd., Shanghai 200131, China
- Long Fan
- Production and R&D Center I of LSS, GenScript (Shanghai) Biotech Co., Ltd., Shanghai 200131, China
5. Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods is challenging because their training data differ. Moreover, they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones that utilize mega-scale stability datasets and transfer learning. To ensure fairness, we excluded entries overlapping with or similar to the training datasets. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods show consistent trends in predicting destabilizing mutations across properties such as solvent exposure and secondary conformation, stabilizing mutations show no clear pattern. Our study also suggests that addressing training dataset bias alone may not significantly improve the accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
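The assessment above rests on two simple statistics. A minimal sketch of both follows: the Pearson correlation between experimental and predicted ΔΔG, and an anti-symmetry check based on the property that an ideal predictor gives ΔΔG(mutant→wild type) = −ΔΔG(wild type→mutant), so forward and reverse predictions should correlate at −1 with a mean sum of zero. The function names and all numbers are made-up toy values, not the study's code or results.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def antisymmetry_stats(ddg_forward, ddg_reverse):
    """For an anti-symmetric predictor, ddG(wt->mut) = -ddG(mut->wt).
    Returns the Pearson r between forward and reverse predictions (ideally -1)
    and the mean bias <ddG_fwd + ddG_rev> (ideally 0)."""
    fwd = np.asarray(ddg_forward, dtype=float)
    rev = np.asarray(ddg_reverse, dtype=float)
    return pearson_r(fwd, rev), float(np.mean(fwd + rev))

# Hypothetical experimental and predicted ddG values (kcal/mol)
experimental = [1.2, -0.5, 2.3, 0.1, -1.0]
predicted = [0.9, -0.2, 1.8, 0.4, -0.6]
print(round(pearson_r(experimental, predicted), 3))

# A perfectly anti-symmetric toy predictor
fwd = [0.9, -0.2, 1.8]
rev = [-0.9, 0.2, -1.8]
r, bias = antisymmetry_stats(fwd, rev)
print(round(r, 3), round(bias, 3))  # -1.0 0.0
```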
Affiliation(s)
- Feifan Zheng
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
6. Rollo C, Pancotti C, Birolo G, Rossi I, Sanavia T, Fariselli P. Influence of Model Structures on Predictors of Protein Stability Changes from Single-Point Mutations. Genes (Basel) 2023; 14:2228. [PMID: 38137050 PMCID: PMC10742815 DOI: 10.3390/genes14122228] [Received: 11/25/2023] [Revised: 12/14/2023] [Accepted: 12/15/2023] [Indexed: 12/24/2023]
Abstract
Missense variation in genomes can affect protein structural stability and, in turn, cell physiology. Predicting the impact of such variations is relevant, and the best-performing computational tools exploit protein structure information. However, most protein sequence variants lack an experimentally resolved structure, in which case comparative or ab initio modeling tools can provide one. Here, we evaluate the impact of model structures, compared with experimental structures, on predictors of protein stability changes upon single-point mutations, where no significant changes are expected between the original and the mutated structures. We show that there are substantial differences among the computational tools. Methods that rely on coarse-grained representations are less sensitive to the underlying protein structures. In contrast, tools that exploit more detailed molecular representations are sensitive to structures generated by comparative modeling, even for single-residue substitutions.
Affiliation(s)
- Cesare Rollo
- Department of Medical Sciences, University of Torino, 10126 Torino, Italy (G.B.; I.R.; T.S.; P.F.)
7. Arya R, Tripathi P, Nayak K, Ganesh J, Bihani SC, Ghosh B, Prashar V, Kumar M. Insights into the evolution of mutations in SARS-CoV-2 non-spike proteins. Microb Pathog 2023; 185:106460. [PMID: 37995880 DOI: 10.1016/j.micpath.2023.106460] [Received: 06/12/2023] [Revised: 10/16/2023] [Accepted: 11/17/2023] [Indexed: 11/25/2023]
Abstract
The COVID-19 pandemic has been driven by the emergence of SARS-CoV-2 variants with mutations across all the viral proteins. Although mutations in the spike protein have received significant attention, understanding the prevalence and potential impact of mutations in other viral proteins is essential for comprehending the evolution of SARS-CoV-2. Here, we conducted a comprehensive analysis of approximately 14 million SARS-CoV-2 sequences deposited in the GISAID database up to December 2022 to identify prevalent mutations in the non-spike proteins at the global and country levels. Additionally, we evaluated the energetics of each mutation to better understand its impact on protein stability. While the consequences of many mutations remain unclear, we discuss the potential structural and functional significance of some of them. Our study highlights the ongoing evolutionary process of SARS-CoV-2 and underscores the importance of understanding changes in non-spike proteins.
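The prevalence analysis described above amounts to counting, for each mutation, the fraction of sequences that carry it, both globally and per country. A minimal sketch under assumed inputs: each sequence is reduced to a hypothetical (country, mutation list) record, and GISAID metadata parsing, lineage handling, and quality filtering are left out. The function name and mutation labels are illustrative only, not the authors' pipeline.

```python
from collections import Counter, defaultdict

def mutation_prevalence(records):
    """records: iterable of (country, mutations) pairs, one per sequence.
    Returns (global_freq, per_country): the fraction of sequences carrying
    each mutation, globally and within each country."""
    global_counts = Counter()
    country_counts = defaultdict(Counter)
    country_totals = Counter()
    total = 0
    for country, mutations in records:
        unique = set(mutations)  # count each mutation once per sequence
        global_counts.update(unique)
        country_counts[country].update(unique)
        country_totals[country] += 1
        total += 1
    global_freq = {m: c / total for m, c in global_counts.items()}
    per_country = {
        country: {m: c / country_totals[country] for m, c in counts.items()}
        for country, counts in country_counts.items()
    }
    return global_freq, per_country

# Hypothetical records; labels are illustrative protein:substitution strings
records = [
    ("India", ["NSP12:P323L", "N:R203K"]),
    ("India", ["NSP12:P323L"]),
    ("USA", ["NSP12:P323L", "ORF8:S84L"]),
    ("USA", ["N:R203K"]),
]
global_freq, per_country = mutation_prevalence(records)
print(global_freq["NSP12:P323L"])           # 0.75
print(per_country["USA"].get("ORF8:S84L"))  # 0.5
```

Counting each mutation once per sequence (via `set`) keeps every frequency between 0 and 1 even if a record happens to list the same mutation twice.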
Affiliation(s)
- Rimanshee Arya
- Protein Crystallography Section, Bhabha Atomic Research Centre, Mumbai, 400085, India; Homi Bhabha National Institute, Anushakti Nagar, Mumbai, 400094, India
- Preeti Tripathi
- Protein Crystallography Section, Bhabha Atomic Research Centre, Mumbai, 400085, India
- Karthik Nayak
- Protein Crystallography Section, Bhabha Atomic Research Centre, Mumbai, 400085, India; School of Chemical Sciences, UM-DAE Centre for Excellence in Basic Sciences, University of Mumbai, Vidyanagari, Mumbai, 400098, India
- Janani Ganesh
- Protein Crystallography Section, Bhabha Atomic Research Centre, Mumbai, 400085, India; Homi Bhabha National Institute, Anushakti Nagar, Mumbai, 400094, India
- Subhash C Bihani
- Protein Crystallography Section, Bhabha Atomic Research Centre, Mumbai, 400085, India; Homi Bhabha National Institute, Anushakti Nagar, Mumbai, 400094, India
- Biplab Ghosh
- Homi Bhabha National Institute, Anushakti Nagar, Mumbai, 400094, India; Beamline Development & Application Section, Bhabha Atomic Research Centre, Mumbai, 400085, India
- Vishal Prashar
- Protein Crystallography Section, Bhabha Atomic Research Centre, Mumbai, 400085, India; Homi Bhabha National Institute, Anushakti Nagar, Mumbai, 400094, India
- Mukesh Kumar
- Protein Crystallography Section, Bhabha Atomic Research Centre, Mumbai, 400085, India; Homi Bhabha National Institute, Anushakti Nagar, Mumbai, 400094, India
8. Turina P, Fariselli P, Capriotti E. K-Pro: Kinetics Data on Proteins and Mutants. J Mol Biol 2023; 435:168245. [PMID: 37625584 DOI: 10.1016/j.jmb.2023.168245] [Received: 02/14/2023] [Revised: 08/16/2023] [Accepted: 08/17/2023] [Indexed: 08/27/2023]
Abstract
The study of protein folding plays a crucial role in improving our understanding of protein function and of the relationship between genetics and phenotypes. In particular, understanding the thermodynamics and kinetics of the folding process is important for uncovering the mechanisms behind human disorders caused by protein misfolding. To address this issue, it is essential to collect and curate experimental kinetic and thermodynamic data on protein folding. K-Pro is a new database designed for collecting and storing experimental kinetic data on monomeric proteins with a two-state folding mechanism. With 1,529 records from 62 proteins corresponding to 65 structures, K-Pro contains various kinetic parameters such as the logarithms of the folding and unfolding rates, Tanford's β, and the ϕ values. When available, the database also includes thermodynamic parameters associated with the kinetic data. K-Pro features a user-friendly interface that allows browsing and downloading kinetic data of interest. The graphical interface provides a visual representation of the protein and mutants, and it is cross-linked to key databases such as PDB, UniProt, and PubMed. K-Pro is open and freely accessible through https://folding.biofold.org/k-pro and supports the latest versions of popular browsers.
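K-Pro records the logarithms of folding and unfolding rates alongside ϕ values. For a two-state folder, ϕ is conventionally estimated as the change in the folding activation free energy divided by the change in equilibrium stability, both derived from wild-type and mutant rate constants. The sketch below assumes rate constants in s⁻¹ measured under identical conditions and uses natural logarithms; the function and the example rates are hypothetical and do not reflect K-Pro's internal computation or data.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def phi_value(kf_wt, ku_wt, kf_mut, ku_mut, temp_k=298.15):
    """Phi value for a two-state folder from folding (kf) and unfolding (ku)
    rate constants of wild type and mutant (hypothetical helper).
    Returns (phi, ddg_ts, ddg_eq) with free energies in kcal/mol;
    positive values mean the mutation destabilizes."""
    rt = R_KCAL * temp_k
    dg_eq_wt = rt * math.log(kf_wt / ku_wt)     # unfolding free energy (stability), wild type
    dg_eq_mut = rt * math.log(kf_mut / ku_mut)  # unfolding free energy (stability), mutant
    ddg_eq = dg_eq_wt - dg_eq_mut               # loss of equilibrium stability
    ddg_ts = rt * math.log(kf_wt / kf_mut)      # rise in the folding barrier
    return ddg_ts / ddg_eq, ddg_ts, ddg_eq

# Hypothetical rates (s^-1): the mutation slows folding 10-fold, unfolding unchanged
phi, ddg_ts, ddg_eq = phi_value(kf_wt=100.0, ku_wt=0.01, kf_mut=10.0, ku_mut=0.01)
print(round(phi, 2))  # 1.0
```

A ϕ near 1 suggests the mutated site is native-like in the transition state and a ϕ near 0 that it is unfolded-like; because the equilibrium ΔΔG sits in the denominator, estimates become unreliable when that change is small.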
Affiliation(s)
- Paola Turina
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
9. Gong H, Zhang Y, Dong C, Wang Y, Chen G, Liang B, Li H, Liu L, Xu J, Li G. Unbiased curriculum learning enhanced global-local graph neural network for protein thermodynamic stability prediction. Bioinformatics 2023; 39:btad589. [PMID: 37740312 PMCID: PMC10918760 DOI: 10.1093/bioinformatics/btad589] [Received: 04/25/2023] [Revised: 08/04/2023] [Accepted: 09/21/2023] [Indexed: 09/24/2023]
Abstract
Motivation: Proteins play crucial roles in biological processes, and their functions are closely tied to thermodynamic stability. However, measuring stability changes upon point mutations of amino acid residues using physical methods can be time-consuming. In recent years, several deep learning-based computational methods for protein thermodynamic stability prediction (PTSP) have emerged. Nevertheless, these approaches either overlook the natural topology of protein structures or neglect the inherently noisy samples resulting from theoretical calculation or experimental errors.
Results: We propose a novel Global-Local Graph Neural Network powered by Unbiased Curriculum Learning for the PTSP task. Our method first builds a Siamese graph neural network to extract protein features before and after mutation. Since the graph's topological changes stem from local node mutations, we design a local feature transformation module that makes the model focus on the mutated site. To address model bias caused by noisy samples, which represent unavoidable errors from physical experiments, we introduce an unbiased curriculum learning method that identifies and re-weights noisy samples during training. Extensive experiments demonstrate that our method outperforms advanced protein stability prediction methods and surpasses state-of-the-art learning methods for regression tasks.
Availability and implementation: All code and data are available at https://github.com/haifangong/UCL-GLGNN.
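The abstract says the method identifies and re-weights noisy samples during training but does not give the weighting rule. The sketch below substitutes a generic loss-based curriculum scheme: samples whose current loss exceeds a batch loss quantile are down-weighted, and the quantile rises as training progresses so that harder samples are gradually admitted. This is a swapped-in illustration of curriculum re-weighting, not the authors' unbiased curriculum learning method; the schedule, quantiles, and function names are assumptions.

```python
import numpy as np

def curriculum_weights(losses, epoch, total_epochs,
                       start_quantile=0.5, end_quantile=0.9):
    """Hypothetical loss-based re-weighting (not the UCL-GLGNN rule).

    Samples whose loss is at or below a loss quantile keep weight 1; samples
    above it get weight threshold / loss, so very high-loss (possibly noisy)
    samples count less. The quantile rises linearly from start_quantile to
    end_quantile, admitting harder samples as training progresses."""
    losses = np.asarray(losses, dtype=float)
    progress = min(epoch / max(total_epochs - 1, 1), 1.0)
    q = start_quantile + (end_quantile - start_quantile) * progress
    threshold = np.quantile(losses, q)
    weights = np.ones_like(losses)
    above = losses > threshold
    weights[above] = threshold / losses[above]
    return weights

def weighted_mse(pred, target, weights):
    """Weighted mean squared error, normalized by the total weight."""
    err = (np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)) ** 2
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * err) / np.sum(weights))

# Hypothetical per-sample losses from the current model; sample 3 looks like label noise
per_sample_loss = np.array([0.1, 0.2, 0.15, 4.0, 0.3])
w_early = curriculum_weights(per_sample_loss, epoch=0, total_epochs=10)
w_late = curriculum_weights(per_sample_loss, epoch=9, total_epochs=10)
print(np.round(w_early, 3))
print(np.round(w_late, 3))
```

In a training loop, such weights would multiply each sample's regression loss (as in `weighted_mse`) and be recomputed every epoch from the current model's losses.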
Affiliation(s)
- Haifan Gong
- Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- SRIBD, Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China
- Yumeng Zhang
- Shanghai Jiao Tong University, Shanghai 200000, China
- Chenhe Dong
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yue Wang
- Qilu Hospital, Shandong University, Shandong 250000, China
- Guanqi Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Bilin Liang
- Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Haofeng Li
- SRIBD, Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China
- Lanxuan Liu
- Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Jie Xu
- Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Guanbin Li
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
10. Berber I, Erten C, Kazan H. Predator: Predicting the Impact of Cancer Somatic Mutations on Protein-Protein Interactions. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3163-3172. [PMID: 37030791 DOI: 10.1109/tcbb.2023.3262119] [Indexed: 06/19/2023]
Abstract
Since many biological processes are governed by protein-protein interactions, understanding which mutations disrupt these interactions is profoundly important for cancer research. Most existing methods focus on protein stability without considering a mutation's specific effects on interactions with other proteins. Here, we focus on somatic mutations in the interface regions of proteins and predict which interactions would be affected by a mutation of interest. We build an ensemble model, Predator, that classifies interface mutations as disruptive or nondisruptive based on their predicted effects on specific protein-protein interactions. We show that Predator outperforms existing approaches in the literature in terms of prediction accuracy. We then apply Predator to various TCGA cancer cohorts and perform comprehensive cohort-, patient-, and gene-level analyses to determine the genes whose interface mutations tend to disrupt their interactions. Predator's predictions reveal interesting patterns for several genes in each cohort regarding their potential as cancer drivers. Our analyses further reveal that the identified genes and their frequently disrupted partners exhibit patterns of mutual exclusivity across the cancer cohorts studied.
11. Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023]
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Roma, Italy
- Allegra Via
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Claudio Carta
- National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
- Rita Casadio
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Andrea Cicconardi
- Department of Physics, University of Genova, Genova, Italy
- Istituto Italiano di Tecnologia (IIT), Genova, Italy
- Angelo Facchiano
- National Research Council, Institute of Food Science, Avellino, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
- Deborah Giordano
- National Research Council, Institute of Food Science, Avellino, Italy
- Federica Isidori
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Anna Marabotti
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Stefano Pascarella
- Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Michele Pinelli
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- Tommaso Pippucci
- Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Roberta Russo
- Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
- Castrense Savojardo
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Bernardina Scafuri
- Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
12. Chen Z, Wang X, Chen X, Huang J, Wang C, Wang J, Wang Z. Accelerating therapeutic protein design with computational approaches toward the clinical stage. Comput Struct Biotechnol J 2023; 21:2909-2926. [PMID: 38213894 PMCID: PMC10781723 DOI: 10.1016/j.csbj.2023.04.027] [Received: 02/15/2023] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 01/13/2024]
Abstract
Therapeutic proteins, exemplified by antibodies, are of increasing interest in human medicine. However, their clinical translation is still largely hindered by several aspects of developability, including affinity and selectivity, stability and aggregation prevention, solubility and viscosity reduction, and deimmunization. Conventional optimization of developability with widely used methods, such as display technologies and library screening, is time- and cost-intensive, and its efficiency in finding suitable solutions still falls short of clinical needs. In recent years, rapid advances in computational methodologies have transformed the field of therapeutic protein design. Because these computational strategies excel at feature extraction and modeling, integrating them with conventional techniques offers a promising avenue for accelerating the design and optimization of therapeutic proteins toward clinical implementation. Here, we compare therapeutic proteins with small molecules in terms of developability and provide an overview of the computational approaches applicable to designing or optimizing therapeutic proteins for several developability issues.
Affiliation(s)
- Zhidong Chen
- Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xinpei Wang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Xu Chen
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Juyang Huang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Chenglin Wang
- Shenzhen Qiyu Biotechnology Co., Ltd, Shenzhen 518107, China
- Junqing Wang
- School of Pharmaceutical Sciences, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
- Zhe Wang
- Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China
13
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
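As a worked reminder of what the ΔΔG-based methods above compute, the relations below assume one common sign convention, with ΔΔG defined as the unfolding free energy of the variant minus that of the wild type. This convention is an assumption here: individual tools differ, and some report the opposite sign.

```latex
\Delta\Delta G_{\mathrm{wt}\rightarrow\mathrm{mut}}
  = \Delta G^{\mathrm{unf}}_{\mathrm{mut}} - \Delta G^{\mathrm{unf}}_{\mathrm{wt}},
\qquad
\Delta\Delta G_{\mathrm{mut}\rightarrow\mathrm{wt}}
  = \Delta G^{\mathrm{unf}}_{\mathrm{wt}} - \Delta G^{\mathrm{unf}}_{\mathrm{mut}}
  = -\,\Delta\Delta G_{\mathrm{wt}\rightarrow\mathrm{mut}}
```

Under this convention a negative value marks a destabilizing substitution, and swapping the roles of wild type and variant simply flips the sign. This antisymmetry is the property that several of the studies listed here use to test predictors on forward and reverse mutations.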
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
14
Cai P, Liu S, Zhang D, Xing H, Han M, Liu D, Gong L, Hu QN. SynBioTools: a one-stop facility for searching and selecting synthetic biology tools. BMC Bioinformatics 2023; 24:152. [PMID: 37069545 PMCID: PMC10111727 DOI: 10.1186/s12859-023-05281-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 04/11/2023] [Indexed: 04/19/2023] [Open access]
Abstract
BACKGROUND The rapid development of synthetic biology relies heavily on the use of databases and computational tools, which are also developing rapidly. While many tool registries have been created to facilitate tool retrieval, sharing, and reuse, no relatively comprehensive tool registry or catalog addresses all aspects of synthetic biology. RESULTS We constructed SynBioTools, a comprehensive collection of synthetic biology databases, computational tools, and experimental methods, as a one-stop facility for searching and selecting synthetic biology tools. SynBioTools includes databases, computational tools, and methods extracted from reviews via SCIentific Table Extraction, a scientific table-extraction tool that we built. Approximately 57% of the resources that we located and included in SynBioTools are not mentioned in bio.tools, the dominant tool registry. To improve users' understanding of the tools and to enable them to make better choices, the tools are grouped into nine modules (each with subdivisions) based on their potential biosynthetic applications. Detailed comparisons of similar tools in every classification are included. The URLs, descriptions, source references, and the number of citations of the tools are also integrated into the system. CONCLUSIONS SynBioTools is freely available at https://synbiotools.lifesynther.com/ . It provides end-users and developers with a useful resource of categorized synthetic biology databases, tools, and methods to facilitate tool retrieval and selection.
Affiliation(s)
- Pengli Cai
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Sheng Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dachuan Zhang
- Ecological Systems Design, Institute of Environmental Engineering, ETH Zurich, 8093, Zurich, Switzerland
- Huadong Xing
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Mengying Han
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Dongliang Liu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Linlin Gong
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
- Qian-Nan Hu
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
15
Hu R, Fu L, Chen Y, Chen J, Qiao Y, Si T. Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments. Brief Bioinform 2023; 24:6958505. [PMID: 36562723 DOI: 10.1093/bib/bbac570] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/30/2022] [Revised: 11/14/2022] [Accepted: 11/22/2022] [Indexed: 12/24/2022] [Open access]
Abstract
Directed protein evolution applies repeated rounds of genetic mutagenesis and phenotypic screening and is often limited by experimental throughput. Through in silico prioritization of mutant sequences, machine learning has been applied to reduce wet lab burden to a level practical for human researchers. On the other hand, robotics permits large batches and rapid iterations for protein engineering cycles, but such capacities have not been well exploited in existing machine learning-assisted directed evolution approaches. Here, we report a scalable and batched method, Bayesian Optimization-guided EVOlutionary (BO-EVO) algorithm, to guide multiple rounds of robotic experiments to explore protein fitness landscapes of combinatorial mutagenesis libraries. We first examined various design specifications based on an empirical landscape of protein G domain B1. Then, BO-EVO was successfully generalized to another empirical landscape of an Escherichia coli kinase PhoQ, as well as simulated NK landscapes with up to moderate epistasis. This approach was then applied to guide robotic library creation and screening to engineer enzyme specificity of RhlA, a key biosynthetic enzyme for rhamnolipid biosurfactants. A 4.8-fold improvement in producing a target rhamnolipid congener was achieved after examining less than 1% of all possible mutants after four iterations. Overall, BO-EVO proves to be an efficient and general approach to guide combinatorial protein engineering without prior knowledge.
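The simulated NK landscapes mentioned above are a standard toy model of epistasis, in which K sets how many other sites influence each site's contribution to fitness. The sketch below is a minimal, hypothetical implementation with binary sites and randomly chosen interaction partners (both assumptions; the paper's exact landscape settings are not given here). It only illustrates how K tunes ruggedness and does not reproduce BO-EVO.

```python
import itertools
import random


def make_nk_landscape(n: int, k: int, seed: int = 0):
    """Build a random NK fitness function over binary sequences of length n.

    Each site i contributes a value that depends on its own state and on the
    states of k other sites (chosen at random here, an assumption). Each
    contribution is drawn uniformly from [0, 1) once and stored in a table.
    """
    if not 0 <= k < n:
        raise ValueError("k must satisfy 0 <= k < n")
    rng = random.Random(seed)
    # For each site, pick k distinct partner sites other than itself.
    neighbours = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One lookup table per site: local state tuple -> contribution.
    tables = [
        {state: rng.random() for state in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(seq) -> float:
        if len(seq) != n:
            raise ValueError("sequence length must equal n")
        total = 0.0
        for i in range(n):
            local = (seq[i],) + tuple(seq[j] for j in neighbours[i])
            total += tables[i][local]
        return total / n  # mean contribution, so fitness lies in [0, 1)

    return fitness


def count_local_optima(fitness, n: int) -> int:
    """Count sequences with no fitter single-site (Hamming-1) neighbour."""
    optima = 0
    for seq in itertools.product((0, 1), repeat=n):
        f = fitness(seq)
        improvable = False
        for i in range(n):
            flipped = list(seq)
            flipped[i] ^= 1  # mutate one site
            if fitness(flipped) > f:
                improvable = True
                break
        if not improvable:
            optima += 1
    return optima


if __name__ == "__main__":
    for k in (0, 2, 4):
        f = make_nk_landscape(n=8, k=k, seed=1)
        print(f"K={k}: {count_local_optima(f, 8)} local optima")
```

With K = 0 the landscape is additive and has a single optimum; larger K typically produces more local optima, which is what makes combinatorial libraries hard to search greedily.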
Affiliation(s)
- Ruyun Hu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Lihao Fu
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yongcan Chen
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China
- Junyu Chen
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Yu Qiao
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Tong Si
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; CAS Key Laboratory for Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100049, China
16
Hernández IM, Dehouck Y, Bastolla U, López-Blanco JR, Chacón P. Predicting protein stability changes upon mutation using a simple orientational potential. Bioinformatics 2023; 39:6984713. [PMID: 36629451 PMCID: PMC9850275 DOI: 10.1093/bioinformatics/btad011] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/26/2022] [Revised: 11/17/2022] [Accepted: 01/10/2023] [Indexed: 01/12/2023] [Open access]
Abstract
MOTIVATION Structure-based stability prediction upon mutation is crucial for protein engineering and design, and for understanding genetic diseases or drug resistance events. For this task, we adopted a simple residue-based orientational potential that considers only three backbone atoms, previously applied in protein modeling. Its application to stability prediction only requires parametrizing 12 amino acid-dependent weights using cross-validation strategies on a curated dataset in which we tried to reduce mutations located at protein-protein or protein-ligand interfaces, mutations measured under extreme conditions, and the over-representation of alanine. RESULTS Our method, called KORPM, accurately predicts mutational effects on an independent benchmark dataset, whether the wild-type or mutated structure is used as the starting point. Compared with state-of-the-art methods on this balanced dataset, our approach obtained the lowest root mean square error (RMSE) and the highest correlation between predicted and experimental ΔΔG measures, as well as better receiver operating characteristic and precision-recall curves. Our method is almost anti-symmetric by construction, and it thus performs similarly for the direct and reverse mutations with the corresponding wild-type and mutated structures. Despite the strong limitations of the available experimental mutation data in terms of size, variability, and heterogeneity, we show competitive results with a simple sum of energy terms, which is more efficient and less prone to overfitting. AVAILABILITY AND IMPLEMENTATION https://github.com/chaconlab/korpm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
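The comparison above rests on two standard regression metrics: the RMSE and the Pearson correlation between predicted and experimental ΔΔG. A minimal standard-library sketch of both follows. The function names and the demo numbers are hypothetical placeholders, not code or data from the cited study.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Root mean square error between predicted and experimental ΔΔG."""
    if len(pred) != len(expt) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(pred))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical placeholder values in kcal/mol, for illustration only.
    predicted = [-1.2, -0.4, 0.3, -2.0, -0.9]
    experimental = [-1.5, -0.2, 0.6, -2.8, -0.5]
    print(f"RMSE = {rmse(predicted, experimental):.2f} kcal/mol")
    print(f"Pearson r = {pearson(predicted, experimental):.2f}")
```

The two metrics are complementary: RMSE is in kcal/mol and penalizes absolute errors, while the Pearson coefficient only measures linear agreement and is blind to a constant offset or uniform scaling of the predictions.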
Affiliation(s)
- Iván Martín Hernández
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, 28006 Madrid, Spain
- Yves Dehouck
- Bioinformatic Unit, Centro de Biología Molecular “Severo Ochoa,” CSIC-UAM Cantoblanco, Madrid 28049, Spain
- Ugo Bastolla
- Bioinformatic Unit, Centro de Biología Molecular “Severo Ochoa,” CSIC-UAM Cantoblanco, Madrid 28049, Spain
- José Ramón López-Blanco
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, 28006 Madrid, Spain
17
PSP-GNM: Predicting Protein Stability Changes upon Point Mutations with a Gaussian Network Model. Int J Mol Sci 2022; 23:ijms231810711. [PMID: 36142614 PMCID: PMC9505940 DOI: 10.3390/ijms231810711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2022] [Revised: 09/05/2022] [Accepted: 09/09/2022] [Indexed: 11/26/2022] [Open access]
Abstract
Understanding the effects of missense mutations on protein stability is a widely acknowledged significant biological problem. Genomic missense mutations may alter one or more amino acids, leading to increased or decreased stability of the encoded proteins. In this study, we describe a novel approach, Protein Stability Prediction with a Gaussian Network Model (PSP-GNM), to measure the unfolding Gibbs free energy change (ΔΔG) and evaluate the effects of single amino acid substitutions on protein stability. Specifically, PSP-GNM employs a coarse-grained Gaussian Network Model (GNM) that has interactions between amino acids weighted by the Miyazawa–Jernigan statistical potential. We used PSP-GNM to simulate partial unfolding of the wildtype and mutant protein structures, and then used the difference in the energies and entropies of the unfolded wildtype and mutant proteins to calculate ΔΔG. The extent of the agreement between the ΔΔG calculated by PSP-GNM and the experimental ΔΔG was evaluated on three benchmark datasets: 350 forward mutations (S350 dataset), 669 forward and reverse mutations (S669 dataset) and 611 forward and reverse mutations (S611 dataset). We observed a Pearson correlation coefficient as high as 0.61, which is comparable to many of the existing state-of-the-art methods. The agreement with experimental ΔΔG further increased when we considered only those measurements made close to 25 °C and neutral pH, suggesting dependence on experimental conditions. We also assessed the antisymmetry (ΔΔG_reverse = −ΔΔG_forward) between the forward and reverse mutations on the Ssym+ dataset, which has 352 forward and reverse mutations. While most available methods do not display significant antisymmetry, PSP-GNM demonstrated near-perfect antisymmetry, with a Pearson correlation of −0.97. PSP-GNM is written in Python and can be downloaded as stand-alone code.
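The antisymmetry test described above compares a tool's prediction for each forward mutation with its prediction for the reverse mutation. Below is a hedged sketch of two summary statistics for this: the Pearson correlation between forward and reverse predictions (ideally −1, the quantity reported above) and a mean bias, the average of (ΔΔG_forward + ΔΔG_reverse)/2 (ideally 0), which is often reported alongside it. The function names and demo values are hypothetical, and whether PSP-GNM's authors computed a bias term this way is not stated here.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def antisymmetry_summary(
    forward: Sequence[float], reverse: Sequence[float]
) -> Tuple[float, float]:
    """Summarise how antisymmetric paired ΔΔG predictions are.

    forward[i] and reverse[i] are predictions for the same mutation in the
    wt->mut and mut->wt directions. Returns (r_fr, mean_bias); a perfectly
    antisymmetric predictor (reverse[i] == -forward[i]) gives (-1.0, 0.0).
    """
    r_fr = pearson(forward, reverse)
    mean_bias = sum((f + r) / 2 for f, r in zip(forward, reverse)) / len(forward)
    return r_fr, mean_bias


if __name__ == "__main__":
    # Hypothetical predictions (kcal/mol) for four forward/reverse pairs.
    fwd = [-1.8, -0.6, 0.4, -2.5]
    rev = [1.5, 0.7, -0.2, 2.1]
    r_fr, bias = antisymmetry_summary(fwd, rev)
    print(f"r(forward, reverse) = {r_fr:.2f}, mean bias = {bias:.2f} kcal/mol")
```

The two numbers capture different failures: a predictor can correlate strongly (r near −1) yet still be offset toward destabilization, which shows up only in the bias term.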
18
Understanding the mutational frequency in SARS-CoV-2 proteome using structural features. Comput Biol Med 2022; 147:105708. [PMID: 35714506 PMCID: PMC9173821 DOI: 10.1016/j.compbiomed.2022.105708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 04/26/2022] [Accepted: 06/04/2022] [Indexed: 01/18/2023]
Abstract
The prolonged transmission of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus in the human population has led to demographic divergence and the emergence of several location-specific clusters of viral strains. Although the effect of mutations on the severity and survival of the virus is still unclear, it is evident that certain sites in the viral proteome are more or less prone to mutations. In fact, millions of SARS-CoV-2 sequences collected all over the world have provided a unique opportunity to understand viral protein mutations and develop novel computational approaches to predict mutational patterns. In this study, we classified the mutation sites into low- and high-mutability classes based on the count of viral isolates containing mutations. The physicochemical features and structural analysis of the SARS-CoV-2 proteins showed that features including residue type, surface accessibility, residue bulkiness, stability and sequence conservation at the mutation site were able to classify the low- and high-mutability sites. We further developed machine learning models using the above-mentioned features to predict low- and high-mutability sites at different selection thresholds (ranging from 5% to 30% of the topmost and bottommost mutated sites) and observed that performance improved as the selection threshold was reduced (prediction accuracy ranging from 65% to 77%). The analysis will be useful for early detection of SARS-CoV-2 variants of concern, and the approach can also be applied to other existing and emerging viruses to help prevent another pandemic.
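The thresholding step described above, which labels the topmost and bottommost mutated sites at a chosen selection threshold, can be sketched as follows. The input format (a mapping from a site identifier to the number of isolates carrying a mutation there), the rounding of the tail size, and leaving middle-band sites unlabelled are assumptions for illustration, not details taken from the study.

```python
import math
from typing import Dict, Hashable


def label_mutability(counts: Dict[Hashable, int], fraction: float) -> Dict[Hashable, str]:
    """Label the most- and least-mutated sites at a given selection threshold.

    counts maps a site identifier to the number of isolates carrying a
    mutation at that site. The top `fraction` of sites are labelled "high"
    and the bottom `fraction` "low"; sites in between are left out. Tied
    counts keep dictionary insertion order because Python's sort is stable.
    """
    if len(counts) < 2:
        raise ValueError("need at least two sites")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    ranked = sorted(counts, key=counts.__getitem__)  # ascending by count
    n_tail = max(1, math.floor(len(ranked) * fraction))  # at least one site per tail
    labels = {site: "low" for site in ranked[:n_tail]}
    labels.update({site: "high" for site in ranked[-n_tail:]})
    return labels


if __name__ == "__main__":
    # Hypothetical site IDs and isolate counts, for illustration only.
    demo = {f"site{i}": c for i, c in enumerate([0, 3, 12, 40, 7, 150, 1, 65, 9, 300])}
    for threshold in (0.05, 0.1, 0.3):
        print(threshold, label_mutability(demo, threshold))
```

A smaller threshold keeps only the most extreme sites in each class, which is consistent with the abstract's observation that classification becomes easier as the threshold is reduced.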
19
Scafuri B, Verdino A, D'Arminio N, Marabotti A. Computational methods to assist in the discovery of pharmacological chaperones for rare diseases. Brief Bioinform 2022; 23:6590149. [PMID: 35595532 DOI: 10.1093/bib/bbac198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/22/2022] [Revised: 04/13/2022] [Accepted: 04/28/2022] [Indexed: 12/21/2022] [Open access]
Abstract
Pharmacological chaperones are chemical compounds able to bind proteins and stabilize them against denaturation and subsequent degradation. Some pharmacological chaperones have been approved, or are under investigation, for the treatment of rare inborn errors of metabolism, caused by genetic mutations that can often destabilize the structure of the encoded proteins. Given the general lack of pharmacological treatments for rare diseases, high expectations are placed on this type of compound. However, their discovery is not straightforward. In this review, we focus on the computational methods that can assist and accelerate the search for these compounds, also showing examples in which these methods were successfully applied to discover promising molecules belonging to this new category of pharmacologically active compounds.
Affiliation(s)
- Bernardina Scafuri
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
- Anna Verdino
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
- Nancy D'Arminio
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
- Anna Marabotti
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, via Giovanni Paolo II, 132, 84084 Fisciano (SA), Italy
20
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods Mol Biol 2022; 2449:169-185. [PMID: 35507262 DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research on computational methods based on machine learning and knowledge-based potentials for ΔG and ΔΔG prediction upon variations, we now realize that all the approaches perform poorly when tested on specific cases and that there is large room for improvement. Why is this so? Is the underlying assumption wrong that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein? Both machine learning and knowledge-based computational methods are rigorous, and we know the solid theory behind them. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be considered with care. In the following, we will show how to cope with the problem of understanding which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Affiliation(s)
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
21
Vasina M, Velecký J, Planas-Iglesias J, Marques SM, Skarupova J, Damborsky J, Bednar D, Mazurenko S, Prokop Z. Tools for computational design and high-throughput screening of therapeutic enzymes. Adv Drug Deliv Rev 2022; 183:114143. [PMID: 35167900 DOI: 10.1016/j.addr.2022.114143] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/26/2021] [Revised: 02/04/2022] [Accepted: 02/09/2022] [Indexed: 12/16/2022]
Abstract
Therapeutic enzymes are valuable biopharmaceuticals in various biomedical applications. They have been successfully applied in fibrinolysis, cancer treatment, enzyme replacement therapies, and the treatment of rare diseases. Still, there is a constant demand for new or better therapeutic enzymes that are sufficiently soluble, stable, and active to meet specific medical needs. Here, we highlight the benefits of coupling computational approaches with high-throughput experimental technologies, which significantly accelerate the identification and engineering of catalytic therapeutic agents. New enzymes can be identified in genomic and metagenomic databases, which are growing exponentially thanks to next-generation sequencing technologies. Computational design and machine learning methods are being developed to improve catalytically potent enzymes and predict their properties to guide the selection of target enzymes. High-throughput experimental pipelines, increasingly relying on microfluidics, enable functional screening and biochemical characterization of target enzymes to obtain efficient therapeutic enzymes.
Affiliation(s)
- Michal Vasina
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Jan Velecký
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
- Joan Planas-Iglesias
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Sergio M Marques
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Jana Skarupova
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic; Enantis, INBIT, Kamenice 34, Brno, Czech Republic
- David Bednar
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Stanislav Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
- Zbynek Prokop
- Loschmidt Laboratories, Department of Experimental Biology, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; Loschmidt Laboratories, RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno, Czech Republic; International Clinical Research Centre, St. Anne's University Hospital, Pekarska 53, Brno, Czech Republic
22
Biancaniello C, D’Argenio A, Giordano D, Dotolo S, Scafuri B, Marabotti A, d’Acierno A, Tagliaferri R, Facchiano A. Investigating the Effects of Amino Acid Variations in Human Menin. Molecules 2022; 27:molecules27051747. [PMID: 35268848 PMCID: PMC8911756 DOI: 10.3390/molecules27051747] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/09/2022] [Revised: 03/01/2022] [Accepted: 03/04/2022] [Indexed: 12/14/2022]
Abstract
Human menin is a nuclear protein that participates in many cellular processes, such as transcriptional regulation, DNA damage repair, cell signaling, cell division, proliferation, and migration, by interacting with many other proteins. Mutations of the gene encoding menin cause multiple endocrine neoplasia type 1 (MEN1), a rare autosomal dominant disorder associated with tumors of the endocrine glands. To characterize the structural and functional effects of the hundreds of missense variations at the protein level, we computationally investigated wild-type menin and more than 200 variants, predicting the amino acid variations that change secondary structure, solvent accessibility, salt-bridge and H-bond interactions, or protein thermostability, or that alter the capability to bind known protein interactors. The structural analyses are freely accessible online through a web interface that also integrates a 3D visualization of the structure of the wild-type and variant proteins. The results of the study offer insight into the effects of the amino acid variations toward a more complete understanding of their pathological role.
Affiliation(s)
- Carmen Biancaniello
- Dipartimento di Scienze Aziendali, Management and Innovation Systems, Università degli Studi di Salerno, 84084 Fisciano, Italy
- Antonia D’Argenio
- National Research Council, Institute of Food Science, 83100 Avellino, Italy
- Deborah Giordano
- National Research Council, Institute of Food Science, 83100 Avellino, Italy
- Serena Dotolo
- Dipartimento di Scienze Aziendali, Management and Innovation Systems, Università degli Studi di Salerno, 84084 Fisciano, Italy
- Bernardina Scafuri
- Dipartimento di Chimica e Biologia “A. Zambelli”, Università degli Studi di Salerno, 84084 Fisciano, Italy
- Anna Marabotti
- Dipartimento di Chimica e Biologia “A. Zambelli”, Università degli Studi di Salerno, 84084 Fisciano, Italy
- Antonio d’Acierno
- National Research Council, Institute of Food Science, 83100 Avellino, Italy
- Roberto Tagliaferri
- Dipartimento di Scienze Aziendali, Management and Innovation Systems, Università degli Studi di Salerno, 84084 Fisciano, Italy
- Corresponding author
- Angelo Facchiano
- National Research Council, Institute of Food Science, 83100 Avellino, Italy
- Corresponding author
23
Pancotti C, Benevenuta S, Birolo G, Alberini V, Repetto V, Sanavia T, Capriotti E, Fariselli P. Predicting protein stability changes upon single-point mutation: a thorough comparison of the available tools on a new dataset. Brief Bioinform 2022; 23:6502552. [PMID: 35021190 PMCID: PMC8921618 DOI: 10.1093/bib/bbab555] [Citation(s) in RCA: 38] [Impact Index Per Article: 19.0] [Received: 07/28/2021] [Revised: 11/29/2021] [Accepted: 12/05/2021] [Indexed: 12/13/2022] [Open access]
Abstract
Predicting the difference in thermodynamic stability between protein variants is crucial for protein design and understanding the genotype-phenotype relationships. So far, several computational tools have been created to address this task. Nevertheless, most of them have been trained or optimized on the same and ‘all’ available data, making a fair comparison unfeasible. Here, we introduce a novel dataset, collected and manually cleaned from the latest version of the ThermoMutDB database, consisting of 669 variants not included in the most widely used training datasets. The prediction performance and the ability to satisfy the antisymmetry property by considering both direct and reverse variants were evaluated across 21 different tools. The Pearson correlations of the tested tools were in the ranges of 0.21–0.5 and 0–0.45 for the direct and reverse variants, respectively. When both direct and reverse variants are considered, the antisymmetric methods perform better, achieving a Pearson correlation in the range of 0.51–0.62. The tested methods seem relatively insensitive to the physiological conditions, also performing well on the variants measured with more extreme pH and temperature values. A common issue with all the tested methods is the compression of the ΔΔG predictions toward zero. Furthermore, the thermodynamic stability of the most significantly stabilizing variants was found to be more challenging to predict. This study is the most extensive comparison of prediction methods using an entirely novel set of variants never tested before.
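One simple way to quantify the compression of ΔΔG predictions toward zero is the least-squares slope of predicted against experimental values: a slope well below 1 means the predictions span a narrower range than the measurements. The sketch also pools direct and reverse variants, assuming that the experimental ΔΔG of each reverse variant is the negated direct value. This is an illustrative metric with hypothetical names and values, not necessarily the analysis performed in the study.

```python
from typing import List, Sequence, Tuple


def ls_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def pool_direct_reverse(
    expt_direct: Sequence[float],
    pred_direct: Sequence[float],
    pred_reverse: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool direct and reverse variants into (experimental, predicted) lists.

    Assumes the experimental ΔΔG of each reverse variant is the negated
    direct value, i.e. that the measured data obey antisymmetry.
    """
    if not len(expt_direct) == len(pred_direct) == len(pred_reverse):
        raise ValueError("all inputs must have the same length")
    expt = list(expt_direct) + [-v for v in expt_direct]
    pred = list(pred_direct) + list(pred_reverse)
    return expt, pred


if __name__ == "__main__":
    # Hypothetical values (kcal/mol); a slope well below 1 signals compression.
    expt_dir = [-3.1, -1.4, -0.3, 0.8, -2.2]
    pred_dir = [-1.2, -0.9, -0.1, 0.2, -1.0]
    pred_rev = [0.9, 0.6, 0.3, -0.1, 0.7]
    e, p = pool_direct_reverse(expt_dir, pred_dir, pred_rev)
    print(f"direct-only slope = {ls_slope(expt_dir, pred_dir):.2f}")
    print(f"pooled slope = {ls_slope(e, p):.2f}")
```

The slope is used here because the Pearson coefficient is unaffected by uniform scaling, so a predictor can correlate well with experiment while still systematically underestimating the magnitude of ΔΔG.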
Affiliation(s)
- Corrado Pancotti
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Silvia Benevenuta
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Virginia Alberini
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Valeria Repetto
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Tiziana Sanavia
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
24
Artificial intelligence challenges for predicting the impact of mutations on protein stability. Curr Opin Struct Biol 2021; 72:161-168. [PMID: 34922207 DOI: 10.1016/j.sbi.2021.11.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 06/30/2021] [Revised: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 01/17/2023]
Abstract
Stability is a key ingredient of protein fitness, and its modification through targeted mutations has applications in various fields, such as protein engineering, drug design, and deleterious variant interpretation. Over the past decades, many studies have been devoted to building new, more effective methods for predicting the impact of mutations on protein stability based on the latest developments in artificial intelligence. We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases toward the training set, their generalizability, and their interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.
25
Marabotti A, Del Prete E, Scafuri B, Facchiano A. Performance of Web tools for predicting changes in protein stability caused by mutations. BMC Bioinformatics 2021; 22:345. [PMID: 34225665 PMCID: PMC8256537 DOI: 10.1186/s12859-021-04238-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/03/2020] [Accepted: 05/18/2021] [Indexed: 01/17/2023]
Abstract
BACKGROUND: Despite decades of developing dedicated Web tools, it is still difficult to correctly predict the changes in the thermodynamic stability of proteins caused by mutations. Here, we assessed the reliability of five recently developed Web tools in order to evaluate progress in the field. RESULTS: The results show that, although there are improvements in the field, the assessed predictors are still far from ideal. Prevailing problems include the bias towards destabilizing mutations, and, in general, the results are unreliable when the mutation causes a ΔΔG within the interval ± 0.5 kcal/mol. We found that using several predictors and combining their results into a consensus is a rough but effective way to increase the reliability of the predictions. CONCLUSIONS: We suggest that all developers use balanced data sets for training the predictors in their future tools, and that all users combine the results of multiple tools to increase the chances of obtaining correct predictions of the effect of mutations on the thermodynamic stability of a protein.
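The study recommends combining several predictors into a consensus and treats ΔΔG values within ± 0.5 kcal/mol as unreliable. The following is a minimal sketch of one simple way to do this: it averages the predictions for a variant, labels the average using the ± 0.5 kcal/mol band, and reports how many tools agree with that label. Two assumptions apply. First, every tool's output has already been converted to a single sign convention, here ΔΔG < 0 meaning destabilizing; published tools differ on this point. Second, plain averaging is only one possible consensus rule and is not necessarily the procedure used in the study. The tool names are hypothetical.

```python
from statistics import mean

# Assumption: all tools' outputs were first mapped to one sign convention,
# here ddG < 0 = destabilizing. Published tools differ, so check each one.
NEUTRAL_BAND = 0.5  # kcal/mol; |ddG| below this is treated as unreliable


def classify(ddg, band=NEUTRAL_BAND):
    """Label one ddG value (kcal/mol) using the +/- band."""
    if ddg <= -band:
        return "destabilizing"
    if ddg >= band:
        return "stabilizing"
    return "uncertain"


def consensus(predictions):
    """Combine per-tool ddG predictions for one variant by simple averaging.

    predictions: mapping from (hypothetical) tool name to predicted ddG.
    Returns the mean ddG, its label, and the fraction of tools whose own
    label agrees with the consensus label.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    avg = mean(predictions.values())
    label = classify(avg)
    agree = sum(classify(v) == label for v in predictions.values()) / len(predictions)
    return avg, label, agree
```

For example, `consensus({"tool_a": -1.2, "tool_b": -0.9, "tool_c": 0.1})` averages to about −0.67 kcal/mol, which is labelled destabilizing. Two of the three tools agree with that label.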
Affiliation(s)
- Anna Marabotti
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, Fisciano, SA, Italy.
- Eugenio Del Prete
- CNR-IAC, National Research Council, Institute for Applied Mathematics "Mauro Picone", Naples, Italy
- Bernardina Scafuri
- Department of Chemistry and Biology "A. Zambelli", University of Salerno, Fisciano, SA, Italy
- Angelo Facchiano
- CNR-ISA, National Research Council, Institute of Food Science, Avellino, Italy.
26
Directed evolution of glycosyltransferase for enhanced efficiency of avermectin glucosylation. Appl Microbiol Biotechnol 2021; 105:4599-4607. [PMID: 34043077 DOI: 10.1007/s00253-021-11279-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/27/2020] [Revised: 03/15/2021] [Accepted: 04/06/2021] [Indexed: 10/21/2022]
Abstract
Avermectin, produced by Streptomyces avermitilis, is an active compound protective against nematodes, insects, and mites. However, its potential usage is limited by its low aqueous solubility. The uridine diphosphate (UDP)-glycosyltransferase (BLC) from Bacillus licheniformis synthesizes avermectin glycosides with improved water solubility and in vitro antinematodal activity. However, enzymatic glycosylation of avermectin by BLC is limited by the low conversion rate of this reaction. Thus, improving BLC enzyme activity is necessary for mass production of avermectin glycosides for field application. In this study, the catalytic activity of BLC toward avermectin was enhanced via directed evolution. Three mutants from the BLC mutant library (R57H, V227A, and D252V) had specific glucosylation activities toward avermectin that were 2.0-, 1.8-, and 1.5-fold higher, respectively, than that of wild-type BLC. Generation of combined mutations via site-directed mutagenesis led to further enhancement of activity. The triple mutant, R57H/V227A/D252V, had the highest activity, 2.8-fold higher than that of wild-type BLC. The catalytic efficiencies (kcat/Km) of the best mutant (R57H/V227A/D252V) toward the substrates avermectin and UDP-glucose were improved by 2.71- and 2.29-fold, respectively, compared with those of wild-type BLC. Structural modeling analysis revealed that the free energy of the mutants differed from that of wild-type BLC by −1.1 to −7.1 kcal/mol (i.e., was lower), which correlated with their improved activity.
KEY POINTS:
• Directed evolution improved the glucosylation activity of BLC toward avermectin.
• Combinatorial site-directed mutagenesis led to further enhanced activity.
• The mutants exhibited lower free energy values than wild-type BLC.
27
Petrosino M, Novak L, Pasquo A, Chiaraluce R, Turina P, Capriotti E, Consalvi V. Analysis and Interpretation of the Impact of Missense Variants in Cancer. Int J Mol Sci 2021; 22:ijms22115416. [PMID: 34063805 PMCID: PMC8196604 DOI: 10.3390/ijms22115416] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/30/2021] [Revised: 05/03/2021] [Accepted: 05/17/2021] [Indexed: 01/10/2023]
Abstract
Large-scale genome sequencing has allowed the identification of a massive number of genetic variants whose impact on human health is still unknown. In this review, we use an in silico strategy to analyze the impact of missense variants in cancer-related genes whose effects on protein stability and function were experimentally determined. We collected a set of 164 variants from 11 proteins to analyze the impact of missense mutations at the structural and functional levels, and to assess the performance of state-of-the-art methods (FoldX and Meta-SNP) for predicting protein stability change and pathogenicity. The results of our analysis show that combining experimental data on protein stability with in silico pathogenicity predictions allowed the identification of a subset of variants with a high probability of having a deleterious phenotypic effect. This was confirmed by the significant enrichment of the subset in variants annotated in the COSMIC database as putative cancer-driving variants. Our analysis suggests that integrating experimental and computational approaches may help to evaluate the risk of complex disorders and to develop more effective treatment strategies.
Affiliation(s)
- Maria Petrosino
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Leonore Novak
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Alessandra Pasquo
- ENEA CR Frascati, Diagnostics and Metrology Laboratory FSN-TECFIS-DIM, 00044 Frascati, Italy
- Roberta Chiaraluce
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Paola Turina
- Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy
- Emidio Capriotti
- Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy
- Corresponding author
- Valerio Consalvi
- Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Corresponding author
28
de Godoi Contessoto V, Ramos FC, de Melo RR, de Oliveira VM, Scarpassa JA, de Sousa AS, Zanphorlin LM, Slade GG, Leite VBP, Ruller R. Electrostatic interaction optimization improves catalytic rates and thermotolerance on xylanases. Biophys J 2021; 120:2172-2180. [PMID: 33831390 DOI: 10.1016/j.bpj.2021.03.036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2020] [Revised: 02/08/2021] [Accepted: 03/23/2021] [Indexed: 10/21/2022]
Abstract
Understanding the factors that contribute to improving the biochemical properties of proteins is highly relevant for protein engineering. Properties such as catalytic rate, thermal stability, and thermal resistance are crucial for the industrial application of enzymes. Different interactions can influence these biochemical properties of an enzyme, and surface charge-charge interactions in particular have received considerable attention. In this study, we employ the Tanford-Kirkwood solvent accessibility model with the Monte Carlo algorithm (TKSA-MC) to predict interactions that could improve the stability and catalytic rate of a wild-type xylanase (XynAWT) and of its M6 mutant (XynAM6). The modeling predicts that substituting the lysine at position 99 with glutamic acid (K99E) favors stabilization of the native state in both xylanases. Our experimental results showed that the mutated xylanases had increased thermotolerance and catalytic rates, which conferred higher processivity on delignified sugarcane bagasse. The TKSA-MC approach employed here is presented as an efficient computational design strategy that can be applied to improve the thermal resistance of enzymes with industrial and biotechnological applications.
Affiliation(s)
- Vinícius de Godoi Contessoto
- Brazilian Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil; Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Physics, Institute of Biosciences, Letters and Exact Sciences, São Paulo State University, São José do Rio Preto, São Paulo, Brazil
- Felipe Cardoso Ramos
- Brazilian Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Ricardo Rodrigues de Melo
- Brazilian Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Vinícius Martins de Oliveira
- Brazilian Biosciences National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Josiane Aniele Scarpassa
- Brazilian Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Amanda Silva de Sousa
- Brazilian Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Letícia Maria Zanphorlin
- Brazilian Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, São Paulo, Brazil
- Gabriel Gouvea Slade
- Theoretical Biophysics Laboratory, Institute of Exact Sciences, Natural and Education, Federal University of Triângulo Mineiro, Uberaba, Minas Gerais, Brazil
- Vitor Barbanti Pereira Leite
- Department of Physics, Institute of Biosciences, Letters and Exact Sciences, São Paulo State University, São José do Rio Preto, São Paulo, Brazil.
- Roberto Ruller
- Microorganisms and General Biochemistry Laboratory, Institute of Bioscience, Federal University of Mato Grosso do Sul, Campo Grande, Mato Grosso do Sul, Brazil
29
Marques SM, Planas-Iglesias J, Damborsky J. Web-based tools for computational enzyme design. Curr Opin Struct Biol 2021; 69:19-34. [PMID: 33667757 DOI: 10.1016/j.sbi.2021.01.010] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 12/01/2020] [Revised: 01/14/2021] [Accepted: 01/27/2021] [Indexed: 12/30/2022]
Abstract
Enzymes are in high demand for very diverse biotechnological applications. However, natural biocatalysts often need to be engineered to fine-tune the properties required by the end application, such as activity, selectivity, stability to temperature or co-solvents, and solubility. Computational methods are increasingly used for this task, providing predictions that significantly narrow down the space of possible mutations and can greatly reduce the experimental burden. Many computational tools are available as web-based platforms, making them accessible to non-expert users. These platforms are typically user-friendly, contain walk-throughs, and require neither deep expertise nor local installation. Here we describe some of the most recent outstanding web tools for enzyme engineering and formulate future perspectives in this field.
Affiliation(s)
- Sérgio M Marques
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Joan Planas-Iglesias
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/C13, 625 00 Brno, Czech Republic; International Centre for Clinical Research, St. Anne's University Hospital Brno, Pekarska 53, 656 91 Brno, Czech Republic.
30
SAAFEC-SEQ: A Sequence-Based Method for Predicting the Effect of Single Point Mutations on Protein Thermodynamic Stability. Int J Mol Sci 2021; 22:ijms22020606. [PMID: 33435356 PMCID: PMC7827184 DOI: 10.3390/ijms22020606] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 12/05/2020] [Revised: 12/23/2020] [Accepted: 01/06/2021] [Indexed: 01/04/2023]
Abstract
Modeling the effect of mutations on protein thermodynamic stability is useful for protein engineering and for understanding the molecular mechanisms of disease-causing variants. Here, we report a new development of the SAAFEC method, SAAFEC-SEQ, a gradient boosting decision tree machine learning method that predicts the change in folding free energy caused by amino acid substitutions. The method does not require the 3D structure of the corresponding protein, only its sequence, and thus can be applied in genome-scale investigations where structural information is very sparse. SAAFEC-SEQ uses physicochemical properties, sequence features, and evolutionary information features to make its predictions. It is shown to consistently outperform all existing state-of-the-art sequence-based methods in terms of both the Pearson correlation coefficient and the root-mean-square error, as benchmarked on several independent datasets. SAAFEC-SEQ has been implemented as a web server and is also available as stand-alone code that can be downloaded and embedded into other researchers’ code.
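The benchmark described above compares methods using two metrics: the Pearson correlation coefficient and the root-mean-square error (RMSE) between experimental and predicted ΔΔG. The following is a minimal sketch of how both could be computed for one or more named test sets. It does not reproduce SAAFEC-SEQ's gradient boosting model. The dataset name and values are hypothetical toy numbers, not results from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted ddG (kcal/mol)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    sq = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(math.fsum(sq) / len(sq))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def benchmark(datasets):
    """Return {name: (Pearson r, RMSE)} for named (observed, predicted) test sets."""
    return {name: (pearson(obs, pred), rmse(obs, pred))
            for name, (obs, pred) in datasets.items()}
```

The two metrics capture different things, which is why they are usually reported together. Pearson r ignores a constant offset or a uniform rescaling of the predictions. RMSE penalizes both.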
31
Runthala A, Sai TH, Kamjula V, Phulara SC, Rajput VS, Sangapillai K. Excavating the functionally crucial active-site residues of the DXS protein of Bacillus subtilis by exploring its closest homologues. J Genet Eng Biotechnol 2020; 18:76. [PMID: 33242110 PMCID: PMC7691408 DOI: 10.1186/s43141-020-00087-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2020] [Accepted: 10/21/2020] [Indexed: 11/10/2022]
Abstract
Background
To achieve a high yield of terpenoid-based therapeutics, the 1-deoxy-d-xylulose-5-phosphate (DXP) pathway has been extensively exploited for the production of downstream enzymes. The DXP synthase (DXS) enzyme, the initiator of this pathway, is pivotal for the convergence of carbon flux. Here it is studied computationally in the industrially used, generally regarded as safe (GRAS) bacterium Bacillus subtilis to decode its vital regions and aid the construction of a functionally improved mutant library.
Results
For the dataset of 546 DXS sequences, a representative set of 108 sequences was created. It shows significant evolutionary divergence across species grouped into 37 clades, whereas three clades are observed in the 76-sequence dataset of Bacillus subtilis. The DXS enzyme, although sharing statistically significant homology with transketolase, is shown to be evolutionarily distant from it. Using a mutual-information-based co-evolutionary network and hotspot analysis, the most crucial loci within the active site are identified. The 650-residue representative structure displays complete conservation at 114 loci, and only two of the co-evolving residues, ASP154 and ILE371, are conserved. Lastly, P318D is predicted to be the top-ranked mutation for increasing the thermodynamic stability of 6OUW.
Conclusion
The study identifies the vital functional, phylogenetic, and conserved residues across the active site of the DXS protein, the key rate-limiting controller of the entire pathway. It should help to computationally understand the evolutionary landscape of this industrially useful enzyme and to widen its substrate repertoire, increasing the enzymatic yield of unnatural molecules for in vivo and in vitro applications.
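The co-evolution analysis in this abstract relies on mutual information (MI) between alignment columns, defined as MI(i, j) = Σ p(a, b) log[p(a, b) / (p(a) p(b))]. The following is a minimal sketch of plain MI computed from raw column frequencies, together with a thresholded list of co-evolving column pairs. It assumes an alignment given as equal-length strings, with gaps treated as ordinary symbols. The threshold and the toy alignment are hypothetical. The study's actual network construction may apply corrections that this sketch omits, such as sequence weighting or average product correction.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Residues at alignment position i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(col_i, col_j):
    """Plain MI (in nats) between two aligned columns from observed frequencies."""
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def coevolution_edges(msa, threshold):
    """Column pairs (i, j, MI) whose MI exceeds a hypothetical threshold."""
    length = len(msa[0])
    edges = []
    for i, j in combinations(range(length), 2):
        mi = mutual_information(column(msa, i), column(msa, j))
        if mi > threshold:
            edges.append((i, j, mi))
    return edges
```

In the toy alignment `["AKL", "AKV", "GEL", "GEV"]`, columns 0 and 1 vary together perfectly (A with K, G with E), so their MI is log 2. Column 2 varies independently of both, so its MI with each of them is 0, and only the pair (0, 1) becomes an edge.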