1
Pandey P, Alexov E. Most Monogenic Disorders Are Caused by Mutations Altering Protein Folding Free Energy. Int J Mol Sci 2024; 25:1963. [PMID: 38396641; PMCID: PMC10888012; DOI: 10.3390/ijms25041963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 01/31/2024] [Accepted: 02/02/2024] [Indexed: 02/25/2024] [Open access]
Abstract
Revealing the molecular effect that pathogenic missense mutations have on the corresponding protein is crucial for developing therapeutic solutions. This is especially important for monogenic diseases since, for most of them, there is no treatment available, while typically, the treatment should be provided in the early development stages. This requires fast targeted drug development at a low cost. Here, we report an updated database of monogenic disorders (MOGEDO), which includes 768 proteins and the corresponding 2559 pathogenic and 1763 benign mutations, along with the functional classification of the corresponding proteins. Using the database and various computational tools that predict folding free energy change (ΔΔG), we demonstrate that, on average, 70% of pathogenic cases result in decreased protein stability. Such a large fraction indicates that one should aim at in silico screening for small molecules stabilizing the structure of the mutant protein. We emphasize that knowledge of ΔΔG is essential because one wants to develop stabilizers that compensate for ΔΔG but do not make the protein over-stable, since an over-stable protein may be dysfunctional. We demonstrate that, by using ΔΔG and predicted solvent exposure of the mutation site, one can develop a predictive method that distinguishes pathogenic from benign mutations with a success rate even better than some of the leading pathogenicity predictors. Furthermore, hydrophobic-hydrophobic mutations have stronger correlations between folding free energy change and pathogenicity compared with others. Also, mutations in which Cys, Gly, Arg, Trp, or Tyr is replaced by any other amino acid are more likely to be pathogenic. To facilitate further detection of pathogenic mutations, each wild-type amino acid in the 768 proteins mentioned above was mutated to the other 19 residues (14,847,817 mutations), the ΔΔG was calculated with SAAFEC-SEQ, and 5,506,051 mutations were predicted to be pathogenic.
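As a rough illustration of the two ideas in this abstract, the sketch below computes the fraction of destabilizing mutations from a list of ΔΔG values and applies a simple rule that combines ΔΔG with relative solvent exposure. The sign convention (negative ΔΔG taken to mean destabilizing), the cutoffs, and the function names are assumptions made for illustration only; the abstract does not describe the authors' actual predictive method.

```python
def fraction_destabilizing(ddg_values):
    """Return the share of mutations with negative ddG.

    Assumption: negative ddG (kcal/mol) means the mutant is less stable.
    """
    destabilizing = sum(1 for ddg in ddg_values if ddg < 0)
    return destabilizing / len(ddg_values)


def flag_pathogenic(ddg, relative_exposure, ddg_cutoff=1.0, exposure_cutoff=0.25):
    """Toy rule: flag a mutation when the stability change is large in
    magnitude and the mutation site is buried.

    Both cutoffs are hypothetical placeholders, not values from the paper.
    """
    return abs(ddg) >= ddg_cutoff and relative_exposure <= exposure_cutoff


# Toy ddG values, invented for this example.
toy_ddg = [-1.2, -0.4, 0.3, -2.0]
print(fraction_destabilizing(toy_ddg))
print(flag_pathogenic(-1.5, 0.10))
```

A rule of this shape only shows how the two features can be combined; a real predictor would be fitted and validated on labeled pathogenic and benign mutations.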
Affiliation(s)
- Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
2
Pandey P, Alexov E. Most monogenic disorders are caused by mutations altering protein folding free energy. Research Square 2023:rs.3.rs-3442589. [PMID: 37886551; PMCID: PMC10602188; DOI: 10.21203/rs.3.rs-3442589/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/28/2023]
Abstract
Revealing the molecular effect that pathogenic missense mutations have on the corresponding protein is crucial for developing therapeutic solutions. This is especially important for monogenic diseases since, for most of them, there is no treatment available, while typically, the treatment should be provided in the early development stages. This requires fast, targeted drug development at a low cost. Here, we report a database of monogenic disorders (MOGEDO), which includes 768 proteins and the corresponding 2559 pathogenic and 1763 benign mutations, along with the functional classification of the corresponding proteins. Using the database and various computational tools that predict folding free energy change (ΔΔG), we demonstrate that, on average, 70% of pathogenic cases result in decreased protein stability. Such a large fraction indicates that one should aim at in silico screening for small molecules stabilizing the structure of the mutant protein. We emphasize that knowledge of ΔΔG is essential because one wants to develop stabilizers that compensate for ΔΔG but do not make the protein over-stable, since an over-stable protein may be dysfunctional. We demonstrate that, by using ΔΔG and predicted solvent exposure of the mutation site, one can develop a predictive method that distinguishes pathogenic from benign mutations with a success rate even better than some of the leading pathogenicity predictors. Furthermore, hydrophobic-hydrophobic mutations have stronger correlations between folding free energy change and pathogenicity compared with others. Also, mutations in which Cys, Gly, Arg, Trp, or Tyr is replaced by any other amino acid are more likely to be pathogenic. To facilitate further detection of pathogenic mutations, each wild-type amino acid in the 768 proteins mentioned above was mutated to the other 19 residues (14,847,817 mutations), the ΔΔG was calculated with SAAFEC-SEQ, and 5,506,051 mutations were predicted to be pathogenic.
3
Guleken Z, Ceylan Z, Aday A, Bayrak AG, Hindilerden İY, Nalçacı M, Jakubczyk P, Jakubczyk D, Kula-Maximenko M, Depciuch J. Detection of primary myelofibrosis in blood serum via Raman spectroscopy assisted by machine learning approaches; correlation with clinical diagnosis. Nanomedicine: Nanotechnology, Biology, and Medicine 2023; 53:102706. [PMID: 37633405; DOI: 10.1016/j.nano.2023.102706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 08/19/2023] [Accepted: 08/19/2023] [Indexed: 08/28/2023]
Abstract
Primary myelofibrosis (PM) is one of the myeloproliferative neoplasms, in which stem cell-derived clonal neoplasms are observed. Diagnosis of this disease is based on physical examination, peripheral blood findings, bone marrow morphology, cytogenetics, and molecular markers. However, the molecular marker of PM, the JAK2V617F mutation, is also observed in other myeloproliferative neoplasms such as polycythemia vera and essential thrombocythemia. Therefore, there is a need to find methods that provide a marker unique to PM and allow for higher accuracy of PM diagnosis and, consequently, of treatment of the disease. In this study, we used Raman spectroscopy, principal component analysis (PCA), and partial least squares (PLS) analysis as diagnostic tools for PM. We used serum collected from PM patients, who were classified using clinical parameters of PM such as the Dynamic International Prognostic Scoring System (DIPSS) Plus score for primary myelofibrosis, the JAK2V617F mutation, spleen size, degree of bone marrow reticulin fibrosis, and use of hydroxyurea. Raman spectra showed higher amounts of C-H, C-C, C-C/C-N, and amide II and lower amounts of amide I and of CH3-group vibrations in PM patients than in healthy subjects. Furthermore, shifts of the amide II and amide I vibrations were observed in PM patients. Machine learning analysis of the Raman regions (i) 800 cm⁻¹ to 1800 cm⁻¹, (ii) 1600 cm⁻¹ to 1700 cm⁻¹, and (iii) 2700 cm⁻¹ to 3000 cm⁻¹ showed 100% accuracy, sensitivity, and specificity. The spectral dynamics showed that differences in the amide II and amide I regions were the most significant in distinguishing between PM patients and healthy subjects. Importantly, until now, the efficacy of Raman spectroscopy has not been established in clinical diagnostics of PM using the correlation between Raman spectra and PM clinical prognostic scoring. Our results showed a correlation between Raman signals and bone marrow fibrosis, as well as JAK2V617F. The results revealed that Raman spectroscopy has high potential for use in medical laboratory diagnostics to quantify multiple biomarkers simultaneously, especially in the selected Raman regions.
Affiliation(s)
- Zozan Guleken
- Faculty of Medicine, Department of Physiology, Gaziantep Islam Science and Technology University, Gaziantep, Turkey; Faculty of Medicine, Rzeszów University, Rzeszów, Poland
- Zeynep Ceylan
- Samsun University, Faculty of Engineering, Department of Industrial Engineering, Samsun, Turkey
- Aynur Aday
- Istanbul University, Faculty of Medicine, Department of Internal Medicine, Division of Medical Genetics, Turkey
- Ayşe Gül Bayrak
- Istanbul University, Faculty of Medicine, Department of Internal Medicine, Division of Medical Genetics, Turkey
- İpek Yönal Hindilerden
- Istanbul University Istanbul Faculty of Medicine, Department of Internal Medicine, Division of Hematology, Turkey
- Meliha Nalçacı
- Istanbul University Istanbul Faculty of Medicine, Department of Internal Medicine, Division of Hematology, Turkey
- Dorota Jakubczyk
- Faculty of Mathematics and Applied Physics, Rzeszow University of Technology, Powstancow Warszawy 12, PL-35959 Rzeszow, Poland
- Monika Kula-Maximenko
- Institute of Plant Physiology, Polish Academy of Sciences, Niezapominajek 21, 30-239 Kraków, Poland
- Joanna Depciuch
- Institute of Nuclear Physics, PAS, 31342 Krakow, Poland; Department of Biochemistry and Molecular Biology, Medical University of Lublin, 20-093 Lublin, Poland
4
Kang K, Wang L, Song C. ProtRAP: Predicting Lipid Accessibility Together with Solvent Accessibility of Proteins in One Run. J Chem Inf Model 2023; 63:1058-1065. [PMID: 36693122; DOI: 10.1021/acs.jcim.2c01235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023]
Abstract
Solvent accessibility has been extensively used to characterize and predict the chemical properties of the surface residues of soluble proteins. However, there is not yet a widely accepted quantity of the same dimension for the study of lipid-accessible residues of membrane proteins. In this study, we propose that lipid accessibility, defined in a similar way to solvent accessibility, can be used to characterize the lipid-accessible residues of membrane proteins. Moreover, we developed a deep learning-based method, ProtRAP (Protein Relative Accessibility Predictor), to predict the relative lipid accessibility and relative solvent accessibility of residues from a given protein sequence, which can infer which residues are likely accessible to lipids, accessible to solvent, or buried in the protein interior in one run.
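The notion of relative accessibility in this abstract can be sketched as a normalization: the absolute accessible area of a residue divided by a reference maximum for that residue type, computed once for solvent and once for lipid. The snippet below is a minimal illustration under stated assumptions; the reference maxima, the 0.2 cutoff, and all names are hypothetical placeholders and do not come from ProtRAP, which predicts these quantities from sequence with a deep learning model.

```python
from dataclasses import dataclass

# Placeholder reference maxima in square angstroms (illustrative values only).
MAX_AREA = {"A": 130.0, "L": 200.0, "K": 235.0}


@dataclass
class Residue:
    name: str            # one-letter amino acid code
    solvent_area: float  # absolute solvent-accessible area
    lipid_area: float    # absolute lipid-accessible area


def relative_accessibility(res):
    """Return (relative solvent accessibility, relative lipid accessibility),
    each capped at 1.0."""
    max_area = MAX_AREA[res.name]
    rsa = min(res.solvent_area / max_area, 1.0)
    rla = min(res.lipid_area / max_area, 1.0)
    return rsa, rla


def label(res, cutoff=0.2):
    """Label a residue as 'buried', 'solvent', or 'lipid' (hypothetical cutoff)."""
    rsa, rla = relative_accessibility(res)
    if max(rsa, rla) < cutoff:
        return "buried"
    return "solvent" if rsa >= rla else "lipid"


print(label(Residue("L", solvent_area=10.0, lipid_area=120.0)))
```

Using the same denominator for both quantities keeps the two relative values directly comparable, which is what lets a single call assign each residue to one of the three classes.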
Affiliation(s)
- Kai Kang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Lei Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chen Song
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
5
Babbi G, Savojardo C, Baldazzi D, Martelli PL, Casadio R. Pathogenic variation types in human genes relate to diseases through Pfam and InterPro mapping. Front Mol Biosci 2022; 9:966927. [PMID: 36188216; PMCID: PMC9523224; DOI: 10.3389/fmolb.2022.966927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022] [Open access]
Abstract
Grouping residue variations in a protein according to their physicochemical properties allows a dimensionality reduction of all the possible substitutions in a variant with respect to the wild type. Here, by using a large dataset of proteins with disease-related and benign variations, as derived by merging Humsavar and ClinVar data, we investigate to which extent our physicochemical grouping procedure can help in determining whether patterns of variation types are related to specific groups of diseases and whether they occur in Pfam and/or InterPro gene domains. Here, we download 75,145 germline disease-related and benign variations of 3,605 genes, group them according to physicochemical categories and map them into Pfam and InterPro gene domains. Statistically validated analysis indicates that each cluster of genes associated to Mondo anatomical system categorizations is characterized by a specific variation pattern. Patterns identify specific Pfam and InterPro domain–Mondo category associations. Our data suggest that the association of variation patterns to Mondo categories is unique and may help in associating gene variants to genetic diseases. This work corroborates in a much larger data set previous observations from our group.
Affiliation(s)
- Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- *Correspondence: Pier Luigi Martelli; Rita Casadio
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
- *Correspondence: Pier Luigi Martelli; Rita Casadio
6
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods in Molecular Biology (Clifton, N.J.) 2022; 2449:169-185. [PMID: 35507262; DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research in the field of computational methods based on machine learning and knowledge-based potentials for ΔG and ΔΔG prediction upon variations, we now realize that all the approaches perform poorly when tested on specific cases and that there is large room for improvement. Why is this so? Is the underlying assumption wrong that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein? Both machine learning and knowledge-based computational methods are rigorous, and we know the solid theory behind them. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be considered with care. In the following, we will show how to cope with the problem of understanding which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Affiliation(s)
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
7
Wu X, Xu LY, Li EM, Dong G. Application of molecular dynamics simulation in biomedicine. Chem Biol Drug Des 2022; 99:789-800. [PMID: 35293126; DOI: 10.1111/cbdd.14038] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 12/29/2021] [Revised: 02/25/2022] [Accepted: 03/05/2022] [Indexed: 02/05/2023]
Abstract
Molecular dynamics (MD) simulation has been widely used in the field of biomedicine to study the conformational transition of proteins caused by mutation or ligand binding/unbinding. It provides some perspectives that are difficult to obtain in traditional biochemical or pathological experiments, for example, detailed effects of mutations on protein structure and protein-protein/ligand interaction at the atomic level. In this review, a broad overview of conformation changes and drug discovery by MD simulation is given. We first discuss the preparation of protein structure for MD simulation, which is a key step that determines the accuracy of the simulation. Then, we summarize the applications of commonly used force fields and MD simulations in scientific research. Finally, enhanced sampling methods and common applications of these methods are introduced. In brief, MD simulation is a powerful tool that can be used to guide experimental study. The combination of MD simulation and experimental techniques is an a priori means to solve biomedical problems and gain a deep understanding of the relationship between protein structure and function.
Affiliation(s)
- Xiaodong Wu
- Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou, China
- Li-Yan Xu
- Key Laboratory of Molecular Biology in High Cancer Incidence Coastal Area of Guangdong Higher Education Institutes, Shantou University Medical College, Shantou, China
- Cancer Research Center, Shantou University Medical College, Shantou, China
- En-Min Li
- Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou, China
- Key Laboratory of Molecular Biology in High Cancer Incidence Coastal Area of Guangdong Higher Education Institutes, Shantou University Medical College, Shantou, China
- Geng Dong
- Department of Biochemistry and Molecular Biology, Shantou University Medical College, Shantou, China
- Medical Informatics Research Center, Shantou University Medical College, Shantou, China
8
Biru EI, Necolau MI, Zainea A, Iovu H. Graphene Oxide-Protein-Based Scaffolds for Tissue Engineering: Recent Advances and Applications. Polymers (Basel) 2022; 14:1032. [PMID: 35267854; PMCID: PMC8914712; DOI: 10.3390/polym14051032] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/08/2022] [Revised: 02/27/2022] [Accepted: 03/01/2022] [Indexed: 01/27/2023] [Open access]
Abstract
The field of tissue engineering is constantly evolving as it aims to develop bioengineered and functional tissues and organs for repair or replacement. Due to their large surface area and ability to interact with proteins and peptides, graphene oxides offer valuable physiochemical and biological features for biomedical applications and have been successfully employed for optimizing scaffold architectures for a wide range of organs, from the skin to cardiac tissue. This review critically focuses on opportunities to employ protein-graphene oxide structures either as nanocomposites or as biocomplexes and highlights the effects of carbonaceous nanostructures on protein conformation and structural stability for applications in tissue engineering and regenerative medicine. Herein, recent applications and the biological activity of nanocomposite bioconjugates are analyzed with respect to cell viability and proliferation, along with the ability of these constructs to sustain the formation of new and functional tissue. Novel strategies and approaches based on stem cell therapy, as well as the involvement of the extracellular matrix in the design of smart nanoplatforms, are discussed.
Affiliation(s)
- Elena Iuliana Biru
- Advanced Polymer Materials Group, Department of Bioresources and Polymer Science, University Politehnica of Bucharest, 1-7 Gh. Polizu Street, 011061 Bucharest, Romania
- Madalina Ioana Necolau
- Advanced Polymer Materials Group, Department of Bioresources and Polymer Science, University Politehnica of Bucharest, 1-7 Gh. Polizu Street, 011061 Bucharest, Romania
- Adriana Zainea
- Advanced Polymer Materials Group, Department of Bioresources and Polymer Science, University Politehnica of Bucharest, 1-7 Gh. Polizu Street, 011061 Bucharest, Romania
- Horia Iovu
- Advanced Polymer Materials Group, Department of Bioresources and Polymer Science, University Politehnica of Bucharest, 1-7 Gh. Polizu Street, 011061 Bucharest, Romania
- Academy of Romanian Scientists, 54 Splaiul Independentei Street, 050094 Bucharest, Romania
9
Savojardo C, Babbi G, Baldazzi D, Martelli PL, Casadio R. A Glance into MTHFR Deficiency at a Molecular Level. Int J Mol Sci 2021; 23:167. [PMID: 35008593; PMCID: PMC8745156; DOI: 10.3390/ijms23010167] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2021] [Revised: 12/03/2021] [Accepted: 12/21/2021] [Indexed: 12/16/2022] [Open access]
Abstract
MTHFR deficiency still deserves an investigation to associate the phenotype to protein structure variations. To this aim, considering the MTHFR wild type protein structure, with a catalytic and a regulatory domain and taking advantage of state-of-the-art computational tools, we explore the properties of 72 missense variations known to be disease associated. By computing the thermodynamic ΔΔG change according to a consensus method that we recently introduced, we find that 61% of the disease-related variations destabilize the protein, are present both in the catalytic and regulatory domain and correspond to known biochemical deficiencies. The propensity of solvent accessible residues to be involved in protein-protein interaction sites indicates that most of the interacting residues are located in the regulatory domain, and that only three of them, located at the interface of the functional protein homodimer, are both disease-related and destabilizing. Finally, we compute the protein architecture with Hidden Markov Models, one from Pfam for the catalytic domain and the second computed in house for the regulatory domain. We show that patterns of disease-associated, physicochemical variation types, both in the catalytic and regulatory domains, are unique for the MTHFR deficiency when mapped into the protein architecture.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Davide Baldazzi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), 70126 Bari, Italy
10
Manfredi M, Savojardo C, Martelli PL, Casadio R. DeepREx-WS: A web server for characterising protein-solvent interaction starting from sequence. Comput Struct Biotechnol J 2021; 19:5791-5799. [PMID: 34765094; PMCID: PMC8566768; DOI: 10.1016/j.csbj.2021.10.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/05/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 11/23/2022] [Open access]
Abstract
Protein–solvent interaction provides important features for protein surface engineering when the structure is absent or partially solved. Presently, we can integrate the notion of solvent exposed/buried residues with that of their flexibility and intrinsic disorder to highlight regions where mutations may increase or decrease protein stability in order to modify proteins for biotechnological reasons, while preserving their functional integrity. Here we describe a web server, which provides the unique possibility of integrating knowledge of solvent and non-solvent exposure with that of residue conservation, flexibility and disorder of a protein sequence, for a better understanding of which regions are relevant for protein integrity. The core of the webserver is DeepREx, a novel deep learning-based tool that classifies each residue in the sequence as buried or exposed. DeepREx is trained on a high-quality, non-redundant dataset derived from the Protein Data Bank comprising 2332 monomeric protein chains and benchmarked on a blind test set including 200 protein sequences unrelated to the training set. Results show that DeepREx performs at the state of the art in the field. In turn, the Web Server, DeepREx-WS, supplements the predictions of DeepREx with features that allow a better characterisation of exposed and buried regions: i) residue conservation derived from multiple sequence alignment; ii) local sequence hydrophobicity; iii) residue flexibility computed with MEDUSA; iv) a predictor of secondary structure; v) the presence of disordered regions as derived from MobiDB-Lite3.0. The web server allows browsing, selecting and intersecting the different features. We demonstrate a possible application of the DeepREx-WS for assisting the identification of residues to be varied in protein surface engineering processes.
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Corresponding author.
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
11
Savojardo C, Babbi G, Martelli PL, Casadio R. Mapping OMIM Disease-Related Variations on Protein Domains Reveals an Association Among Variation Type, Pfam Models, and Disease Classes. Front Mol Biosci 2021; 8:617016. [PMID: 34026820; PMCID: PMC8138129; DOI: 10.3389/fmolb.2021.617016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/13/2020] [Accepted: 04/09/2021] [Indexed: 12/23/2022] [Open access]
Abstract
Human genome resequencing projects provide an unprecedented amount of data about single-nucleotide variations occurring in protein-coding regions and often leading to observable changes in the covalent structure of gene products. For many of these variations, links to Online Mendelian Inheritance in Man (OMIM) genetic diseases are available and are reported in many databases that are collecting human variation data such as Humsavar. However, the current knowledge on the molecular mechanisms that are leading to diseases is, in many cases, still limited. For understanding the complex mechanisms behind disease insurgence, the identification of putative models, when considering the protein structure and chemico-physical features of the variations, can be useful in many contexts, including early diagnosis and prognosis. In this study, we investigate the occurrence and distribution of human disease–related variations in the context of Pfam domains. The aim of this study is the identification and characterization of Pfam domains that are statistically more likely to be associated with disease-related variations. The study takes into consideration 2,513 human protein sequences with 22,763 disease-related variations. We describe patterns of disease-related variation types in biunivocal relation with Pfam domains, which are likely to be possible markers for linking Pfam domains to OMIM diseases. Furthermore, we take advantage of the specific association between disease-related variation types and Pfam domains for clustering diseases according to the Human Disease Ontology, and we establish a relation among variation types, Pfam domains, and disease classes. We find that Pfam models are specific markers of patterns of variation types and that they can serve to bridge genes, diseases, and disease classes. Data are available as Supplementary Material for 1,670 Pfam models, including 22,763 disease-related variations associated to 3,257 OMIM diseases.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
12
Savojardo C, Manfredi M, Martelli PL, Casadio R. Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences. Front Mol Biosci 2021; 7:626363. [PMID: 33490109; PMCID: PMC7817970; DOI: 10.3389/fmolb.2020.626363] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 11/05/2020] [Accepted: 12/07/2020] [Indexed: 01/08/2023] [Open access]
Abstract
Solvent accessibility (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning based approaches trained on solved structures. Here we ask the question as to which extent solvent exposure of residues can be associated to the pathogenicity of the variation. By this, SASA of the wild-type residue acquires a role in the context of functional annotation of protein single-residue variations (SRVs). By mapping variations on a curated database of human protein structures, we found that residues targeted by disease related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among the different residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease associated than others. For all residues, the proportion of disease related SRVs largely increases when the wild-type residue is buried and decreases when it is exposed. The extent of the increase depends on the residue type. With the aid of an in house developed predictor, based on a deep learning procedure and performing at the state-of-the-art, we are able to confirm the above tendency by analyzing a large data set of residues subjected to variations and occurring in some 12,494 human protein sequences still lacking three-dimensional structure (derived from HUMSAVAR). Our data support the notion that surface accessible area is a distinguished property of residues that undergo variation and that pathogenicity is more frequently associated to the buried property than to the exposed one.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy
- Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council, Bari, Italy
13
Protein-Protein Interactions Mediated by Intrinsically Disordered Protein Regions Are Enriched in Missense Mutations. Biomolecules 2020; 10:biom10081097. [PMID: 32722039; PMCID: PMC7463635; DOI: 10.3390/biom10081097] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 06/13/2020] [Revised: 07/15/2020] [Accepted: 07/20/2020] [Indexed: 12/27/2022] [Open access]
Abstract
Because proteins are fundamental to most biological processes, many genetic diseases can be traced back to single nucleotide variants (SNVs) that cause changes in protein sequences. However, not all SNVs that result in amino acid substitutions cause disease, as each residue is under different structural and functional constraints. Influential studies have shown that protein–protein interaction interfaces are enriched in disease-associated SNVs and depleted in SNVs that are common in the general population. These studies focus primarily on folded (globular) protein domains and overlook the prevalent class of protein interactions mediated by intrinsically disordered regions (IDRs). Therefore, we investigated the enrichment patterns of missense mutation-causing SNVs that are associated with disease and cancer, as well as those present in the healthy population, in structures of IDR-mediated interactions with comparisons to classical globular interactions. When comparing the different categories of interaction interfaces, division of the interface regions into solvent-exposed rim residues and buried core residues reveals distinctive enrichment patterns for the various types of missense mutations. Most notably, we demonstrate a strong enrichment at the interface core of interacting IDRs in disease mutations and its depletion in neutral ones, which supports the view that the disruption of IDR interactions is a mechanism underlying many diseases. Intriguingly, we also found an asymmetry across the IDR interaction interface in the enrichment of certain missense mutation types, which may hint at an increased variant tolerance and urges further investigations of IDR interactions.