1. Velloso JPL, de Sá AGC, Pires DEV, Ascher DB. Engineering G protein-coupled receptors for stabilization. Protein Sci 2024; 33:e5000. [PMID: 38747401 PMCID: PMC11094779 DOI: 10.1002/pro.5000] [Received: 12/17/2023] [Revised: 03/21/2024] [Accepted: 04/10/2024] [Indexed: 05/19/2024]
Abstract
G protein-coupled receptors (GPCRs) are one of the most important families of targets for drug discovery. One of the limiting steps in the study of GPCRs has been their stability, with significant and time-consuming protein engineering often used to stabilize GPCRs for structural characterization and drug screening. Unfortunately, computational methods developed using globular soluble proteins have translated poorly to the rational engineering of GPCRs. To fill this gap, we propose GPCR-tm, a novel and personalized structurally driven web-based machine learning tool to study the impacts of mutations on GPCR stability. We show that GPCR-tm performs as well as or better than alternative methods, and that it can accurately rank the stability changes of a wide range of mutations occurring in various types of class A GPCRs. GPCR-tm achieved Pearson's correlation coefficients of 0.74 and 0.46 on 10-fold cross-validation and blind test sets, respectively. We observed that the (structural) graph-based signatures were the most important set of features for predicting destabilizing mutations, indicating that these signatures properly describe the changes in the environment where the mutations occur. More specifically, GPCR-tm was able to accurately rank mutations based on their effect on protein stability, guiding their rational stabilization. GPCR-tm is available through a user-friendly web server at https://biosig.lab.uq.edu.au/gpcr_tm/.
2. Ediriweera GR, Butcher NJ, Kothapalli A, Zhao J, Blanchfield JT, Subasic CN, Grace JL, Fu C, Tan X, Quinn JF, Ascher DB, Whittaker MR, Whittaker AK, Kaminskas LM. Lipid sulfoxide polymers as potential inhalable drug delivery platforms with differential albumin binding affinity. Biomater Sci 2024; 12:2978-2992. [PMID: 38683548 DOI: 10.1039/d3bm02020g] [Indexed: 05/01/2024]
Abstract
Inhalable nanomedicines are increasingly being developed to optimise the pharmaceutical treatment of respiratory diseases. Large lipid-based nanosystems at the forefront of the inhalable nanomedicines development pipeline, though, have a number of limitations. The objective of this study was, therefore, to investigate the utility of novel small lipidated sulfoxide polymers based on poly(2-(methylsulfinyl)ethyl acrylate) (PMSEA) as inhalable drug delivery platforms with tuneable membrane permeability imparted by differential albumin binding kinetics. Linear PMSEA (5 kDa) was used as a hydrophilic polymer backbone with excellent anti-fouling and stealth properties compared to poly(ethylene glycol). Terminal lipids comprising single (1C2, 1C12) or double (2C12) chain diglycerides were installed to provide differing affinities for albumin and, by extension, albumin trafficking pathways in the lungs. Albumin binding kinetics, cytotoxicity, lung mucus penetration and cellular uptake and permeability through key cellular barriers in the lungs were examined in vitro. The polymers showed good mucus penetration and no cytotoxicity over 24 h at up to 1 mg ml⁻¹. While 1C2-PMSEA showed no interaction with albumin, 1C12-PMSEA and 2C12-PMSEA bound albumin with KD values of approximately 76 and 10 μM, respectively. Despite binding to albumin, 2C12-PMSEA showed reduced cell uptake and membrane permeability compared to the smaller polymers, and the presence of albumin had little effect on cell uptake and membrane permeability. While PMSEA strongly shielded these lipids from albumin, the data suggest that there is scope to tune the lipid component of these systems to control membrane permeability and cellular interactions, and thereby tailor drug disposition in the lungs.
3. Zhou Y, Myung Y, Rodrigues CHM, Ascher DB. DDMut-PPI: predicting effects of mutations on protein-protein interactions using graph-based deep learning. Nucleic Acids Res 2024:gkae412. [PMID: 38783112 DOI: 10.1093/nar/gkae412] [Received: 03/08/2024] [Revised: 04/30/2024] [Accepted: 05/02/2024] [Indexed: 05/25/2024]
Abstract
Protein-protein interactions (PPIs) play a vital role in cellular functions and are essential for therapeutic development and understanding diseases. However, current predictive tools often struggle to balance efficiency and precision in predicting the effects of mutations on these complex interactions. To address this, we present DDMut-PPI, a deep learning model that efficiently and accurately predicts changes in PPI binding free energy upon single and multiple point mutations. Building on the robust Siamese network architecture with graph-based signatures from our prior work, DDMut, the DDMut-PPI model was enhanced with a graph convolutional network operating on the protein interaction interface. We used residue-specific embeddings from the ProtT5 protein language model as node features, and a variety of molecular interactions as edge features. By integrating evolutionary context with spatial information, this framework enables DDMut-PPI to achieve a robust Pearson correlation of up to 0.75 (root mean squared error: 1.33 kcal/mol) in our evaluations, outperforming most existing methods. Importantly, the model demonstrated consistent performance across mutations that increase or decrease binding affinity. DDMut-PPI offers a significant advancement in the field and will serve as a valuable tool for researchers probing the complexities of protein interactions. DDMut-PPI is freely available as a web server and an application programming interface at https://biosig.lab.uq.edu.au/ddmut_ppi.
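Note: the abstract above reports performance as a Pearson correlation and a root mean squared error between predicted and measured binding free energy changes. The following is only a minimal sketch of how those two metrics are computed in general; the function names and the numbers used with them are hypothetical and are not taken from the DDMut-PPI study or its code.

```python
import math

def pearson_r(predicted, observed):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)

def rmse(predicted, observed):
    """Root mean squared error, in the same units as the inputs (e.g. kcal/mol)."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

# Hypothetical values for illustration only (not data from the study).
example_predicted = [0.5, -1.2, 2.0, 0.1]
example_observed = [0.8, -0.9, 1.5, -0.4]
example_r = pearson_r(example_predicted, example_observed)
example_rmse = rmse(example_predicted, example_observed)
```

The two metrics answer different questions: the correlation reflects how well predictions rank and track the measurements, while the RMSE reflects the typical size of the error in energy units.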
4. King HR, Bycroft M, Nguyen TB, Kelly G, Vinogradov AA, Rowling PJE, Stott K, Ascher DB, Suga H, Itzhaki LS, Artavanis-Tsakonas K. Targeting the Plasmodium falciparum UCHL3 ubiquitin hydrolase using chemically constrained peptides. Proc Natl Acad Sci U S A 2024; 121:e2322923121. [PMID: 38739798 PMCID: PMC11126973 DOI: 10.1073/pnas.2322923121] [Received: 01/11/2024] [Accepted: 03/18/2024] [Indexed: 05/16/2024]
Abstract
The ubiquitin-proteasome system is essential to all eukaryotes and has been shown to be critical to parasite survival as well, including Plasmodium falciparum, the causative agent of the deadliest form of malarial disease. Despite the central role of the ubiquitin-proteasome pathway to parasite viability across its entire life-cycle, specific inhibitors targeting the individual enzymes mediating ubiquitin attachment and removal do not currently exist. The ability to disrupt P. falciparum growth at multiple developmental stages is particularly attractive as this could potentially prevent both disease pathology, caused by asexually dividing parasites, as well as transmission which is mediated by sexually differentiated parasites. The deubiquitinating enzyme PfUCHL3 is an essential protein, transcribed across both human and mosquito developmental stages. PfUCHL3 is considered hard to drug by conventional methods given the high level of homology of its active site to human UCHL3 as well as to other UCH domain enzymes. Here, we apply the RaPID mRNA display technology and identify constrained peptides capable of binding to PfUCHL3 with nanomolar affinities. The two lead peptides were found to selectively inhibit the deubiquitinase activity of PfUCHL3 versus HsUCHL3. NMR spectroscopy revealed that the peptides do not act by binding to the active site but instead block binding of the ubiquitin substrate. We demonstrate that this approach can be used to target essential protein-protein interactions within the Plasmodium ubiquitin pathway, enabling the application of chemically constrained peptides as a novel class of antimalarial therapeutics.
5. Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Bhat V, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. Nat Genet 2024; 56:925-937. [PMID: 38658794 DOI: 10.1038/s41588-024-01726-6] [Received: 09/07/2023] [Accepted: 03/21/2024] [Indexed: 04/26/2024]
Abstract
CRISPR base editing screens enable analysis of disease-associated variants at scale; however, variable efficiency and precision confound the assessment of variant-induced phenotypes. Here, we provide an integrated experimental and computational pipeline that improves estimation of variant effects in base editing screens. We use a reporter construct to measure guide RNA (gRNA) editing outcomes alongside their phenotypic consequences and introduce base editor screen analysis with activity normalization (BEAN), a Bayesian network that uses per-guide editing outcomes provided by the reporter and target site chromatin accessibility to estimate variant impacts. BEAN outperforms existing tools in variant effect quantification. We use BEAN to pinpoint common regulatory variants that alter low-density lipoprotein (LDL) uptake, implicating previously unreported genes. Additionally, through saturation base editing of LDLR, we accurately quantify missense variant pathogenicity that is consistent with measurements in UK Biobank patients and identify underlying structural mechanisms. This work provides a widely applicable approach to improve the power of base editing screens for disease-associated variant characterization.
6. Gu X, Kovacs AS, Myung Y, Ascher DB. Mutations in Glycosyltransferases and Glycosidases: Implications for Associated Diseases. Biomolecules 2024; 14:497. [PMID: 38672513 PMCID: PMC11048727 DOI: 10.3390/biom14040497] [Received: 12/12/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/28/2024]
Abstract
Glycosylation, a crucial and the most common post-translational modification, coordinates a multitude of biological functions through the attachment of glycans to proteins and lipids. This process, predominantly governed by glycosyltransferases (GTs) and glycoside hydrolases (GHs), decides not only biomolecular functionality but also protein stability and solubility. Mutations in these enzymes have been implicated in a spectrum of diseases, prompting critical research into the structural and functional consequences of such genetic variations. This study compiles an extensive dataset from ClinVar and UniProt, providing a nuanced analysis of 2603 variants within 343 GT and GH genes. We conduct thorough MTR score analyses for the proteins with the most documented variants using MTR3D-AF2 via AlphaFold2 (AlphaFold v2.2.4) predicted protein structure, with the analyses indicating that pathogenic mutations frequently correlate with Beta Bridge secondary structures. Further, the calculation of the solvent accessibility score and variant visualisation show that pathogenic mutations exhibit reduced solvent accessibility, suggesting the mutated residues are likely buried and their localisation is within protein cores. We also find that pathogenic variants are often found proximal to active and binding sites, which may interfere with substrate interactions. We also incorporate computational predictions to assess the impact of these mutations on protein function, utilising tools such as mCSM to predict the destabilisation effect of variants. By identifying these critical regions that are prone to disease-associated mutations, our study opens avenues for designing small molecules or biologics that can modulate enzyme function or compensate for the loss of stability due to these mutations.
7. Myung Y, de Sá AGC, Ascher DB. Deep-PK: deep learning for small molecule pharmacokinetic and toxicity prediction. Nucleic Acids Res 2024:gkae254. [PMID: 38634808 DOI: 10.1093/nar/gkae254] [Received: 02/14/2024] [Revised: 03/20/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024]
Abstract
Evaluating pharmacokinetic properties of small molecules is considered a key feature in most drug development and high-throughput screening processes. Generally, pharmacokinetics, which represent the fate of drugs in the human body, are described from four perspectives: absorption, distribution, metabolism and excretion, all of which are closely related to a fifth perspective, toxicity (ADMET). Since obtaining ADMET data from in vitro, in vivo or pre-clinical stages is time consuming and expensive, many efforts have been made to predict ADMET properties via computational approaches. However, the majority of available methods are limited in their ability to provide pharmacokinetics and toxicity for diverse targets, ensure good overall accuracy, and offer ease of use, interpretability and extensibility for further optimizations. Here, we introduce Deep-PK, a deep learning-based pharmacokinetic and toxicity prediction, analysis and optimization platform. We applied graph neural networks and graph-based signatures as a graph-level feature to yield the best predictive performance across 73 endpoints, including 64 ADMET and 9 general properties. With these powerful models, Deep-PK supports molecular optimization and interpretation, aiding users in optimizing and understanding pharmacokinetics and toxicity for given input molecules. Deep-PK is freely available at https://biosig.lab.uq.edu.au/deeppk/.
8. Soh CH, de Sá AGC, Potter E, Halabi A, Ascher DB, Marwick TH. Use of the energy waveform electrocardiogram to detect subclinical left ventricular dysfunction in patients with type 2 diabetes mellitus. Cardiovasc Diabetol 2024; 23:91. [PMID: 38448993 PMCID: PMC10918872 DOI: 10.1186/s12933-024-02141-1] [Received: 11/26/2023] [Accepted: 01/22/2024] [Indexed: 03/08/2024]
Abstract
BACKGROUND Recent guidelines propose N-terminal pro-B-type natriuretic peptide (NT-proBNP) for recognition of asymptomatic left ventricular (LV) dysfunction (Stage B Heart Failure, SBHF) in type 2 diabetes mellitus (T2DM). Wavelet transform-based signal processing converts electrocardiogram (ECG) waveforms into an energy distribution waveform (ew)ECG, providing frequency and energy features that machine learning can use as additional inputs to improve the identification of SBHF. Accordingly, we sought to determine whether a machine learning model based on ewECG features was superior to NT-proBNP, as well as to a conventional screening tool, the Atherosclerosis Risk in Communities (ARIC) HF risk score, in SBHF screening among patients with T2DM. METHODS Participants in two clinical trials of SBHF (defined as diastolic dysfunction [DD], reduced global longitudinal strain [GLS ≤ 18%] or LV hypertrophy [LVH]) in T2DM underwent 12-lead ECG with additional ewECG features and echocardiography. Supervised machine learning was adopted to identify the optimal combination of ewECG extracted features for SBHF screening in 178 participants in one trial and tested in 97 participants in the other trial. The accuracy of the ewECG model in SBHF screening was compared with NT-proBNP and ARIC HF. RESULTS SBHF was identified in 128 (72%) participants in the training dataset (median 72 years, 41% female) and 64 (66%) in the validation dataset (median 70 years, 43% female). Fifteen ewECG features showed an area under the curve (AUC) of 0.81 (95% CI 0.787-0.794) in identifying SBHF, significantly better than both NT-proBNP (AUC 0.56, 95% CI 0.44-0.68, p < 0.001) and ARIC HF (AUC 0.67, 95% CI 0.56-0.79, p = 0.002). ewECG features also led to robust models screening for DD (AUC 0.74, 95% CI 0.73-0.74), reduced GLS (AUC 0.76, 95% CI 0.73-0.74) and LVH (AUC 0.90, 95% CI 0.88-0.89).
CONCLUSIONS Machine learning-based modelling using additional ewECG extracted features is superior to NT-proBNP and ARIC HF in SBHF screening among patients with T2DM, providing an alternative HF screening strategy for asymptomatic patients and potentially acting as a guidance tool to determine those who require an echocardiogram to confirm diagnosis. Trial registration LEAVE-DM, ACTRN 12619001393145 and Vic-ELF, ACTRN 12617000116325.
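Note: the comparison in this abstract rests on the area under the ROC curve (AUC). As a rough sketch only, the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy labels and scores below are hypothetical and are not data or code from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: sequence of 0/1 class labels (1 = condition present).
    scores: sequence of model scores, higher meaning more likely positive.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical toy example (not data from the trials).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported values should be read.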
9. Szot JO, Cuny H, Martin EM, Sheng DZ, Iyer K, Portelli S, Nguyen V, Gereis JM, Alankarage D, Chitayat D, Chong K, Wentzensen IM, Vincent-Delormé C, Lermine A, Burkitt-Wright E, Ji W, Jeffries L, Pais LS, Tan TY, Pitt J, Wise CA, Wright H, Andrews ID, Pruniski B, Grebe TA, Corsten-Janssen N, Bouman K, Poulton C, Prakash S, Keren B, Brown NJ, Hunter MF, Heath O, Lakhani SA, McDermott JH, Ascher DB, Chapman G, Bozon K, Dunwoodie SL. A metabolic signature for NADSYN1-dependent congenital NAD deficiency disorder. J Clin Invest 2024; 134:e174824. [PMID: 38357931 PMCID: PMC10866660 DOI: 10.1172/jci174824] [Received: 09/06/2023] [Accepted: 12/20/2023] [Indexed: 02/16/2024]
Abstract
Nicotinamide adenine dinucleotide (NAD) is essential for embryonic development. To date, biallelic loss-of-function variants in 3 genes encoding nonredundant enzymes of the NAD de novo synthesis pathway - KYNU, HAAO, and NADSYN1 - have been identified in humans with congenital malformations defined as congenital NAD deficiency disorder (CNDD). Here, we identified 13 further individuals with biallelic NADSYN1 variants predicted to be damaging, and phenotypes ranging from multiple severe malformations to the complete absence of malformation. Enzymatic assessment of variant deleteriousness in vitro revealed protein domain-specific perturbation, complemented by protein structure modeling in silico. We reproduced NADSYN1-dependent CNDD in mice and assessed various maternal NAD precursor supplementation strategies to prevent adverse pregnancy outcomes. While for Nadsyn1+/- mothers, any B3 vitamer was suitable to raise NAD, preventing embryo loss and malformation, Nadsyn1-/- mothers required supplementation with amidated NAD precursors (nicotinamide or nicotinamide mononucleotide) bypassing their metabolic block. The circulatory NAD metabolome in mice and humans before and after NAD precursor supplementation revealed a consistent metabolic signature with utility for patient identification. Our data collectively improve clinical diagnostics of NADSYN1-dependent CNDD, provide guidance for the therapeutic prevention of CNDD, and suggest an ongoing need to maintain NAD levels via amidated NAD precursor supplementation after birth.
10. Velloso JPL, Kovacs AS, Pires DEV, Ascher DB. AI-driven GPCR analysis, engineering, and targeting. Curr Opin Pharmacol 2024; 74:102427. [PMID: 38219398 DOI: 10.1016/j.coph.2023.102427] [Received: 10/26/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/16/2024]
Abstract
This article investigates the role of recent advances in Artificial Intelligence (AI) to revolutionise the study of G protein-coupled receptors (GPCRs). AI has been applied to many areas of GPCR research, including the application of machine learning (ML) in GPCR classification, prediction of GPCR activation levels, modelling GPCR 3D structures and interactions, understanding G-protein selectivity, aiding elucidation of GPCRs structures, and drug design. Despite progress, challenges in predicting GPCR structures and addressing the complex nature of GPCRs remain, providing avenues for future research and development.
11. Serghini A, Portelli S, Troadec G, Song C, Pan Q, Pires DEV, Ascher DB. Characterizing and predicting ccRCC-causing missense mutations in Von Hippel-Lindau disease. Hum Mol Genet 2024; 33:224-232. [PMID: 37883464 PMCID: PMC10800015 DOI: 10.1093/hmg/ddad181] [Received: 08/20/2023] [Revised: 10/19/2023] [Accepted: 10/20/2023] [Indexed: 10/28/2023]
Abstract
BACKGROUND Mutations within the Von Hippel-Lindau (VHL) tumor suppressor gene are known to cause VHL disease, which is characterized by the formation of cysts and tumors in multiple organs of the body, particularly clear cell renal cell carcinoma (ccRCC). A major challenge in clinical practice is determining tumor risk from a given mutation in the VHL gene. Previous efforts have been hindered by limited available clinical data and technological constraints. METHODS To overcome this, we first manually curated the largest set of clinically validated VHL mutations to date, enabling a robust assessment of existing predictive tools on an independent test set. Additionally, we comprehensively characterized the effects of mutations within VHL using in silico biophysical tools describing changes in protein stability, dynamics and affinity to binding partners to provide insights into the structure-phenotype relationship. These descriptive properties were used as molecular features for the construction of a machine learning model, designed to predict the risk of ccRCC development as a result of a VHL missense mutation. RESULTS Analysis of our model showed an accuracy of 0.81 in the identification of ccRCC-causing missense mutations, and a Matthews correlation coefficient of 0.44 on a non-redundant blind test, a significant improvement in comparison to previously available approaches. CONCLUSION This work highlights the power of using protein 3D structure to fully explore the range of molecular and functional consequences of genomic variants. We believe this optimized model will better enable its clinical implementation and assist in guiding patient risk stratification and management.
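Note: this abstract reports both accuracy and the Matthews correlation coefficient (MCC). The sketch below illustrates, under assumed numbers, why the two can differ on imbalanced data; the confusion-matrix counts are hypothetical and are not taken from the study, and the function names are illustrative.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical, imbalanced confusion matrix for illustration only:
# 10 true positives in the data, 90 true negatives.
toy_counts = dict(tp=5, tn=85, fp=5, fn=5)
toy_accuracy = accuracy(**toy_counts)        # 90 of 100 predictions correct
toy_mcc = matthews_corrcoef(**toy_counts)    # (5*85 - 5*5) / 900
```

Because MCC uses all four cells of the confusion matrix, it stays modest when the minority class is poorly predicted even if overall accuracy is high, which is why it is commonly reported alongside accuracy for blind tests.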
12. Rodrigues CHM, Portelli S, Ascher DB. Exploring the effects of missense mutations on protein thermodynamics through structure-based approaches: findings from the CAGI6 challenges. Hum Genet 2024:10.1007/s00439-023-02623-4. [PMID: 38227011 DOI: 10.1007/s00439-023-02623-4] [Received: 06/15/2023] [Accepted: 11/18/2023] [Indexed: 01/17/2024]
Abstract
Missense mutations are known contributors to diverse genetic disorders, due to their subtle, single amino acid changes imparted on the resultant protein. Because of this, understanding the impact of these mutations on protein stability and function is crucial for unravelling disease mechanisms and developing targeted therapies. The Critical Assessment of Genome Interpretation (CAGI) provides a valuable platform for benchmarking state-of-the-art computational methods in predicting the impact of disease-related mutations on protein thermodynamics. Here we report the performance of our comprehensive platform of structure-based computational approaches to evaluate mutations impacting protein structure and function on 3 challenges from CAGI6: Calmodulin, MAPK1 and MAPK3. Our stability predictors have achieved correlations of up to 0.74 and AUCs of 1 when predicting changes in ΔΔG for MAPK1 and MAPK3, respectively, and AUC of up to 0.75 in the Calmodulin challenge. Overall, our study highlights the importance of structure-based approaches in understanding the effects of missense mutations on protein thermodynamics. The results obtained from the CAGI6 challenges contribute to the ongoing efforts to enhance our understanding of disease mechanisms and facilitate the development of personalised medicine approaches.
13. Li J, Mui JWY, da Silva BM, Pires DEV, Ascher DB, Madiedo Soler N, Goddard-Borger ED, Williams SJ. A Broad-Spectrum α-Glucosidase of Glycoside Hydrolase Family 13 from Marinovum sp., a Member of the Roseobacter Clade. Appl Biochem Biotechnol 2024:10.1007/s12010-023-04820-3. [PMID: 38180643 DOI: 10.1007/s12010-023-04820-3] [Accepted: 12/19/2023] [Indexed: 01/06/2024]
Abstract
Glycoside hydrolases (GHs) are a diverse group of enzymes that catalyze the hydrolysis of glycosidic bonds. The Carbohydrate-Active enZymes (CAZy) classification organizes GHs into families based on sequence data and function, with fewer than 1% of the predicted proteins characterized biochemically. Consideration of genomic context can provide clues to infer possible enzyme activities for proteins of unknown function. We used the MultiGeneBLAST tool to discover a gene cluster in Marinovum sp., a member of the marine Roseobacter clade, that encodes homologues of enzymes belonging to the sulfoquinovose monooxygenase pathway for sulfosugar catabolism. This cluster lacks a gene encoding a classical family GH31 sulfoquinovosidase candidate, but instead includes an uncharacterized family GH13 protein (MsGH13) that we hypothesized could be a non-classical sulfoquinovosidase. Surprisingly, recombinant MsGH13 lacks sulfoquinovosidase activity and is a broad-spectrum α-glucosidase that is active on a diverse array of α-linked disaccharides, including maltose, sucrose, nigerose, trehalose, isomaltose, and kojibiose. Using AlphaFold, a 3D model for the MsGH13 enzyme was constructed that predicted its active site shared close similarity with an α-glucosidase from Halomonas sp. H11 of the same GH13 subfamily that shows narrower substrate specificity.
14.
Abstract
The greatest challenge in drug discovery remains the high rate of attrition across the different phases of the process, which costs the industry billions of dollars every year. While all phases remain crucial to ensure pharmaceutical-level safety, quality, and efficacy of the end product, streamlining these efforts toward compounds with success potential is pivotal for a more efficient and cost-effective process. The use of artificial intelligence (AI) within the pharmaceutical industry aims at just this, and has applications in preclinical screening for biological activity, optimization of pharmacokinetic properties for improved drug formulation, early toxicity prediction which reduces attrition, and pre-emptively screening for genetic changes in the biological target to improve therapeutic longevity. Here, we present a series of in silico tools that address these applications in small molecule development and describe how they can be embedded within the current pharmaceutical development pipeline.
15. Pan Q, Portelli S, Nguyen TB, Ascher DB. Characterization on the oncogenic effect of the missense mutations of p53 via machine learning. Brief Bioinform 2023; 25:bbad428. [PMID: 38018912 PMCID: PMC10685404 DOI: 10.1093/bib/bbad428] [Received: 06/14/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023]
Abstract
Dysfunctions caused by missense mutations in the tumour suppressor p53 have been extensively shown to be a leading driver of many cancers. Unfortunately, it is time-consuming and labour-intensive to experimentally elucidate the effects of all possible missense variants. Recent works presented a comprehensive dataset and machine learning model to predict the functional outcome of mutations in p53. Despite the well-established dataset and precise predictions, this tool was trained on a complicated model with limited predictions on p53 mutations. In this work, we first used computational biophysical tools to investigate the functional consequences of missense mutations in p53, informing a bias of deleterious mutations with destabilizing effects. Combining these insights with experimental assays, we present two interpretable machine learning models leveraging both experimental assays and in silico biophysical measurements to accurately predict the functional consequences on p53 and validate their robustness on clinical data. Our final model based on nine features obtained comparable predictive performance with the state-of-the-art p53 specific method and outperformed other generalized, widely used predictors. Interpreting our models revealed that information on residue p53 activity, polar atom distances and changes in p53 stability were instrumental in the decisions, consistent with a bias of the properties of deleterious mutations. Our predictions have been computed for all possible missense mutations in p53, offering clinical diagnostic utility, which is crucial for patient monitoring and the development of personalized cancer treatment.
16. Rodrigues CHM, Ascher DB. CSM-Potential2: A comprehensive deep learning platform for the analysis of protein interacting interfaces. Proteins 2023. [PMID: 37870486 DOI: 10.1002/prot.26615] [Received: 06/16/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 10/24/2023]
Abstract
Proteins are molecular machinery that participate in virtually all essential biological functions within the cell, which are tightly related to their 3D structure. The importance of understanding the protein structure-function relationship is highlighted by the exponential growth of experimental structures, which has been greatly expanded by recent breakthroughs in protein structure prediction, most notably RoseTTAFold and AlphaFold2. These advances have prompted the development of several computational approaches that leverage these data sources to explore potential biological interactions. However, most methods are generally limited to analysis of single types of interactions, such as protein-protein or protein-ligand interactions, and their complexity limits their usability to expert users. Here we report CSM-Potential2, a deep learning platform for the analysis of binding interfaces on protein structures. In addition to prediction of protein-protein interaction binding sites and classification of biological ligands, our new platform incorporates prediction of interactions with nucleic acids at the residue level and allows for ligand transplantation based on sequence and structure similarity to experimentally determined structures. We anticipate our platform to be a valuable resource that provides easy access to a range of state-of-the-art methods to expert and non-expert users for the study of biological interactions. Our tool is freely available as an easy-to-use web server and API at https://biosig.lab.uq.edu.au/csm_potential.
|
17
|
Al-Jarf R, Karmakar M, Myung Y, Ascher DB. Uncovering the Molecular Drivers of NHEJ DNA Repair-Implicated Missense Variants and Their Functional Consequences. Genes (Basel) 2023; 14:1890. [PMID: 37895239 PMCID: PMC10606680 DOI: 10.3390/genes14101890] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/20/2023] [Revised: 09/24/2023] [Accepted: 09/27/2023] [Indexed: 10/29/2023] Open
Abstract
Variants in non-homologous end joining (NHEJ) DNA repair genes are associated with various human syndromes, including microcephaly, growth delay, Fanconi anemia, and different hereditary cancers. However, very little has been done previously to systematically record the underlying molecular consequences of NHEJ variants and their link to phenotypic outcomes. In this study, a list of over 2983 missense variants of the principal components of the NHEJ system, including DNA Ligase IV, DNA-PKcs, Ku70/80 and XRCC4, reported in the clinical literature, was initially collected. The molecular consequences of variants were evaluated using in silico biophysical tools to quantitatively assess their impact on protein folding, dynamics, stability, and interactions. Cancer-causing and population variants within these NHEJ factors were statistically analyzed to identify molecular drivers. A comprehensive catalog of NHEJ variants from genes known to be mutated in cancer was curated, providing a resource for better understanding their role and molecular mechanisms in diseases. The variant analysis highlighted different molecular drivers among the distinct proteins, where cancer-driving variants in anchor proteins, such as Ku70/80, were more likely to affect key protein-protein interactions, whilst those in the enzymatic components, such as DNA-PKcs, were likely to be found in intolerant regions undergoing purifying selection. We believe that the information acquired in our database will be a powerful resource to better understand the role of non-homologous end-joining DNA repair in genetic disorders, and will serve as a source to inspire other investigations to understand the disease further, vital for the development of improved therapeutic strategies.
|
18
|
Ryu J, Barkal S, Yu T, Jankowiak M, Zhou Y, Francoeur M, Phan QV, Li Z, Tognon M, Brown L, Love MI, Lettre G, Ascher DB, Cassa CA, Sherwood RI, Pinello L. Joint genotypic and phenotypic outcome modeling improves base editing variant effect quantification. medRxiv 2023:2023.09.08.23295253. [PMID: 37732177 PMCID: PMC10508837 DOI: 10.1101/2023.09.08.23295253] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/22/2023]
Abstract
CRISPR base editing screens are powerful tools for studying disease-associated variants at scale. However, the efficiency and precision of base editing perturbations vary, confounding the assessment of variant-induced phenotypic effects. Here, we provide an integrated pipeline that improves the estimation of variant impact in base editing screens. We perform high-throughput ABE8e-SpRY base editing screens with an integrated reporter construct to measure the editing efficiency and outcomes of each gRNA alongside their phenotypic consequences. We introduce BEAN, a Bayesian network that accounts for per-guide editing outcomes and target site chromatin accessibility to estimate variant impacts. We show this pipeline attains superior performance compared to existing tools in variant classification and effect size quantification. We use BEAN to pinpoint common variants that alter LDL uptake, implicating novel genes. Additionally, through saturation base editing of LDLR, we enable accurate quantitative prediction of the effects of missense variants on LDL-C levels, which aligns with measurements in UK Biobank individuals, and identify structural mechanisms underlying variant pathogenicity. This work provides a widely applicable approach to improve the power of base editor screens for disease-associated variant characterization.
|
19
|
Portelli S, Heaton R, Ascher DB. Identifying Innate Resistance Hotspots for SARS-CoV-2 Antivirals Using In Silico Protein Techniques. Genes (Basel) 2023; 14:1699. [PMID: 37761839 PMCID: PMC10531314 DOI: 10.3390/genes14091699] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/09/2023] [Revised: 08/02/2023] [Accepted: 08/22/2023] [Indexed: 09/29/2023] Open
Abstract
The development and approval of antivirals against SARS-CoV-2 has further equipped clinicians with treatment strategies against the COVID-19 pandemic, reducing deaths post-infection. Extensive clinical use of antivirals, however, can impart additional selective pressure, leading to the emergence of antiviral resistance. While we have previously characterized possible effects of circulating SARS-CoV-2 missense mutations on proteome function and stability, their direct effects on the novel antivirals remain unexplored. To address this, we have computationally calculated the consequences of mutations in the antiviral targets, RNA-dependent RNA polymerase and main protease, on target stability and interactions with their antivirals, nucleic acids, and other proteins. By analyzing circulating variants prior to antiviral approval, this work highlighted the inherent resistance potential of different genome regions. Namely, within the main protease binding site, missense mutations imparted a lower fitness cost, while the opposite was noted for the RNA-dependent RNA polymerase binding site. This suggests that resistance to nirmatrelvir/ritonavir combination treatment is more likely to occur and proliferate than that to molnupiravir. These insights are crucial both clinically in drug stewardship, and preclinically in the identification of less mutable targets for novel therapeutic design.
|
20
|
Myung Y, Pires DEV, Ascher DB. Understanding the complementarity and plasticity of antibody-antigen interfaces. Bioinformatics 2023:btad392. [PMID: 37382557 DOI: 10.1093/bioinformatics/btad392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 01/24/2023] [Accepted: 06/27/2023] [Indexed: 06/30/2023]
Abstract
MOTIVATION: While antibodies have been ground-breaking therapeutic agents, the structural determinants of antibody binding specificity remain to be fully elucidated, a challenge compounded by the virtually unlimited repertoire of antigens they can recognise. Here, we have explored the structural landscapes of antibody-antigen interfaces to identify the structural determinants driving target recognition by assessing concavity and interatomic interactions. RESULTS: We found that complementarity-determining regions utilised deeper concavity with their longer H3 loops, with the H3 loops of nanobodies showing the deepest use of concavity. Of all amino acid residues found in complementarity-determining regions, tryptophan used deeper concavity, especially in nanobodies, making it suitable for leveraging concave antigen surfaces. Similarly, antigens utilised arginine to bind to deeper pockets of the antibody surface. Our findings fill a gap in knowledge about antibody specificity, binding affinity, and the nature of antibody-antigen interface features, which will lead to a better understanding of how antibodies can more effectively target druggable sites on antigen surfaces. AVAILABILITY: The data and scripts are available at: https://github.com/YoochanMyung/scripts. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
|
21
|
Nguyen TB, de Sá AGC, Rodrigues CHM, Pires DEV, Ascher DB. LEGO-CSM: a tool for functional characterisation of proteins. Bioinformatics 2023:btad402. [PMID: 37382560 DOI: 10.1093/bioinformatics/btad402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Revised: 02/22/2023] [Accepted: 06/27/2023] [Indexed: 06/30/2023]
Abstract
MOTIVATION: With the development of sequencing techniques, the discovery of new proteins significantly exceeds the human capacity and resources for experimentally characterising protein functions. LEGO-CSM is a comprehensive web-based resource that fills this gap by applying well-established and robust graph-based signatures to supervised learning models that use both protein sequence and structure information to accurately model protein function in terms of Subcellular Localisation, Enzyme Commission (EC) numbers and Gene Ontology (GO) terms. RESULTS: We show our models perform as well as or better than alternative approaches, achieving Area Under the Receiver Operating Characteristic Curve (ROC AUC) of up to 0.93 for subcellular localisation, up to 0.93 for EC and up to 0.81 for GO terms on independent blind tests. AVAILABILITY: LEGO-CSM's web server is freely available at https://biosig.lab.uq.edu.au/lego_csm. In addition, all datasets used to train and test LEGO-CSM's models can be downloaded at https://biosig.lab.uq.edu.au/lego_csm/data. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
|
22
|
Jessen-Howard D, Pan Q, Ascher DB. Identifying the Molecular Drivers of Pathogenic Aldehyde Dehydrogenase Missense Mutations in Cancer and Non-Cancer Diseases. Int J Mol Sci 2023; 24:10157. [PMID: 37373306 DOI: 10.3390/ijms241210157] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
Human aldehyde dehydrogenases (ALDHs), comprising 19 isoenzymes, play a vital role in both endogenous and exogenous aldehyde metabolism. This NAD(P)-dependent catalytic process relies on the structural and functional integrity of cofactor binding, substrate interaction, and the oligomerization of ALDHs. Disruptions of ALDH activity, however, can result in the accumulation of cytotoxic aldehydes, which have been linked with a wide range of diseases, including cancers as well as neurological and developmental disorders. In our previous work, we successfully characterised the structure-function relationships of missense variants of other proteins. We therefore applied a similar analysis pipeline to identify potential molecular drivers of pathogenic ALDH missense mutations. Variant data were first carefully curated and labelled as cancer-risk, non-cancer disease, or benign. We then leveraged various computational biophysical methods to describe the changes caused by missense mutations, indicating a bias of detrimental mutations towards destabilising effects. Building on these insights, several machine learning approaches were further utilised to investigate combinations of features, revealing the importance of conservation in ALDHs. Our work aims to provide important biological perspectives on the pathogenic consequences of missense mutations in ALDHs, which could be an invaluable resource in the development of cancer treatments.
|
23
|
Zhou Y, Pan Q, Pires DEV, Rodrigues CHM, Ascher DB. DDMut: predicting effects of mutations on protein stability using deep learning. Nucleic Acids Res 2023:7191416. [PMID: 37283042 PMCID: PMC10320186 DOI: 10.1093/nar/gkad472] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 02/10/2023] [Revised: 05/11/2023] [Accepted: 05/18/2023] [Indexed: 06/08/2023] Open
Abstract
Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and biased predictions towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs Free Energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment, with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutants, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations, and guide rational protein engineering. DDMut is freely available as a web server and API at https://biosig.lab.uq.edu.au/ddmut.
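The evaluation quantities named in this abstract (Pearson's correlation, RMSE, and anti-symmetry between forward and hypothetical reverse mutations) can be illustrated with a minimal sketch. This is not DDMut's code or API; the function names and the numeric values are hypothetical and chosen only for illustration. It assumes that an ideally anti-symmetric predictor gives a reverse-mutation prediction equal to the negative of the forward one, so their sum averages to zero.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def rmse(xs, ys):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def antisymmetry_bias(forward, reverse):
    """Mean of (forward + reverse) predictions.

    Zero for a perfectly anti-symmetric predictor (assumption stated above).
    """
    return sum(f + r for f, r in zip(forward, reverse)) / len(forward)

# Hypothetical values in kcal/mol, for illustration only (not from the paper).
experimental = [-1.2, 0.5, -2.0, 0.1]
pred_forward = [-1.0, 0.3, -1.5, 0.4]
pred_reverse = [1.1, -0.2, 1.4, -0.5]

print("Pearson r:", pearson(experimental, pred_forward))
print("RMSE:", rmse(experimental, pred_forward))
print("Anti-symmetry bias:", antisymmetry_bias(pred_forward, pred_reverse))
```

A bias far from zero would indicate that a predictor systematically favours one direction, which is the kind of skew towards destabilising mutations that the abstract describes in earlier tools.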
|
24
|
da Silva BM, Ascher DB, Pires DEV. epitope1D: accurate taxonomy-aware B-cell linear epitope prediction. Brief Bioinform 2023; 24:7111720. [PMID: 37039696 DOI: 10.1093/bib/bbad114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 01/30/2023] [Accepted: 03/07/2023] [Indexed: 04/12/2023] Open
Abstract
The ability to identify B-cell epitopes is an essential step in vaccine design, immunodiagnostic tests and antibody production. Several computational approaches have been proposed to identify, from an antigen protein or peptide sequence, which residues are more likely to be part of an epitope, but have limited performance on relatively homogeneous data sets and lack interpretability, limiting biological insights that could otherwise be obtained. To address these limitations, we have developed epitope1D, an explainable machine learning method capable of accurately identifying linear B-cell epitopes, leveraging two new descriptors: a graph-based signature representation of protein sequences, based on our well-established Cutoff Scanning Matrix algorithm and Organism Ontology information. Our model achieved Areas Under the ROC curve of up to 0.935 on cross-validation and blind tests, demonstrating robust performance. A comprehensive comparison to alternative methods using distinct benchmark data sets was also employed, with our model outperforming state-of-the-art tools. epitope1D represents not only a significant advance in predictive performance, but also allows biologically meaningful features to be combined and used for model interpretation. epitope1D has been made available as a user-friendly web server interface and application programming interface at https://biosig.lab.uq.edu.au/epitope1d/.
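The area under the ROC curve reported above has a simple rank-based interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. It is illustrative only and is not part of epitope1D; the function name and the labels and scores in the usage lines are made up.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = epitope residue, by assumption).
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("Both classes must be present to compute ROC AUC")

    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5  # ties count as half
    return correctly_ranked / (len(positives) * len(negatives))

# Hypothetical labels and scores, for illustration only.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.4, 0.1]
print("ROC AUC:", roc_auc(example_labels, example_scores))
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a small illustration; a sort-based computation would be preferable for large benchmark sets.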
|
25
|
Silk M, de Sá A, Olshansky M, Ascher DB. Insights from Spatial Measures of Intolerance to Identifying Pathogenic Variants in Developmental and Epileptic Encephalopathies. Int J Mol Sci 2023; 24:ijms24065114. [PMID: 36982187 PMCID: PMC10049344 DOI: 10.3390/ijms24065114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/17/2023] [Accepted: 02/28/2023] [Indexed: 03/11/2023] Open
Abstract
Developmental and epileptic encephalopathies (DEEs) are a group of epilepsies with early onset and severe symptoms that sometimes lead to death. Although previous work successfully discovered several genes implicated in disease outcomes, disease heterogeneity makes it challenging to distinguish causative mutations within these genes from the background variation present in all individuals. Nevertheless, our ability to detect possible pathogenic variants has continued to improve as in silico predictors of deleteriousness have advanced. We investigated their use in prioritising likely pathogenic variants in the whole exome sequences of epileptic encephalopathy patients. We showed that the inclusion of structure-based predictors of intolerance improved upon previous attempts to demonstrate enrichment within epilepsy genes.
|