1
Pan Q, Parra GB, Myung Y, Portelli S, Nguyen TB, Ascher DB. AlzDiscovery: A computational tool to identify Alzheimer's disease-causing missense mutations using protein structure information. Protein Sci 2024; 33:e5147. [PMID: 39276018 PMCID: PMC11401060 DOI: 10.1002/pro.5147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 07/14/2024] [Accepted: 07/31/2024] [Indexed: 09/16/2024]
Abstract
Alzheimer's disease (AD) is one of the most common forms of dementia and neurodegenerative disease, characterized by the formation of neuritic plaques and neurofibrillary tangles. Many different proteins participate in this complicated pathogenic mechanism, and missense mutations can alter the folding and function of these proteins, significantly increasing the risk of AD. However, many methods for identifying AD-causing variants do not consider the effect of mutations within the three-dimensional environment of the protein. Here, we present a machine learning-based analysis to distinguish AD-causing mutations from their benign counterparts in 21 AD-related proteins, leveraging both sequence- and structure-based features. Using computational tools to estimate the effect of mutations on protein stability, we first observed that pathogenic mutations in familial AD-related proteins were biased toward significantly destabilizing effects. Building on this insight, we built a generic predictive model and improved its performance by tuning the sample weights during training. Our final model achieved an area under the receiver operating characteristic curve of up to 0.95 on the blind test and 0.70 on an independent clinical validation, outperforming all the state-of-the-art methods. Feature interpretation indicated that the hydrophobic environment and polar interaction contacts were crucial in deciding the pathogenic phenotype of missense mutations. Finally, we present a user-friendly web server, AlzDiscovery, for researchers to browse the predicted phenotypes of all possible missense mutations in these 21 AD-related proteins. Our study will be a valuable resource for AD screening and the development of personalized treatment.
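The abstract reports performance as the area under the receiver operating characteristic curve. As a minimal sketch of what that number measures, the function below computes it as the fraction of (pathogenic, benign) pairs in which the pathogenic variant receives the higher score, counting ties as half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper or from AlzDiscovery.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for pathogenic, 0 for benign (illustrative convention).
    scores: predicted scores, higher meaning more likely pathogenic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two pathogenic and two benign variants.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 3 of 4 pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the 0.95 (blind test) and 0.70 (clinical validation) figures quoted above. The quadratic pairwise loop is chosen for readability; a rank-based computation would suit larger datasets.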
Affiliation(s)
- Qisheng Pan
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Georgina Becerra Parra
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Yoochan Myung
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Stephanie Portelli
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- Thanh Binh Nguyen
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
- David B. Ascher
  - The Australian Centre for Ecogenomics, School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Australia
2
Velloso JPL, de Sá AGC, Pires DEV, Ascher DB. Engineering G protein-coupled receptors for stabilization. Protein Sci 2024; 33:e5000. [PMID: 38747401 PMCID: PMC11094779 DOI: 10.1002/pro.5000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 03/21/2024] [Accepted: 04/10/2024] [Indexed: 05/19/2024]
Abstract
G protein-coupled receptors (GPCRs) are one of the most important families of targets for drug discovery. One of the limiting steps in the study of GPCRs has been their stability, with significant and time-consuming protein engineering often used to stabilize GPCRs for structural characterization and drug screening. Unfortunately, computational methods developed using globular soluble proteins have translated poorly to the rational engineering of GPCRs. To fill this gap, we propose GPCR-tm, a novel and personalized structurally driven web-based machine learning tool to study the impacts of mutations on GPCR stability. We show that GPCR-tm performs as well as or better than alternative methods, and that it can accurately rank the stability changes of a wide range of mutations occurring in various types of class A GPCRs. GPCR-tm achieved Pearson's correlation coefficients of 0.74 and 0.46 on 10-fold cross-validation and blind test sets, respectively. We observed that the (structural) graph-based signatures were the most important set of features for predicting destabilizing mutations, indicating that these signatures properly describe the changes in the environment where the mutations occur. More specifically, GPCR-tm was able to accurately rank mutations based on their effect on protein stability, guiding their rational stabilization. GPCR-tm is available through a user-friendly web server at https://biosig.lab.uq.edu.au/gpcr_tm/.
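The performance figures above are Pearson's correlation coefficients between predicted and measured stability changes. The following is a minimal sketch of that statistic in plain Python; the function name `pearson_r` and the toy values are hypothetical and do not come from the GPCR-tm dataset or web server.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and of cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: measured vs. predicted stability changes.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 3.0, 2.0, 4.0]
print(pearson_r(measured, predicted))
```

Pearson's coefficient measures linear agreement, so a tool can rank mutations usefully even when the coefficient is moderate, as with the 0.46 reported on the blind test.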
Affiliation(s)
- João Paulo L. Velloso
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
- Alex G. C. de Sá
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
- Douglas E. V. Pires
  - School of Computing and Information Systems, The University of Melbourne, Parkville, Victoria, Australia
- David B. Ascher
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
3
Serghini A, Portelli S, Troadec G, Song C, Pan Q, Pires DEV, Ascher DB. Characterizing and predicting ccRCC-causing missense mutations in Von Hippel-Lindau disease. Hum Mol Genet 2024; 33:224-232. [PMID: 37883464 PMCID: PMC10800015 DOI: 10.1093/hmg/ddad181] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/20/2023] [Revised: 10/19/2023] [Accepted: 10/20/2023] [Indexed: 10/28/2023] Open
Abstract
BACKGROUND: Mutations within the Von Hippel-Lindau (VHL) tumor suppressor gene are known to cause VHL disease, which is characterized by the formation of cysts and tumors in multiple organs of the body, particularly clear cell renal cell carcinoma (ccRCC). A major challenge in clinical practice is determining tumor risk from a given mutation in the VHL gene. Previous efforts have been hindered by limited available clinical data and technological constraints. METHODS: To overcome this, we first manually curated the largest set of clinically validated VHL mutations to date, enabling a robust assessment of existing predictive tools on an independent test set. Additionally, we comprehensively characterized the effects of mutations within VHL using in silico biophysical tools describing changes in protein stability, dynamics and affinity to binding partners to provide insights into the structure-phenotype relationship. These descriptive properties were used as molecular features for the construction of a machine learning model designed to predict the risk of ccRCC development as a result of a VHL missense mutation. RESULTS: Analysis of our model showed an accuracy of 0.81 in the identification of ccRCC-causing missense mutations, and a Matthews correlation coefficient of 0.44 on a non-redundant blind test, a significant improvement over previously available approaches. CONCLUSION: This work highlights the power of using protein 3D structure to fully explore the range of molecular and functional consequences of genomic variants. We believe this optimized model will better enable its clinical implementation and assist in guiding patient risk stratification and management.
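The results above quote both accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of how the two are derived from the same confusion matrix, and why they can differ, the code below computes both from binary labels. The function names and the toy labels are hypothetical and are not taken from the study's data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical toy labels: 1 = disease-causing, 0 = not disease-causing.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts), mcc(*counts))
```

Because MCC uses all four cells of the confusion matrix, it is typically lower than accuracy on imbalanced data, which is one reason the two are often reported together.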
Affiliation(s)
- Adam Serghini
  - School of Chemistry and Molecular Biosciences, Chemistry Building 68, Cooper Road, The University of Queensland, St Lucia, QLD 4072, Queensland, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Biosciences, Chemistry Building 68, Cooper Road, The University of Queensland, St Lucia, QLD 4072, Queensland, Australia
- Guillaume Troadec
  - School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3010, Australia
- Catherine Song
  - School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3010, Australia
- Qisheng Pan
  - School of Chemistry and Molecular Biosciences, Chemistry Building 68, Cooper Road, The University of Queensland, St Lucia, QLD 4072, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
- Douglas E V Pires
  - School of Computing and Information Systems, University of Melbourne, Melbourne, VIC 3010, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, Chemistry Building 68, Cooper Road, The University of Queensland, St Lucia, QLD 4072, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
4
Rodrigues CHM, Portelli S, Ascher DB. Exploring the effects of missense mutations on protein thermodynamics through structure-based approaches: findings from the CAGI6 challenges. Hum Genet 2024:10.1007/s00439-023-02623-4. [PMID: 38227011 DOI: 10.1007/s00439-023-02623-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/15/2023] [Accepted: 11/18/2023] [Indexed: 01/17/2024]
Abstract
Missense mutations are known contributors to diverse genetic disorders, due to their subtle, single amino acid changes imparted on the resultant protein. Because of this, understanding the impact of these mutations on protein stability and function is crucial for unravelling disease mechanisms and developing targeted therapies. The Critical Assessment of Genome Interpretation (CAGI) provides a valuable platform for benchmarking state-of-the-art computational methods in predicting the impact of disease-related mutations on protein thermodynamics. Here we report the performance of our comprehensive platform of structure-based computational approaches to evaluate mutations impacting protein structure and function on 3 challenges from CAGI6: Calmodulin, MAPK1 and MAPK3. Our stability predictors have achieved correlations of up to 0.74 and AUCs of 1 when predicting changes in ΔΔG for MAPK1 and MAPK3, respectively, and AUC of up to 0.75 in the Calmodulin challenge. Overall, our study highlights the importance of structure-based approaches in understanding the effects of missense mutations on protein thermodynamics. The results obtained from the CAGI6 challenges contribute to the ongoing efforts to enhance our understanding of disease mechanisms and facilitate the development of personalised medicine approaches.
Affiliation(s)
- Carlos H M Rodrigues
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
- Stephanie Portelli
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, 4072, Australia
5
Abstract
The greatest challenge in drug discovery remains the high rate of attrition across the different phases of the process, which costs the industry billions of dollars every year. While all phases remain crucial to ensure pharmaceutical-level safety, quality, and efficacy of the end product, streamlining these efforts toward compounds with success potential is pivotal for a more efficient and cost-effective process. The use of artificial intelligence (AI) within the pharmaceutical industry aims at just this, and has applications in preclinical screening for biological activity, optimization of pharmacokinetic properties for improved drug formulation, early toxicity prediction to reduce attrition, and pre-emptive screening for genetic changes in the biological target to improve therapeutic longevity. Here, we present a series of in silico tools that address these applications in small molecule development and describe how they can be embedded within the current pharmaceutical development pipeline.
Affiliation(s)
- Adam Serghini
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
- David B Ascher
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, QLD, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
6
Pan Q, Portelli S, Nguyen TB, Ascher DB. Characterization on the oncogenic effect of the missense mutations of p53 via machine learning. Brief Bioinform 2023; 25:bbad428. [PMID: 38018912 PMCID: PMC10685404 DOI: 10.1093/bib/bbad428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open
Abstract
Dysfunctions caused by missense mutations in the tumour suppressor p53 have been extensively shown to be a leading driver of many cancers. Unfortunately, it is time-consuming and labour-intensive to experimentally elucidate the effects of all possible missense variants. Recent work presented a comprehensive dataset and machine learning model to predict the functional outcome of mutations in p53. Despite the well-established dataset and precise predictions, this tool was built on a complicated model with limited predictions for p53 mutations. In this work, we first used computational biophysical tools to investigate the functional consequences of missense mutations in p53, revealing a bias of deleterious mutations toward destabilizing effects. Combining these insights with experimental assays, we present two interpretable machine learning models that leverage both experimental assays and in silico biophysical measurements to accurately predict the functional consequences of mutations in p53, and we validate their robustness on clinical data. Our final model, based on nine features, achieved predictive performance comparable to the state-of-the-art p53-specific method and outperformed other generalized, widely used predictors. Interpreting our models revealed that information on residue p53 activity, polar atom distances and changes in p53 stability were instrumental in the decisions, consistent with the observed bias in the properties of deleterious mutations. Our predictions have been computed for all possible missense mutations in p53, offering clinical diagnostic utility, which is crucial for patient monitoring and the development of personalized cancer treatment.
Affiliation(s)
- Qisheng Pan
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Stephanie Portelli
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Thanh Binh Nguyen
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- David B Ascher
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
7
Jessen-Howard D, Pan Q, Ascher DB. Identifying the Molecular Drivers of Pathogenic Aldehyde Dehydrogenase Missense Mutations in Cancer and Non-Cancer Diseases. Int J Mol Sci 2023; 24:10157. [PMID: 37373306 DOI: 10.3390/ijms241210157] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
Human aldehyde dehydrogenases (ALDHs), comprising 19 isoenzymes, play a vital role in both endogenous and exogenous aldehyde metabolism. This NAD(P)-dependent catalytic process relies on intact cofactor binding, substrate interaction, and oligomerization of ALDHs. Disruptions to the activity of ALDHs, however, could result in the accumulation of cytotoxic aldehydes, which have been linked with a wide range of diseases, including cancers as well as neurological and developmental disorders. In our previous works, we have successfully characterised the structure-function relationships of the missense variants of other proteins. We therefore applied a similar analysis pipeline to identify potential molecular drivers of pathogenic ALDH missense mutations. Variant data were first carefully curated and labelled as cancer-risk, non-cancer disease, or benign. We then leveraged various computational biophysical methods to describe the changes caused by missense mutations, revealing a bias of detrimental mutations toward destabilising effects. Building on these insights, several machine learning approaches were further utilised to investigate the combination of features, revealing the importance of conservation in ALDHs. Our work aims to provide important biological perspectives on the pathogenic consequences of missense mutations of ALDHs, which could be an invaluable resource in the development of cancer treatment.
Affiliation(s)
- Dana Jessen-Howard
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
- Qisheng Pan
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- David B Ascher
  - School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia