1. AlJarf R, Rodrigues CHM, Myung Y, Pires DEV, Ascher DB. piscesCSM: prediction of anticancer synergistic drug combinations. J Cheminform 2024; 16:81. [PMID: 39030592 PMCID: PMC11264925 DOI: 10.1186/s13321-024-00859-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Accepted: 05/12/2024] [Indexed: 07/21/2024] Open
Abstract
While drug combination therapies are of great importance, particularly in cancer treatment, identifying novel synergistic drug combinations has been a challenging venture. Computational methods have emerged in this context as a promising tool for prioritizing drug combinations for further evaluation, though they have presented limited performance, utility, and interpretability. Here, we propose a novel predictive tool, piscesCSM, that leverages graph-based representations to model small molecule chemical structures to accurately predict drug combinations with favourable anticancer synergistic effects against one or multiple cancer cell lines. Leveraging these insights, we developed a general supervised machine learning model to guide the prediction of anticancer synergistic drug combinations in over 30 cell lines. It achieved an area under the receiver operating characteristic curve (AUROC) of up to 0.89 on independent non-redundant blind tests, outperforming state-of-the-art approaches on both large-scale oncology screening data and an independent test set generated by AstraZeneca (with more than a 16% improvement in predictive accuracy). Moreover, by exploring the interpretability of our approach, we found that simple physicochemical properties and graph-based signatures are predictive of chemotherapy synergism. To provide a simple and integrated platform to rapidly screen potential candidate pairs with favourable synergistic anticancer effects, we made piscesCSM freely available online at https://biosig.lab.uq.edu.au/piscescsm/ as a web server and API. We believe that our predictive tool will provide a valuable resource for optimizing and augmenting combinatorial screening libraries to identify effective and safe synergistic anticancer drug combinations. 
SCIENTIFIC CONTRIBUTION: This work proposes piscesCSM, a machine-learning-based framework that relies on well-established graph-based representations of small molecules to identify synergistic drug combinations with improved predictive accuracy. Our model, piscesCSM, shows that combining physicochemical properties with graph-based signatures can outperform current architectures on classification prediction tasks. Furthermore, implementing our tool as a web server offers a user-friendly platform for researchers to screen for potential synergistic drug combinations with favorable anticancer effects against one or multiple cancer cell lines.
Affiliation(s)
- Raghad AlJarf
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Carlos H M Rodrigues
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Yoochan Myung
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
- Douglas E V Pires
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia
- David B Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, VIC, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, VIC, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
2. Velloso JPL, de Sá AGC, Pires DEV, Ascher DB. Engineering G protein-coupled receptors for stabilization. Protein Sci 2024; 33:e5000. [PMID: 38747401 PMCID: PMC11094779 DOI: 10.1002/pro.5000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 03/21/2024] [Accepted: 04/10/2024] [Indexed: 05/19/2024]
Abstract
G protein-coupled receptors (GPCRs) are one of the most important families of targets for drug discovery. One of the limiting steps in the study of GPCRs has been their stability, with significant and time-consuming protein engineering often used to stabilize GPCRs for structural characterization and drug screening. Unfortunately, computational methods developed using globular soluble proteins have translated poorly to the rational engineering of GPCRs. To fill this gap, we propose GPCR-tm, a novel and personalized structurally driven web-based machine learning tool to study the impacts of mutations on GPCR stability. We show that GPCR-tm performs as well as or better than alternative methods, and that it can accurately rank the stability changes of a wide range of mutations occurring in various types of class A GPCRs. GPCR-tm achieved Pearson's correlation coefficients of 0.74 and 0.46 on 10-fold cross-validation and blind test sets, respectively. We observed that the (structural) graph-based signatures were the most important set of features for predicting destabilizing mutations, which points out that these signatures properly describe the changes in the environment where the mutations occur. More specifically, GPCR-tm was able to accurately rank mutations based on their effect on protein stability, guiding their rational stabilization. GPCR-tm is available through a user-friendly web server at https://biosig.lab.uq.edu.au/gpcr_tm/.
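The abstract above reports performance as Pearson's correlation coefficient between predicted and experimental stability changes. As a reading aid, the following is a minimal sketch of how that statistic is computed. The function name `pearson_r` and the sample values are illustrative assumptions; they are not taken from GPCR-tm or its data sets.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances are left unnormalised; the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not data from the paper).
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r = pearson_r(experimental, predicted)
print(f"Pearson r = {r:.3f}")
```

A value near 1 means the predictions track the experimental values in a linear fashion; the statistic says nothing about absolute error, which is why such results are often reported alongside an error measure.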
Affiliation(s)
- João Paulo L. Velloso
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
- Alex G. C. de Sá
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
- Douglas E. V. Pires
  - School of Computing and Information Systems, The University of Melbourne, Parkville, Victoria, Australia
- David B. Ascher
  - School of Chemistry and Molecular Biosciences, The Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - Baker Department of Cardiometabolic Health, The University of Melbourne, Parkville, Victoria, Australia
3. Tao L, Zhou T, Wu Z, Hu F, Yang S, Kong X, Li C. ESPDHot: An Effective Machine Learning-Based Approach for Predicting Protein-DNA Interaction Hotspots. J Chem Inf Model 2024; 64:3548-3557. [PMID: 38587997 DOI: 10.1021/acs.jcim.3c02011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
Protein-DNA interactions are pivotal to various cellular processes. Precise identification of the hotspot residues for protein-DNA interactions holds great significance for revealing the intricate mechanisms in protein-DNA recognition and for providing essential guidance for protein engineering. Aiming at protein-DNA interaction hotspots, this work introduces an effective prediction method, ESPDHot, based on a stacked ensemble machine learning framework. Here, an interface residue whose mutation leads to a binding free energy change (ΔΔG) exceeding 2 kcal/mol is defined as a hotspot. To tackle the imbalanced data set issue, adaptive synthetic sampling (ADASYN), an oversampling technique, is adopted to synthetically generate new minority samples, thereby rectifying data imbalance. As for molecular characteristics, besides traditional features, we introduce three new characteristic types: a residue interface preference proposed by us, residue fluctuation dynamics characteristics, and coevolutionary features. Combining the Boruta method with our previously developed Random Grouping strategy, we obtained an optimal set of features. Finally, a stacking classifier is constructed to output prediction results, which integrates three classical predictors, Support Vector Machine (SVM), XGBoost, and Artificial Neural Network (ANN), as the first layer, and the Logistic Regression (LR) algorithm as the second one. Notably, ESPDHot outperforms the current state-of-the-art predictors, achieving superior performance on the independent test data set, with F1, MCC, and AUC reaching 0.571, 0.516, and 0.870, respectively.
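The hotspot definition and two of the reported metrics (F1 and MCC) can be illustrated with a short sketch. It assumes a strict inequality at the 2 kcal/mol cut-off, and the ΔΔG values and predictions are invented for illustration; none of this is the ESPDHot code or data.

```python
import math

HOTSPOT_THRESHOLD = 2.0  # kcal/mol, the cut-off stated in the abstract


def is_hotspot(ddg):
    """Label a residue as a hotspot when its ddG exceeds the threshold.

    Assumption: "exceeding" is read as a strict inequality.
    """
    return ddg > HOTSPOT_THRESHOLD


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for two sequences of booleans."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 if the denominator vanishes."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical ddG values (kcal/mol) and predictions, for illustration only.
ddg_values = [2.5, 0.3, 1.9, 3.1, 0.0, 2.1]
labels = [is_hotspot(v) for v in ddg_values]
predictions = [True, False, True, True, False, False]

f1 = f1_score(labels, predictions)
mcc_value = mcc(labels, predictions)
print(f"F1 = {f1:.3f}, MCC = {mcc_value:.3f}")
```

MCC uses all four cells of the confusion matrix, which is one reason it is commonly reported next to F1 for imbalanced problems such as hotspot prediction.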
Affiliation(s)
- Lianci Tao
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Tong Zhou
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Shuang Yang
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Xiaotian Kong
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
  - College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
4. Pandey U, Behara SM, Sharma S, Patil RS, Nambiar S, Koner D, Bhukya H. DeePNAP: A Deep Learning Method to Predict Protein-Nucleic Acid Binding Affinity from Their Sequences. J Chem Inf Model 2024; 64:1806-1815. [PMID: 38458968 DOI: 10.1021/acs.jcim.3c01151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2024]
Abstract
Predicting the protein-nucleic acid (PNA) binding affinity solely from their sequences is of paramount importance for the experimental design and analysis of PNA interactions (PNAIs). A large number of currently developed models for binding affinity prediction are limited to specific PNAIs while also relying on the sequence and structural information of the PNA complexes for both training and testing, and also as inputs. As the PNA complex structures available are scarce, this significantly limits the diversity and generalizability due to the small training data set. Additionally, a majority of the tools predict a single parameter, such as binding affinity or free energy changes upon mutations, rendering a model less versatile for usage. Hence, we propose DeePNAP, a machine learning-based model built from a vast and heterogeneous data set with 14,401 entries (from both eukaryotes and prokaryotes) from the ProNAB database, consisting of wild-type and mutant PNA complex binding parameters. Our model precisely predicts the binding affinity and free energy changes due to the mutation(s) of PNAIs exclusively from their sequences. While other similar tools extract features from both sequence and structure information, DeePNAP employs sequence-based features to yield high correlation coefficients between the predicted and experimental values with low root mean squared errors for PNA complexes in predicting KD and ΔΔG, implying the generalizability of DeePNAP. Additionally, we have also developed a web interface hosting DeePNAP that can serve as a powerful tool to rapidly predict binding affinities for a myriad of PNAIs with high precision toward developing a deeper understanding of their implications in various biological systems. Web interface: http://14.139.174.41:8080/.
Affiliation(s)
- Uddeshya Pandey
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Sasi M Behara
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Siddhant Sharma
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Rachit S Patil
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Souparnika Nambiar
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
- Debasish Koner
  - Department of Chemistry, Indian Institute of Technology Hyderabad, Kandi 502284, India
- Hussain Bhukya
  - Department of Biology, Indian Institute of Science Education and Research Tirupati, Tirupati 517507, India
5. Costa L, Bermudez-Guzman L, Benouda I, Laissue P, Morel A, Jiménez KM, Fournier T, Stouvenel L, Méhats C, Miralles F, Vaiman D. Linking genotype to trophoblast phenotype in preeclampsia and HELLP syndrome associated with STOX1 genetic variants. iScience 2024; 27:109260. [PMID: 38439971 PMCID: PMC10910284 DOI: 10.1016/j.isci.2024.109260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 12/20/2023] [Accepted: 02/13/2024] [Indexed: 03/06/2024] Open
Abstract
Preeclampsia is a major hypertensive pregnancy disorder with a 50% heritability. The first gene identified as involved in the disease is STOX1, a transcription factor, whose variant Y153H predisposes to the disease. Two rare mutations (T188N and R364X) were also identified in Colombian women affected by hemolysis, elevated liver enzymes, and low platelets (HELLP) syndrome, a complication of preeclampsia. Here, we explore the effects of these variants in trophoblast cell models (BeWo) in which STOX1 was previously knocked out. We first showed that STOX1 knockout alters the response to oxidative stress, cell proliferation, and fusion capacity. Then, we showed that mutant versions of STOX1 trigger alterations in gene profiles, growth, fusion, and oxidative stress management. The results also reveal alterations of the STOX1 interaction with DNA when the mutations affect the DNA-binding domain of STOX1 (Y153H and T188N). We also reveal that a major contributor to these effects appears to be the E2F3 transcription factor.
Affiliation(s)
- Lorenzo Costa
  - Institut Cochin, Team ‘From Gametes To Birth’, INSERM U1016, CNRS UMR8104, Université de Paris, 24 rue du Faubourg St Jacques, 75014 Paris, France
  - Department of Human Genetics, University of Heidelberg, Heidelberg, Germany
- Ikram Benouda
  - Institut Cochin, Team ‘From Gametes To Birth’, INSERM U1016, CNRS UMR8104, Université de Paris, 24 rue du Faubourg St Jacques, 75014 Paris, France
- Paul Laissue
  - Biopas Laboratoires, Orphan Diseases Unit, BIOPAS GROUP, Bogotá 111111, Colombia
- Adrien Morel
  - Universidad Del Rosario, School of Medicine and Health Sciences, Center for Research in Genetics and Genomics (CIGGUR), Institute of Translational Medicine (IMT), Bogotá, Colombia
- Karen Marcela Jiménez
  - Universidad Del Rosario, School of Medicine and Health Sciences, Center for Research in Genetics and Genomics (CIGGUR), Institute of Translational Medicine (IMT), Bogotá, Colombia
- Thierry Fournier
  - Université Paris Cité, INSERM, UMR-S1139, Pathophysiology & Pharmacotoxicology of the Human Placenta, Pre- & Post-natal Microbiota (3PHM), 75006 Paris, France
- Laurence Stouvenel
  - Institut Cochin, Team ‘From Gametes To Birth’, INSERM U1016, CNRS UMR8104, Université de Paris, 24 rue du Faubourg St Jacques, 75014 Paris, France
- Céline Méhats
  - Institut Cochin, Team ‘From Gametes To Birth’, INSERM U1016, CNRS UMR8104, Université de Paris, 24 rue du Faubourg St Jacques, 75014 Paris, France
- Francisco Miralles
  - Institut Cochin, Team ‘From Gametes To Birth’, INSERM U1016, CNRS UMR8104, Université de Paris, 24 rue du Faubourg St Jacques, 75014 Paris, France
- Daniel Vaiman
  - Institut Cochin, Team ‘From Gametes To Birth’, INSERM U1016, CNRS UMR8104, Université de Paris, 24 rue du Faubourg St Jacques, 75014 Paris, France
6. Sharma D, Rawat P, Greiff V, Janakiraman V, Gromiha MM. Predicting the immune escape of SARS-CoV-2 neutralizing antibodies upon mutation. Biochim Biophys Acta Mol Basis Dis 2024; 1870:166959. [PMID: 37967796 DOI: 10.1016/j.bbadis.2023.166959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Revised: 10/25/2023] [Accepted: 11/07/2023] [Indexed: 11/17/2023]
Abstract
COVID-19 has resulted in millions of deaths and a severe impact on economies worldwide. Moreover, the emergence of SARS-CoV-2 variants presented significant challenges in controlling the pandemic, particularly their potential to avoid the immune system and evade vaccine immunity. This has led to a growing need for research to predict how mutations in SARS-CoV-2 reduce the ability of antibodies to neutralize the virus. In this study, we assembled a set of 1813 mutations from the interface of SARS-CoV-2 spike protein's receptor binding domain (RBD) and neutralizing antibody complexes and developed a machine learning model to classify high or low escape mutations using interaction energy, inter-residue contacts and predicted binding free energy change. Our approach achieved an Area under the Receiver Operating Characteristics (ROC) Curve (AUC) of 0.91 using the Random Forest classifier on the test dataset with 217 mutations. The model was further utilized to predict the escape mutations on a dataset of 29,165 mutations located at the interface of 83 RBD-neutralizing antibody complexes. A small subset of this dataset was also validated based on available experimental data. We found that the top 10% of high escape mutations were dominated by charged to nonpolar mutations, whereas low escape mutations were dominated by polar to nonpolar mutations. We believe that the present method will allow prioritization of high/low escape mutations in the context of neutralizing antibodies targeting the SARS-CoV-2 RBD region and assist antibody design for current and emerging variants.
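The AUC quoted above has a simple probabilistic reading: it is the chance that a randomly chosen positive example (here, a high escape mutation) receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function name `auroc`, the labels, and the scores are illustrative assumptions, not the authors' implementation or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example is scored higher; ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = high escape) and classifier scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = auroc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples, which is fine for a test set of a few hundred mutations; rank-based formulations give the same value more efficiently on large sets.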
Affiliation(s)
- Divya Sharma
  - Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- Puneet Rawat
  - University of Oslo and Oslo University Hospital, Oslo, Norway
- Victor Greiff
  - University of Oslo and Oslo University Hospital, Oslo, Norway
- Vani Janakiraman
  - Infection Biology Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
- M Michael Gromiha
  - Protein Bioinformatics Lab, Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu 600036, India
  - International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan
  - Department of Computer Science, National University of Singapore, Singapore
7. Mitrotti A, Di Bari I, Giliberti M, Franzin R, Conserva F, Chiusolo A, Gigante M, Accetturo M, Cafiero C, Ricciato L, Stea ED, Forleo C, Gallone A, Rossini M, Fiorentino M, Castellano G, Pontrelli P, Gesualdo L. What Is Hidden in Patients with Unknown Nephropathy? Genetic Screening Could Be the Missing Link in Kidney Transplantation Diagnosis and Management. Int J Mol Sci 2024; 25:1436. [PMID: 38338714 PMCID: PMC10855929 DOI: 10.3390/ijms25031436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 01/15/2024] [Accepted: 01/16/2024] [Indexed: 02/12/2024] Open
Abstract
Between 15 and 20% of patients with end stage renal disease (ESRD) do not know the cause of the primary kidney disease and can develop complications after kidney transplantation. We performed a genetic screening in 300 patients with kidney transplantation, or undiagnosed primary renal disease, in order to identify the primary disease cause and discriminate between overlapping phenotypes. We used a custom-made panel for next-generation sequencing (Agilent technology, Santa Clara, CA, USA), including genes associated with Fabry disease, podocytopathies, complement-mediated nephropathies and Alport syndrome-related diseases. We detected candidate diagnostic variants in genes associated with nephrotic syndrome and Focal Segmental Glomerulosclerosis (FSGS) in 29 out of 300 patients, solving about 10% of the probands. We also identified the same genetic cause of the disease (PAX2: c.1266dupC) in three family members with different clinical diagnoses. Interestingly, we also found one female patient carrying a novel missense variant, c.1259C>A (p.Thr420Lys), in the GLA gene not previously associated with Fabry disease, which is predicted in silico to be likely pathogenic and destabilizing, and is associated with a mild alteration in GLA enzymatic activity. The identification of the specific genetic background may provide an opportunity to evaluate the risk of recurrence of the primary disease, especially among patient candidates living with a donor kidney transplant.
Affiliation(s)
- Adele Mitrotti
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Ighli Di Bari
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Marica Giliberti
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Rossana Franzin
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Francesca Conserva
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Anna Chiusolo
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Maddalena Gigante
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Matteo Accetturo
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Cesira Cafiero
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Luisa Ricciato
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Emma Diletta Stea
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Cinzia Forleo
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Anna Gallone
  - Department of Basic Medical Sciences, Neurosciences and Sense Organs, University of Bari Aldo Moro, 70121 Bari, Italy
- Michele Rossini
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Marco Fiorentino
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Giuseppe Castellano
  - Department of Clinical Sciences and Community Health, University of Milano, 20122 Milano, Italy
  - Fondazione IRCCS Cà Grande Ospedale Maggiore Policlinico, 20122 Milano, Italy
- Paola Pontrelli
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
- Loreto Gesualdo
  - Department of Precision and Regenerative Medicine and Ionian Area (DIMEPRE-J), University of Bari Aldo Moro, 70124 Bari, Italy
8. Li X, Wang GA, Wei Z, Wang H, Zhu X. Protein-DNA interface hotspots prediction based on fusion features of embeddings of protein language model and handcrafted features. Comput Biol Chem 2023; 107:107970. [PMID: 37866116 DOI: 10.1016/j.compbiolchem.2023.107970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 10/06/2023] [Accepted: 10/07/2023] [Indexed: 10/24/2023]
Abstract
The identification of hotspot residues at protein-DNA binding interfaces plays a crucial role in areas such as drug discovery and disease treatment. Although experimental methods such as alanine scanning mutagenesis have been developed to determine the hotspot residues on protein-DNA interfaces, they are both inefficient and costly. Therefore, it is highly necessary to develop efficient and accurate computational methods for predicting hotspot residues. Several computational methods have been developed; however, they are mainly based on handcrafted features, which may not be able to represent all the information of proteins. In this regard, we propose a model called PDH-EH, which utilizes fused features of embeddings extracted from a protein language model (PLM) and handcrafted features. After extracting a total of 1141 feature dimensions, we used mRMR to select the optimal feature subset. Based on the optimal feature subset, several different learning algorithms such as Random Forest, Support Vector Machine, and XGBoost were used to build the models. The cross-validation results on the training dataset show that the model built by using Random Forest achieves the highest AUROC. Further evaluation on the independent test set shows that our model outperforms the existing state-of-the-art models. Moreover, the effectiveness and interpretability of embeddings extracted from the PLM were demonstrated in our analysis. The codes and datasets used in this study are available at: https://github.com/lixiangli01/PDH-EH.
Affiliation(s)
- Xiang Li
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Gang-Ao Wang
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Zhuoyu Wei
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Hong Wang
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
- Xiaolei Zhu
  - School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
9. Ji C, Wei J, Zhang L, Hou X, Tan J, Yuan Q, Tan W. Aptamer-Protein Interactions: From Regulation to Biomolecular Detection. Chem Rev 2023; 123:12471-12506. [PMID: 37931070 DOI: 10.1021/acs.chemrev.3c00377] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/08/2023]
Abstract
Serving as the basis of cell life, interactions between nucleic acids and proteins play essential roles in fundamental cellular processes. Aptamers are unique single-stranded oligonucleotides generated by in vitro evolution methods, possessing the ability to interact with proteins specifically. Altering the structure of aptamers will largely modulate their interactions with proteins and further affect related cellular behaviors. Recently, with the in-depth research of aptamer-protein interactions, the analytical assays based on their interactions have been widely developed and become a powerful tool for biomolecular detection. There are some insightful reviews on aptamers applied in protein detection, while few systematic discussions are from the perspective of regulating aptamer-protein interactions. Herein, we comprehensively introduce the methods for regulating aptamer-protein interactions and elaborate on the detection techniques for analyzing aptamer-protein interactions. Additionally, this review provides a broad summary of analytical assays based on the regulation of aptamer-protein interactions for detecting biomolecules. Finally, we present our perspectives regarding the opportunities and challenges of analytical assays for biological analysis, aiming to provide guidance for disease mechanism research and drug discovery.
Affiliation(s)
- Cailing Ji
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Junyuan Wei
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Lei Zhang
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Xinru Hou
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Jie Tan
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Quan Yuan
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
| | - Weihong Tan
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China
- The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang 310022, China
|
10
|
Pan Q, Portelli S, Nguyen TB, Ascher DB. Characterization on the oncogenic effect of the missense mutations of p53 via machine learning. Brief Bioinform 2023; 25:bbad428. [PMID: 38018912 PMCID: PMC10685404 DOI: 10.1093/bib/bbad428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023] Open
Abstract
Dysfunctions caused by missense mutations in the tumour suppressor p53 have been extensively shown to be a leading driver of many cancers. Unfortunately, it is time-consuming and labour-intensive to experimentally elucidate the effects of all possible missense variants. Recent works presented a comprehensive dataset and machine learning model to predict the functional outcome of mutations in p53. Despite the well-established dataset and precise predictions, this tool was trained on a complicated model with limited predictions on p53 mutations. In this work, we first used computational biophysical tools to investigate the functional consequences of missense mutations in p53, informing a bias of deleterious mutations with destabilizing effects. Combining these insights with experimental assays, we present two interpretable machine learning models leveraging both experimental assays and in silico biophysical measurements to accurately predict the functional consequences on p53 and validate their robustness on clinical data. Our final model based on nine features obtained comparable predictive performance with the state-of-the-art p53 specific method and outperformed other generalized, widely used predictors. Interpreting our models revealed that information on residue p53 activity, polar atom distances and changes in p53 stability were instrumental in the decisions, consistent with a bias of the properties of deleterious mutations. Our predictions have been computed for all possible missense mutations in p53, offering clinical diagnostic utility, which is crucial for patient monitoring and the development of personalized cancer treatment.
Affiliation(s)
- Qisheng Pan
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
| | - Stephanie Portelli
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
| | - Thanh Binh Nguyen
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
| | - David B Ascher
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
|
11
|
Al-Jarf R, Karmakar M, Myung Y, Ascher DB. Uncovering the Molecular Drivers of NHEJ DNA Repair-Implicated Missense Variants and Their Functional Consequences. Genes (Basel) 2023; 14:1890. [PMID: 37895239 PMCID: PMC10606680 DOI: 10.3390/genes14101890] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/20/2023] [Revised: 09/24/2023] [Accepted: 09/27/2023] [Indexed: 10/29/2023] Open
Abstract
Variants in non-homologous end joining (NHEJ) DNA repair genes are associated with various human syndromes, including microcephaly, growth delay, Fanconi anemia, and different hereditary cancers. However, very little has been done previously to systematically record the underlying molecular consequences of NHEJ variants and their link to phenotypic outcomes. In this study, a list of over 2983 missense variants of the principal components of the NHEJ system, including DNA Ligase IV, DNA-PKcs, Ku70/80 and XRCC4, reported in the clinical literature, was initially collected. The molecular consequences of variants were evaluated using in silico biophysical tools to quantitatively assess their impact on protein folding, dynamics, stability, and interactions. Cancer-causing and population variants within these NHEJ factors were statistically analyzed to identify molecular drivers. A comprehensive catalog of NHEJ variants from genes known to be mutated in cancer was curated, providing a resource for better understanding their role and molecular mechanisms in diseases. The variant analysis highlighted different molecular drivers among the distinct proteins, where cancer-driving variants in anchor proteins, such as Ku70/80, were more likely to affect key protein-protein interactions, whilst those in the enzymatic components, such as DNA-PKcs, were likely to be found in intolerant regions undergoing purifying selection. We believe that the information acquired in our database will be a powerful resource to better understand the role of non-homologous end-joining DNA repair in genetic disorders, and will serve as a source to inspire other investigations to understand the disease further, vital for the development of improved therapeutic strategies.
Affiliation(s)
- Raghad Al-Jarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville, VIC 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
| | - Malancha Karmakar
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville, VIC 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
| | - Yoochan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville, VIC 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St. Lucia, QLD 4072, Australia
| | - David B. Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville, VIC 3052, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville, VIC 3052, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St. Lucia, QLD 4072, Australia
|
12
|
Shirvanizadeh N, Vihinen M. VariBench, new variation benchmark categories and data sets. Frontiers in Bioinformatics 2023; 3:1248732. [PMID: 37795169 PMCID: PMC10546188 DOI: 10.3389/fbinf.2023.1248732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 09/08/2023] [Indexed: 10/06/2023] Open
Affiliation(s)
| | - Mauno Vihinen
- Department of Experimental Medical Science, Lund University, Lund, Sweden
|
13
|
Zhang X, Mei LC, Gao YY, Hao GF, Song BA. Web tools support predicting protein-nucleic acid complexes stability with affinity changes. Wiley Interdisciplinary Reviews. RNA 2023; 14:e1781. [PMID: 36693636 DOI: 10.1002/wrna.1781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 11/10/2022] [Accepted: 11/28/2022] [Indexed: 01/26/2023]
Abstract
Numerous biological processes, such as transcription, replication, and translation, rely on protein-nucleic acid interactions (PNIs). Demonstrating the binding stability of protein-nucleic acid complexes is vital to deciphering the code for PNIs. Numerous web-based tools have been developed to attach importance to protein-nucleic acid stability, facilitating the prediction of PNIs characteristics rapidly. However, the data and tools are dispersed and lack comprehensive integration to understand the stability of PNIs better. In this review, we first summarize existing databases for evaluating the stability of protein-nucleic acid binding. Then, we compare and evaluate the pros and cons of web tools for forecasting the interaction energies of protein-nucleic acid complexes. Finally, we discuss the application of combining models and capabilities of PNIs. We may hope these web-based tools will facilitate the discovery of recognition mechanisms for protein-nucleic acid binding stability. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition; RNA Interactions with Proteins and Other Molecules > RNA-Protein Complexes; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications.
Affiliation(s)
- Xiao Zhang
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
| | - Long-Can Mei
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
| | - Yang-Yang Gao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
| | - Ge-Fei Hao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
- National Key Laboratory of Green Pesticide, Central China Normal University, Wuhan, China
| | - Bao-An Song
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang, China
|
14
|
Portelli S, Heaton R, Ascher DB. Identifying Innate Resistance Hotspots for SARS-CoV-2 Antivirals Using In Silico Protein Techniques. Genes (Basel) 2023; 14:1699. [PMID: 37761839 PMCID: PMC10531314 DOI: 10.3390/genes14091699] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/09/2023] [Revised: 08/02/2023] [Accepted: 08/22/2023] [Indexed: 09/29/2023] Open
Abstract
The development and approval of antivirals against SARS-CoV-2 has further equipped clinicians with treatment strategies against the COVID-19 pandemic, reducing deaths post-infection. Extensive clinical use of antivirals, however, can impart additional selective pressure, leading to the emergence of antiviral resistance. While we have previously characterized possible effects of circulating SARS-CoV-2 missense mutations on proteome function and stability, their direct effects on the novel antivirals remains unexplored. To address this, we have computationally calculated the consequences of mutations in the antiviral targets: RNA-dependent RNA polymerase and main protease, on target stability and interactions with their antiviral, nucleic acids, and other proteins. By analyzing circulating variants prior to antiviral approval, this work highlighted the inherent resistance potential of different genome regions. Namely, within the main protease binding site, missense mutations imparted a lower fitness cost, while the opposite was noted for the RNA-dependent RNA polymerase binding site. This suggests that resistance to nirmatrelvir/ritonavir combination treatment is more likely to occur and proliferate than that to molnupiravir. These insights are crucial both clinically in drug stewardship, and preclinically in the identification of less mutable targets for novel therapeutic design.
Affiliation(s)
- Stephanie Portelli
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia
- Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
| | - Ruby Heaton
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia
| | - David B. Ascher
- School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, QLD 4072, Australia
- Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne, VIC 3004, Australia
|
15
|
Pandey P, Panday SK, Rimal P, Ancona N, Alexov E. Predicting the Effect of Single Mutations on Protein Stability and Binding with Respect to Types of Mutations. Int J Mol Sci 2023; 24:12073. [PMID: 37569449 PMCID: PMC10418460 DOI: 10.3390/ijms241512073] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/11/2023] [Revised: 07/24/2023] [Accepted: 07/26/2023] [Indexed: 08/13/2023] Open
Abstract
The development of methods and algorithms to predict the effect of mutations on protein stability, protein-protein interaction, and protein-DNA/RNA binding is necessitated by the needs of protein engineering and for understanding the molecular mechanism of disease-causing variants. The vast majority of the leading methods require a database of experimentally measured folding and binding free energy changes for training. These databases are collections of experimental data taken from scientific investigations typically aimed at probing the role of particular residues on the above-mentioned thermodynamic characteristics, i.e., the mutations are not introduced at random and do not necessarily represent mutations originating from single nucleotide variants (SNV). Thus, the reported performance of the leading algorithms assessed on these databases or other limited cases may not be applicable for predicting the effect of SNVs seen in the human population. Indeed, we demonstrate that the SNVs and non-SNVs are not equally presented in the corresponding databases, and the distribution of the free energy changes is not the same. It is shown that the Pearson correlation coefficients (PCCs) of folding and binding free energy changes obtained in cases involving SNVs are smaller than for non-SNVs, indicating that caution should be used in applying them to reveal the effect of human SNVs. Furthermore, it is demonstrated that some methods are sensitive to the chemical nature of the mutations, resulting in PCCs that differ by a factor of four across chemically different mutations. All methods are found to underestimate the energy changes by roughly a factor of 2.
Affiliation(s)
- Preeti Pandey
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Shailesh Kumar Panday
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Prawin Rimal
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Nicolas Ancona
- Department of Biological Sciences, Clemson University, Clemson, SC 29634, USA
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
|
16
|
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
Affiliation(s)
- Alessia David
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
| | - Michael J E Sternberg
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
|
17
|
Sun Y, Wu H, Xu Z, Yue Z, Li K. Prediction of hot spots in protein-DNA binding interfaces based on discrete wavelet transform and wavelet packet transform. BMC Bioinformatics 2023; 24:129. [PMID: 37016308 PMCID: PMC10074722 DOI: 10.1186/s12859-023-05263-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 03/30/2023] [Indexed: 04/06/2023] Open
Abstract
BACKGROUND: Identification of hot spots in protein-DNA binding interfaces is extremely important for understanding the underlying mechanisms of protein-DNA interactions and drug design. Since experimental methods for identifying hot spots are time-consuming and expensive, and most of the existing computational methods are based on traditional protein-DNA features to predict hot spots, unable to make full use of the effective information in the features. RESULTS: In this work, a method named WTL-PDH is proposed for hot spots prediction. To deal with the unbalanced dataset, we used the Synthetic Minority Over-sampling Technique to generate minority class samples to achieve the balance of dataset. First, we extracted the solvent accessible surface area features and structural features, and then processed the traditional features using discrete wavelet transform and wavelet packet transform to extract the wavelet energy information and wavelet entropy information, and obtained a total of 175 dimensional features. In order to obtain the best feature subset, we systematically evaluate these features in various feature selection strategies. Finally, light gradient boosting machine (LightGBM) was used to establish the model. CONCLUSIONS: Our method achieved good results on independent test set with AUC, MCC and F1 scores of 0.838, 0.533 and 0.750, respectively. WTL-PDH can achieve generally better performance in predicting hot spots when compared with state-of-the-art methods. The dataset and source code are available at https://github.com/chase2555/WTL-PDH.
Affiliation(s)
- Yu Sun
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Hongwei Wu
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Zhengrong Xu
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Zhenyu Yue
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Ke Li
- School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China.
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, 230601, Anhui, China.
- Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, 230036, Anhui, China.
|
18
|
Mei LC, Hao GF, Yang GF. Thermodynamic database supports deciphering protein-nucleic acid interactions. Trends Biotechnol 2023; 41:140-143. [PMID: 36272818 DOI: 10.1016/j.tibtech.2022.09.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 09/16/2022] [Accepted: 09/27/2022] [Indexed: 01/11/2023]
Abstract
The thermodynamics of protein-nucleic acid interactions (PNIs) is crucial for elucidating the mechanisms of molecular recognition and pathological consequences. The Protein-Nucleic Acid Thermodynamics Database (PNATDB) is a database containing experimentally determined thermodynamic parameters along with sequence, structural, and function data, which is available free online.
Affiliation(s)
- Long-Can Mei
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China
| | - Ge-Fei Hao
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China; State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University, Guiyang 550000, China.
| | - Guang-Fu Yang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China.
|
19
|
Aljarf R, Tang S, Pires DEV, Ascher DB. embryoTox: Using Graph-Based Signatures to Predict the Teratogenicity of Small Molecules. J Chem Inf Model 2023; 63:432-441. [PMID: 36595441 DOI: 10.1021/acs.jcim.2c00824] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
Abstract
Teratogenic drugs can lead to extreme fetal malformation and consequently critically influence the fetus's health, yet the teratogenic risks associated with most approved drugs are unknown. Here, we propose a novel predictive tool, embryoTox, which utilizes a graph-based signature representation of the chemical structure of a small molecule to predict and classify molecules likely to be safe during pregnancy. embryoTox was trained and validated using in vitro bioactivity data of over 700 small molecules with characterized teratogenicity effects. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.96 on 10-fold cross-validation and 0.82 on nonredundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a practical resource for optimizing screening libraries to determine effective and safe molecules to use during pregnancy. To provide a simple and integrated platform to rapidly screen for potential safe molecules and their risk factors, we made embryoTox freely available online at https://biosig.lab.uq.edu.au/embryotox/.
Affiliation(s)
- Raghad Aljarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
| | - Simon Tang
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
| | - Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
| | - David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia
|
20
|
Ascher DB, Kaminskas LM, Myung Y, Pires DEV. Using Graph-Based Signatures to Guide Rational Antibody Engineering. Methods Mol Biol 2023; 2552:375-397. [PMID: 36346604 DOI: 10.1007/978-1-0716-2609-2_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Antibodies are essential experimental and diagnostic tools and as biotherapeutics have significantly advanced our ability to treat a range of diseases. With recent innovations in computational tools to guide protein engineering, we can now rationally design better antibodies with improved efficacy, stability, and pharmacokinetics. Here, we describe the use of the mCSM web-based in silico suite, which uses graph-based signatures to rapidly identify the structural and functional consequences of mutations, to guide rational antibody engineering to improve stability, affinity, and specificity.
Affiliation(s)
- David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Department of Biochemistry, Cambridge University, Cambridge, UK
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
| | - Lisa M Kaminskas
- School of Biological Sciences, University of Queensland, St Lucia, QLD, Australia
| | - Yoochan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
| | - Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, Parkville, VIC, Australia.
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia.
- School of Computing and Information Systems, University of Melbourne, Parkville, VIC, Australia.
|
21
|
Salgado Á, de Melo-Minardi RC, Giovanetti M, Veloso A, Morais-Rodrigues F, Adelino T, de Jesus R, Tosta S, Azevedo V, Lourenco J, Alcantara LCJ. Machine learning models exploring characteristic single-nucleotide signatures in yellow fever virus. PLoS One 2022; 17:e0278982. [PMID: 36508435 PMCID: PMC9744328 DOI: 10.1371/journal.pone.0278982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2021] [Accepted: 11/29/2022] [Indexed: 12/14/2022] Open
Abstract
Yellow fever virus (YFV) is the agent of the most severe mosquito-borne disease in the tropics. Recently, Brazil suffered major YFV outbreaks with a high fatality rate affecting areas where the virus has not been reported for decades, consisting of urban areas where a large number of unvaccinated people live. We developed a machine learning framework combining three different algorithms (XGBoost, random forest and regularized logistic regression) to analyze YFV genomic sequences. This method was applied to 56 YFV sequences from human infections and 27 from non-human primate (NHPs) infections to investigate the presence of genetic signatures possibly related to disease severity (in human related sequences) and differences in PCR cycle threshold (Ct) values (in NHP related sequences). Our analyses reveal four non-synonymous single nucleotide variations (SNVs) on sequences from human infections, in proteins NS3 (E614D), NS4a (I69V), NS5 (R727G, V643A) and six non-synonymous SNVs on NHP sequences, in proteins E (L385F), NS1 (A171V), NS3 (I184V) and NS5 (N11S, I374V, E641D). We performed comparative protein structural analysis on these SNVs, describing possible impacts on protein function. Despite the fact that the dataset is limited in size and that this study does not consider virus-host interactions, our work highlights the use of machine learning as a versatile and fast initial approach to genomic data exploration.
Affiliation(s)
- Álvaro Salgado
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- * E-mail: (AS); (LCJA); (JL)
| | - Raquel C. de Melo-Minardi
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Marta Giovanetti
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Laboratório de Flavivírus, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
| | - Adriano Veloso
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Francielly Morais-Rodrigues
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Talita Adelino
- Laboratório Central de Saúde Pública, Fundação Ezequiel Dias, Belo Horizonte, Minas Gerais, Brazil
| | - Ronaldo de Jesus
- Coordenação Geral dos Laboratórios de Saúde Pública, Secretaria de Vigilância em Saúde, Ministério da Saúde, Brasília, DF, Brazil
| | - Stephane Tosta
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Vasco Azevedo
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - José Lourenco
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- * E-mail: (AS); (LCJA); (JL)
| | - Luiz Carlos J. Alcantara
- Laboratório de Genética Celular e Molecular, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Laboratório de Flavivírus, Instituto Oswaldo Cruz, Fundação Oswaldo Cruz, Rio de Janeiro, Brazil
- * E-mail: (AS); (LCJA); (JL)
| |
|
22
|
Iftkhar S, de Sá AGC, Velloso JPL, Aljarf R, Pires DEV, Ascher DB. cardioToxCSM: A Web Server for Predicting Cardiotoxicity of Small Molecules. J Chem Inf Model 2022; 62:4827-4836. [PMID: 36219164 DOI: 10.1021/acs.jcim.2c00822] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 11/28/2022]
Abstract
The design of novel, safe, and effective drugs to treat human diseases is a challenging venture, with toxicity being one of the main sources of attrition at later stages of development. Failure due to toxicity incurs a significant increase in costs and time to market, with multiple drugs being withdrawn from the market due to their adverse effects. Cardiotoxicity, for instance, was responsible for the failure of drugs such as fenspiride, propoxyphene, and valdecoxib. While significant effort has been dedicated to mitigating this issue by developing computational approaches that aim to identify molecules likely to be toxic, including quantitative structure-activity relationship models and machine learning methods, current approaches present limited performance and interpretability. To overcome these limitations, we propose a new web-based computational method, cardioToxCSM, which can predict six types of cardiac toxicity outcomes, including arrhythmia, cardiac failure, heart block, hERG toxicity, hypertension, and myocardial infarction, efficiently and accurately. cardioToxCSM was developed using the concept of graph-based signatures, molecular descriptors, toxicophore matchings, and molecular fingerprints, leveraging explainable machine learning, and was validated internally via different cross-validation schemes and externally via low-redundancy blind sets. The models presented robust performances with areas under ROC curves of up to 0.898 on 5-fold cross-validation, consistent with metrics on blind tests. Additionally, our models provide interpretation of the predictions by identifying whether substructures that are commonly enriched in toxic compounds were present. We believe cardioToxCSM will provide valuable insight into the potential cardiotoxicity of small molecules early in drug screening efforts. The method is made freely available as a web server at https://biosig.lab.uq.edu.au/cardiotoxcsm.
Affiliation(s)
- Saba Iftkhar
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
| | - Alex G C de Sá
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
| | - João P L Velloso
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
| | - Raghad Aljarf
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
| | - Douglas E V Pires
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
| | - David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia 4072, Queensland, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
| |
|
23
|
Bheemireddy S, Sandhya S, Srinivasan N, Sowdhamini R. Computational tools to study RNA-protein complexes. Front Mol Biosci 2022; 9:954926. [PMID: 36275618 PMCID: PMC9585174 DOI: 10.3389/fmolb.2022.954926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 09/20/2022] [Indexed: 11/19/2022] Open
Abstract
RNA is the key player in many cellular processes such as signal transduction, replication, transport, cell division, transcription, and translation. These diverse functions are accomplished through interactions of RNA with proteins. However, protein–RNA interactions are still poorly understood in contrast to protein–protein and protein–DNA interactions. This knowledge gap can be attributed to the limited availability of protein–RNA structures along with the experimental difficulties in studying these complexes. Recent progress in computational resources has expanded the number of tools available for studying protein–RNA interactions at various molecular levels. These include tools for predicting interacting residues from primary sequences, modelling protein–RNA complexes, predicting hotspots in these complexes and gaining insights into the dynamics of their interactions. Each of these tools has its strengths and limitations, which makes it important to select an optimal approach for the question of interest. Here we present a mini review of computational tools to study different aspects of protein–RNA interactions, with a focus on overall application, development of the field and future perspectives.
Affiliation(s)
- Sneha Bheemireddy
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Sankaran Sandhya
- Department of Biotechnology, Faculty of Life and Allied Health Sciences, M.S. Ramaiah University of Applied Sciences, Bengaluru, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
| | | | - Ramanathan Sowdhamini
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, TIFR, GKVK Campus, Bangalore, India
- Institute of Bioinformatics and Applied Biotechnology, Bangalore, India
- *Correspondence: Sankaran Sandhya; Ramanathan Sowdhamini
| |
|
24
|
Portelli S, Albanaz A, Pires DEV, Ascher DB. Identifying the molecular drivers of ALS-implicated missense mutations. J Med Genet 2022; 60:484-490. [PMID: 36180205 DOI: 10.1136/jmg-2022-108798] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/30/2022] [Accepted: 09/01/2022] [Indexed: 11/03/2022]
Abstract
BACKGROUND Amyotrophic lateral sclerosis (ALS) is a progressively fatal, neurodegenerative disease associated with both motor and non-motor symptoms, including frontotemporal dementia. Approximately 10% of cases are genetically inherited (familial ALS), while the majority are sporadic. Mutations across a wide range of genes have been associated; however, the underlying molecular effects of these mutations and their relation to phenotypes remain poorly explored. METHODS We initially curated an extensive list (n=1343) of missense mutations identified in the clinical literature, which spanned 111 unique genes. Of these, mutations in genes SOD1, FUS and TDP-43 were analysed using in silico biophysical tools, which characterised changes in protein stability, interactions, localisation and function. The effects of pathogenic and non-pathogenic mutations within these genes were statistically compared to highlight underlying molecular drivers. RESULTS Compared with previous ALS-dedicated databases, we have curated the most extensive missense mutation database to date and observed a twofold increase in unique implicated genes, and almost a threefold increase in the number of mutations. Our gene-specific analysis identified distinct molecular drivers across the different proteins, where SOD1 mutations primarily reduced protein stability and dimer formation, and those in FUS and TDP-43 were present within disordered regions, suggesting different mechanisms of aggregate formation. CONCLUSION Using our three genes as case studies, we identified distinct insights which can drive further research to better understand ALS. The information curated in our database can serve as a resource for similar gene-specific analyses, further improving the current understanding of disease, crucial for the development of treatment strategies.
Affiliation(s)
- Stephanie Portelli
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,SCMB, The University of Queensland, Saint Lucia Campus, Saint Lucia, Queensland, Australia.,Systems and Computational Biology, Bio21 Institute, The University of Melbourne, Parkville, Victoria, Australia
| | | | - Douglas Eduardo Valente Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - David Benjamin Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,SCMB, The University of Queensland, Saint Lucia Campus, Saint Lucia, Queensland, Australia.,Systems and Computational Biology, Bio21 Institute, The University of Melbourne, Parkville, Victoria, Australia
| |
|
25
|
Rezende PM, Xavier JS, Ascher DB, Fernandes GR, Pires DEV. Evaluating hierarchical machine learning approaches to classify biological databases. Brief Bioinform 2022; 23:6611916. [PMID: 35724625 PMCID: PMC9310517 DOI: 10.1093/bib/bbac216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/29/2022] [Accepted: 05/09/2022] [Indexed: 12/04/2022] Open
Abstract
The rate of biological data generation has increased dramatically in recent years, which has driven the importance of databases as a resource to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, imposing different challenges to train, test and validate accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These include ‘Local’ approaches considering the hierarchy, building models per level or node, and ‘Global’ hierarchical classification, using a flat classification approach. To fill this gap, here we have systematically contrasted the performance of ‘Local per Level’ and ‘Local per Node’ approaches with a ‘Global’ approach applied to two different hierarchical datasets: BioLip and CATH. The results show how different components of hierarchical data sets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
Affiliation(s)
- Pâmela M Rezende
- Universidade Federal de Minas Gerais.,Instituto René Rachou, Fundação Oswaldo Cruz.,Stilingue Inteligência Artificial
| | - Joicymara S Xavier
- Universidade Federal de Minas Gerais.,Instituto René Rachou, Fundação Oswaldo Cruz.,Institute of Agricultural Sciences, Universidade Federal dos Vales do Jequitinhonha e Mucuri
| | - David B Ascher
- School of Chemistry and Molecular Biosciences, University of Queensland.,Systems and Computational Biology, Bio 21 Institute, University of Melbourne.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute
| | | | - Douglas E V Pires
- Systems and Computational Biology, Bio 21 Institute, University of Melbourne.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute.,School of Computing and Information Systems, University of Melbourne
| |
|
26
|
Pan X, Liu S, Liu L, Zhang X, Yao H, Tan B. Case Report: Exome and RNA Sequencing Identify a Novel de novo Missense Variant in HNRNPK in a Chinese Patient With Au-Kline Syndrome. Front Genet 2022; 13:853028. [PMID: 35422839 PMCID: PMC9001983 DOI: 10.3389/fgene.2022.853028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 03/14/2022] [Indexed: 02/05/2023] Open
Abstract
Au-Kline syndrome is a severe multisystemic syndrome characterized by several congenital defects, including intellectual disability. Loss-of-function and missense variants in the HNRNPK gene are associated with a range of dysmorphic features. This report describes an eleven-year-old Chinese boy with intellectual disability and developmental delays. Family-based whole-exome and Sanger sequencing identified a de novo missense variant in HNRNPK (NM_002140.3: c.143T>A, p.Leu48Val). In silico analysis predicted that this variant would be damaging, as it affects a highly conserved residue in the K homology 1 (KH1) domain. Bioinformatic analysis showed that the affinity change (ΔΔG) caused by this variant was -0.033 kcal/mol, indicating reduced affinity for RNA binding. Transcript analysis of the peripheral blood from this case found 42 aberrantly expressed and 86 aberrantly spliced genes (p-value <0.01). Functional enrichment analysis confirmed that the biological functions of these genes, including protein binding and transcriptional regulation, are associated with HNRNPK. In summary, this study identifies the first Chinese patient with a novel de novo heterozygous HNRNPK gene variant that contributes to Au-Kline syndrome and expands current knowledge of the clinical spectrum of HNRNPK variants.
Affiliation(s)
- Xin Pan
- Department of Gynecology and Obstetrics, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Sihan Liu
- Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, China
| | - Li Liu
- Department of Gynecology and Obstetrics, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Xu Zhang
- Department of Gynecology and Obstetrics, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Hong Yao
- Department of Gynecology and Obstetrics, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| | - Bo Tan
- Department of Gynecology and Obstetrics, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
| |
|
27
|
Ali I, Khan A, Fa Z, Khan T, Wei DQ, Zheng J. Crystal structure of Acetyl-CoA carboxylase (AccB) from Streptomyces antibioticus and insights into the substrate-binding through in silico mutagenesis and biophysical investigations. Comput Biol Med 2022; 145:105439. [PMID: 35344865 DOI: 10.1016/j.compbiomed.2022.105439] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/04/2021] [Revised: 03/14/2022] [Accepted: 03/20/2022] [Indexed: 11/18/2022]
Abstract
Acetyl-CoA carboxylase (ACC) is crucial for polyketide biosynthesis and acts as an essential metabolic checkpoint. It is also an attractive drug target against obesity, cancer, microbial infections, and diabetes. However, the lack of knowledge, particularly of the sequence-structure-function relationship that explains ligand-enzyme binding, has hindered the progress of ACC-specific therapeutics and unnatural "natural" polyketides. Structural characterization of such enzymes will help in understanding substrate binding, designing new inhibitors and identifying the molecular rules that control the substrate specificity of ACCs. To understand the substrate specificity, we determined the crystal structure of AccB (Carboxyl-transferase, CT) from Streptomyces antibioticus at a resolution of 2.3 Å, and molecular modeling approaches were employed to unveil the molecular mechanism of acetyl-CoA recognition and processing. The CT domain of S. antibioticus shares a similar structural organization with the previous structures, and the two-step reaction was confirmed by enzymatic assay. Furthermore, to reveal the key hotspots required for substrate recognition and processing, in silico mutagenesis validated only three key residues (V223, Q346, and Q514) that help in the fixation of the substrate. Moreover, we also present atomic-level knowledge on the mechanism of substrate binding, which revealed that the terminal loop (500-514) functions as an opening and closing switch and pushes the substrate inside the cavity for stable binding. A significant decline in the hydrogen bonding half-life was observed upon alanine substitution. Consequently, the presented structural data highlight the potential key interacting residues for substrate recognition and will also help to redesign the ACC active site for proficient substrate specificity to produce diverse polyketides.
Affiliation(s)
- Imtiaz Ali
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
| | - Abbas Khan
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
| | - Zhang Fa
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
| | - Taimoor Khan
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China
| | - Dong-Qing Wei
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China; State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, Ministry of Education and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200030, PR China; Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, PR China
| | - Jianting Zheng
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, PR China; Joint International Research Laboratory of Metabolic & Developmental Sciences, Shanghai Jiao Tong University, Shanghai, PR China.
| |
|
28
|
Pires DEV, Stubbs KA, Mylne JS, Ascher DB. cropCSM: designing safe and potent herbicides with graph-based signatures. Brief Bioinform 2022; 23:6535680. [PMID: 35211724 PMCID: PMC9155605 DOI: 10.1093/bib/bbac042] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/09/2021] [Revised: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 12/11/2022] Open
Abstract
Herbicides have revolutionised weed management, increased crop yields and improved profitability allowing for an increase in worldwide food security. Their widespread use, however, has also led to a rise in resistance and concerns about their environmental impact. Despite the need for potent and safe herbicidal molecules, no herbicide with a new mode of action has reached the market in 30 years. Although development of computational approaches has proven invaluable to guide rational drug discovery pipelines, leading to higher hit rates and lower attrition due to poor toxicity, little has been done in contrast for herbicide design. To fill this gap, we have developed cropCSM, a computational platform to help identify new, potent, nontoxic and environmentally safe herbicides. By using a knowledge-based approach, we identified physicochemical properties and substructures enriched in safe herbicides. By representing the small molecules as a graph, we leveraged these insights to guide the development of predictive models trained and tested on the largest collected data set of molecules with experimentally characterised herbicidal profiles to date (over 4500 compounds). In addition, we developed six new environmental and human toxicity predictors, spanning five different species to assist in molecule prioritisation. cropCSM was able to correctly identify 97% of herbicides currently available commercially, while predicting toxicity profiles with accuracies of up to 92%. We believe cropCSM will be an essential tool for the enrichment of screening libraries and to guide the development of potent and safe herbicides. We have made the method freely available through a user-friendly webserver at http://biosig.unimelb.edu.au/crop_csm.
Affiliation(s)
- Douglas E V Pires
- School of Computing and Information Systems at the University of Melbourne
| | - Keith A Stubbs
- School of Molecular Sciences at the University of Western Australia
| | - Joshua S Mylne
- Curtin University and Deputy Director of the Centre for Crop and Disease Management
| | - David B Ascher
- University of Queensland, and head of Computational Biology and Clinical Informatics at the Baker Institute and Systems
| |
|
29
|
Nguyen TB, Pires DEV, Ascher DB. CSM-carbohydrate: protein-carbohydrate binding affinity prediction and docking scoring function. Brief Bioinform 2021; 23:6457169. [PMID: 34882232 DOI: 10.1093/bib/bbab512] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/23/2021] [Revised: 11/06/2021] [Accepted: 11/08/2021] [Indexed: 12/29/2022] Open
Abstract
Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to biologically characterise. To improve our understanding and ability to model these molecular interactions, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data in order to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate, using machine learning algorithms to accurately predict their binding affinity and rank docking poses as a scoring function. Information on both protein and carbohydrate complementarity, in terms of shape and chemistry, was captured using graph-based structural signatures. Across both training and independent test sets, we achieved comparable Pearson's correlations of 0.72 under cross-validation [root mean square error (RMSE) of 1.58 kcal/mol] and 0.67 on the independent test (RMSE of 1.72 kcal/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. We show that CSM-carbohydrate significantly outperformed previous approaches, and we have implemented our method and made all data freely available through both a user-friendly web interface and an application programming interface, to facilitate programmatic access, at http://biosig.unimelb.edu.au/csm_carbohydrate/. We believe CSM-carbohydrate will be an invaluable tool for helping assess docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.
Affiliation(s)
- Thanh Binh Nguyen
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.,School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.,School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia.,School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia.,Department of Biochemistry, University of Cambridge, Cambridge, UK
| |
|
30
|
Nguyen TB, Myung Y, de Sá AGC, Pires DEV, Ascher DB. mmCSM-NA: accurately predicting effects of single and multiple mutations on protein-nucleic acid binding affinity. NAR Genom Bioinform 2021; 3:lqab109. [PMID: 34805992 PMCID: PMC8600011 DOI: 10.1093/nargab/lqab109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/11/2021] [Revised: 09/20/2021] [Accepted: 10/27/2021] [Indexed: 02/02/2023] Open
Abstract
While protein-nucleic acid interactions are pivotal for many crucial biological processes, limited experimental data has made the development of computational approaches to characterise these interactions a challenge. Consequently, most approaches to understand the effects of missense mutations on protein-nucleic acid affinity have focused on single-point mutations and have presented limited performance on independent data sets. To overcome this, we have curated the largest dataset of experimentally measured effects of mutations on nucleic acid binding affinity to date, encompassing 856 single-point mutations and 141 multiple-point mutations across 155 experimentally solved complexes. This was used in combination with an optimized version of our graph-based signatures to develop mmCSM-NA (http://biosig.unimelb.edu.au/mmcsm_na), the first scalable method capable of quantitatively and accurately predicting the effects of multiple-point mutations on nucleic acid binding affinities. mmCSM-NA obtained a Pearson's correlation of up to 0.67 (RMSE of 1.06 kcal/mol) on single-point mutations under cross-validation, and up to 0.65 on independent non-redundant datasets of multiple-point mutations (RMSE of 1.12 kcal/mol), outperforming similar tools. mmCSM-NA is freely available as an easy-to-use web server and API. We believe it will be an invaluable tool to shed light on the role of mutations affecting protein-nucleic acid interactions in diseases.
Affiliation(s)
- Thanh Binh Nguyen
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
| | - Yoochan Myung
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
| | - Alex G C de Sá
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
| | - Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| |
|
31
|
Rose M, Bai B, Tang M, Cheong CM, Beard S, Burgess JT, Adams MN, O'Byrne KJ, Richard DJ, Gandhi NS, Bolderson E. The Impact of Rare Human Variants on Barrier-To-Auto-Integration Factor 1 (Banf1) Structure and Function. Front Cell Dev Biol 2021; 9:775441. [PMID: 34820387 PMCID: PMC8606531 DOI: 10.3389/fcell.2021.775441] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/14/2021] [Accepted: 10/18/2021] [Indexed: 11/23/2022]
Abstract
Barrier-to-Autointegration Factor 1 (Banf1/BAF) is a critical component of the nuclear envelope and is involved in the maintenance of chromatin structure and genome stability. Banf1 is a small DNA binding protein that is conserved amongst multicellular eukaryotes. Banf1 functions as a dimer, and binds non-specifically to the phosphate backbone of DNA, compacting the DNA in a looping process. The loss of Banf1 results in loss of nuclear envelope integrity and aberrant chromatin organisation. Significantly, mutations in Banf1 are associated with the severe premature ageing syndrome, Néstor–Guillermo Progeria Syndrome. Previously, rare human variants of Banf1 have been identified; however, the impact of these variants on Banf1 function has not been explored. Here, using in silico modelling, biophysical and cell-based approaches, we investigate the effect of rare human variants on Banf1 structure and function. We show that these variants do not significantly alter the secondary structure of Banf1, but several single amino acid variants in the N- and C-termini of Banf1 impact upon the DNA binding ability of Banf1, without altering Banf1 localisation or nuclear integrity. The functional characterisation of these variants provides further insight into Banf1 structure and function and may aid future studies examining the potential impact of Banf1 function on nuclear structure and human health.
Affiliation(s)
- Maddison Rose
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| | - Bond Bai
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| | - Ming Tang
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| | - Chee Man Cheong
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| | - Sam Beard
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| | - Joshua T Burgess
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| | - Mark N Adams
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| | - Kenneth J O'Byrne
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
- Princess Alexandra Hospital, Woolloongabba, QLD, Australia
| | - Derek J Richard
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| | - Neha S Gandhi
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
- School of Chemistry and Physics, Queensland University of Technology, Brisbane, QLD, Australia
| | - Emma Bolderson
- Queensland University of Technology (QUT), Cancer and Ageing Research Program, Centre for Genomics and Personalised Health, Translational Research Institute (TRI), Brisbane, QLD, Australia
| |
|
32
|
da Silva BM, Myung Y, Ascher DB, Pires DEV. epitope3D: a machine learning method for conformational B-cell epitope prediction. Brief Bioinform 2021; 23:6407730. [PMID: 34676398 DOI: 10.1093/bib/bbab423] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/07/2021] [Revised: 08/25/2021] [Accepted: 09/14/2021] [Indexed: 11/13/2022]
Abstract
The ability to identify antigenic determinants of pathogens, or epitopes, is fundamental to guide rational vaccine development and immunotherapies, which are particularly relevant for rapid pandemic response. A range of computational tools has been developed over the past two decades to assist in epitope prediction; however, they have presented limited performance and generalization, particularly for the identification of conformational B-cell epitopes. Here, we present epitope3D, a novel scalable machine learning method capable of accurately identifying conformational epitopes trained and evaluated on the largest curated epitope data set to date. Our method uses the concept of graph-based signatures to model epitope and non-epitope regions as graphs and extract distance patterns that are used as evidence to train and test predictive models. We show epitope3D outperforms available alternative approaches, achieving Matthews correlation coefficient and F1-scores of 0.55 and 0.57 on cross-validation and 0.45 and 0.36 during independent blind tests, respectively.
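The Matthews correlation coefficient and F1-score reported above are both computed from the confusion matrix of a binary classifier. The following is a minimal sketch under that standard definition; the function names and the toy epitope / non-epitope labels are hypothetical and are not drawn from epitope3D.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1_score(tp, fp, fn):
    """F1-score (harmonic mean of precision and recall); 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical per-residue labels: 1 = epitope, 0 = non-epitope.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"MCC = {mcc(tp, tn, fp, fn):.2f}, F1 = {f1_score(tp, fp, fn):.2f}")
```

MCC uses all four cells of the confusion matrix, so it stays informative when positives are rare, whereas F1 ignores true negatives; reporting both gives a fuller picture on imbalanced data.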
Affiliation(s)
- Bruna Moreira da Silva
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - YooChan Myung
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia
| | - David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Baker Department of Cardiometabolic Health, University of Melbourne, Melbourne, Victoria, Australia
- Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
| | - Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| |
|
33
|
Zhang S, Zhao L, Zheng CH, Xia J. A feature-based approach to predict hot spots in protein-DNA binding interfaces. Brief Bioinform 2021; 21:1038-1046. [PMID: 30957840 DOI: 10.1093/bib/bbz037] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/14/2019] [Revised: 02/20/2019] [Accepted: 03/07/2019] [Indexed: 12/21/2022]
Abstract
DNA-binding hot spot residues of proteins are dominant and fundamental interface residues that contribute most of the binding free energy of protein-DNA interfaces. As experimental methods for identifying hot spots are expensive and time consuming, computational approaches are urgently required in predicting hot spots on a large scale. In this work, we systematically assessed a wide variety of 114 features from a combination of the protein sequence, structure, network and solvent accessible information and their combinations along with various feature selection strategies for hot spot prediction. We then trained and compared four commonly used machine learning models, namely, support vector machine (SVM), random forest, Naïve Bayes and k-nearest neighbor, for the identification of hot spots using 10-fold cross-validation and the independent test set. Our results show that (1) features based on the solvent accessible surface area have significant effect on hot spot prediction; (2) different but complementary features generally enhance the prediction performance; and (3) SVM outperforms other machine learning methods on both training and independent test sets. In an effort to improve predictive performance, we developed a feature-based method, namely, PrPDH (Prediction of Protein-DNA binding Hot spots), for the prediction of hot spots in protein-DNA binding interfaces using SVM based on the selected 10 optimal features. Comparative results on benchmark data sets indicate that our predictor is able to achieve generally better performance in predicting hot spots compared to the state-of-the-art predictors. A user-friendly web server for PrPDH is well established and is freely available at http://bioinfo.ahu.edu.cn:8080/PrPDH.
Affiliation(s)
- Sijia Zhang
- Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| | - Le Zhao
- Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| | - Chun-Hou Zheng
- Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| | - Junfeng Xia
- Institutes of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
| |
|
34
|
Towards Understanding the Pathogenicity of DROSHA Mutations in Oncohematology. Cells 2021; 10:cells10092357. [PMID: 34572006 PMCID: PMC8471307 DOI: 10.3390/cells10092357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 09/01/2021] [Accepted: 09/06/2021] [Indexed: 11/16/2022]
Abstract
Myelodysplastic syndrome (MDS) refers to a heterogeneous group of closely related clonal hematopoietic disorders, which are characterized by accumulation of somatic mutations. The acquired mutation burden is suggested to define the pathway and consequent phenotype of the pathology. Recent studies have called attention to the role of miRNA biogenesis genes in MDS progression; in particular, the mutational pressure of the DROSHA gene was determined. This highlights the importance of studying the impact of all collected missense mutations found within the DROSHA gene in oncohematology that might affect the functionality of the protein. In this study, the selected mutations were extensively examined by computational screening, and the most deleterious were subjected to further molecular dynamics simulation in order to uncover the molecular mechanism of the structural damage to the protein altering its biological function. The most significant effect was found for variants I625K, L1047S, and H1170D, presumably affecting the endonuclease activity of DROSHA. Such alterations arising during MDS progression should be taken into consideration, as they may evoke certain clinical traits during malignant clonal evolution.
|
35
|
Mei LC, Wang YL, Wu FX, Wang F, Hao GF, Yang GF. HISNAPI: a bioinformatic tool for dynamic hot spot analysis in nucleic acid-protein interface with a case study. Brief Bioinform 2021; 22:bbaa373. [PMID: 33406224 PMCID: PMC7929440 DOI: 10.1093/bib/bbaa373] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/24/2020] [Revised: 11/19/2020] [Accepted: 11/23/2020] [Indexed: 01/18/2023]
Abstract
Protein-nucleic acid interactions play essential roles in many biological processes, such as transcription, replication and translation. In protein-nucleic acid interfaces, hotspot residues contribute the majority of binding affinity toward molecular recognition. Hotspot residues are commonly regarded as potential binding sites for compound molecules in drug design projects. The dynamic property is a considerable factor that affects the binding of ligands. Computational approaches have been developed to expedite the prediction of hotspot residues on protein-nucleic acid interfaces. However, existing approaches overlook hotspot dynamics, despite their essential role in protein function. Here, we report a web server named Hotspots In silico Scanning on Nucleic Acid and Protein Interface (HISNAPI) to analyze hotspot residue dynamics by integrating molecular dynamics simulation and one-step free energy perturbation. HISNAPI is capable of not only predicting the hotspot residues in protein-nucleic acid interfaces but also providing insights into their intensity and correlation of dynamic motion. Protein dynamics have been recognized as a vital factor that has an effect on the interaction specificity and affinity of the binding partners. We applied HISNAPI to the case of SARS-CoV-2 RNA-dependent RNA polymerase, a vital target of the antiviral drug for the treatment of coronavirus disease 2019. We identified the hotspot residues and characterized their dynamic behaviors, which might provide insight into the target site for antiviral drug design. The web server is freely available via a user-friendly web interface at http://chemyang.ccnu.edu.cn/ccb/server/HISNAPI/ and http://agroda.gzu.edu.cn:9999/ccb/server/HISNAPI/.
Affiliation(s)
- Long-Can Mei
- College of Chemistry, Central China Normal University
| | | | | | | | | | - Guang-Fu Yang
- Pesticide Science from Nankai University, Tianjin, China
| |
|
36
|
Baseri N, Najar-Peerayeh S, Bakhshi B. Investigating the effect of an identified mutation within a critical site of PAS domain of WalK protein in a vancomycin-intermediate resistant Staphylococcus aureus by computational approaches. BMC Microbiol 2021; 21:240. [PMID: 34474665 PMCID: PMC8414773 DOI: 10.1186/s12866-021-02298-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/30/2021] [Accepted: 08/23/2021] [Indexed: 11/15/2022]
Abstract
Background: Vancomycin-intermediate resistant Staphylococcus aureus (VISA) is becoming a common cause of nosocomial infections worldwide. VISA isolates are developed by unclear molecular mechanisms via mutations in several genes, including walKR. Although studies have verified some of these mutations, few studies have paid attention to the importance of molecular modelling of mutations. Method: For genomic and transcriptomic comparisons in a laboratory-derived VISA strain and its parental strain, Sanger sequencing and reverse transcriptase quantitative PCR (RT-qPCR) methods were used, respectively. After structural protein mapping of the detected mutation, mutation effects were analyzed using molecular computational approaches and crystal structures of related proteins. Results: A WalK-H364R mutation occurred in a functional zinc ion coordinating residue within the PAS domain in the VISA strain. WalK-H364R was predicted to destabilize the protein and decrease WalK interactions with proteins and nucleic acids. The RT-qPCR method showed downregulation of walKR, WalKR-regulated autolysins, and agr locus. Conclusion: Overall, the WalK-H364R mutation within a critical metal-coordinating site was presumably related to VISA development. We assume that the WalK-H364R mutation resulted in deleterious effects on the protein, which was verified by walKR gene expression changes. Therefore, molecular modelling provides detailed insight into the molecular mechanism of VISA development, in particular, where allelic replacement experiments are not readily available. Supplementary Information: The online version contains supplementary material available at 10.1186/s12866-021-02298-9.
Affiliation(s)
- Neda Baseri
- Department of Bacteriology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Shahin Najar-Peerayeh
- Department of Bacteriology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
| | - Bita Bakhshi
- Department of Bacteriology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran.
| |
|
37
|
Liu J, Liu S, Liu C, Zhang Y, Pan Y, Wang Z, Wang J, Wen T, Deng L. Nabe: an energetic database of amino acid mutations in protein-nucleic acid binding interfaces. Database: The Journal of Biological Databases and Curation 2021; 2021:6352208. [PMID: 34389843 PMCID: PMC8363842 DOI: 10.1093/database/baab050] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/01/2021] [Revised: 07/23/2021] [Accepted: 07/29/2021] [Indexed: 12/17/2022]
Abstract
Protein–nucleic acid complexes play essential roles in regulating transcription, translation, DNA replication, repair and recombination, RNA processing and translocation. Site-directed mutagenesis has been extremely useful in understanding the principles of protein–DNA and protein–RNA interactions, and experimentally determined mutagenesis data are prerequisites for designing effective algorithms for predicting the binding affinity change upon mutation. However, a vital challenge in this area is the lack of sufficient public experimentally recognized mutation data, which leads to difficulties in developing computational prediction methods. In this article, we present Nabe, an integrated database of amino acid mutations and their effects on the binding free energy in protein–DNA and protein–RNA interactions for which binding affinities have been experimentally determined. Compared with existing databases and data sets, Nabe is the largest protein–nucleic acid mutation database, containing 2506 mutations in 473 protein–DNA and protein–RNA complexes, of which 1751 are alanine mutations in 405 protein–nucleic acid complexes. For researchers to conveniently utilize the data, Nabe assembles protein–DNA and protein–RNA benchmark databases by adopting the data-processing procedures in the majority of models. To further help users query the data, Nabe provides a searchable and graphical web page. Database URL: http://nabe.denglab.org
Affiliation(s)
- Junyi Liu
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
- Viterbi School of Engineering, University of Southern California, 3650 McClintock Ave. OHE 106, Los Angeles, CA 90089, USA
| | - Siyu Liu
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
| | - Chenzhe Liu
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
| | - Yaping Zhang
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
| | - Yuliang Pan
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
| | - Zixiang Wang
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
| | - Jiacheng Wang
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
| | - Ting Wen
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
| | - Lei Deng
- School of Computer Science and Engineering, Central South University, 22 Shaoshan South Road, Changsha 410075, China
| |
|
38
|
Li G, Panday SK, Peng Y, Alexov E. SAMPDI-3D: predicting the effects of protein and DNA mutations on protein-DNA interactions. Bioinformatics 2021; 37:3760-3765. [PMID: 34343273 DOI: 10.1093/bioinformatics/btab567] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/12/2021] [Revised: 06/28/2021] [Accepted: 07/31/2021] [Indexed: 12/25/2022]
Abstract
MOTIVATION: Mutations that alter protein-DNA interactions may be pathogenic and cause diseases. Therefore, it is extremely important to quantify the effect of mutations on protein-DNA binding free energy to reveal the molecular origin of diseases and to assist the development of treatments. Although several methods that predict the change of protein-DNA binding affinity upon mutations in the binding protein have been developed, the effect of DNA mutations has not yet been considered. RESULTS: Here, we report a new version of SAMPDI, SAMPDI-3D, which is a gradient boosting decision tree machine learning method to predict the change of the protein-DNA binding free energy caused by mutations in both the binding protein and the bases of the corresponding DNA. The method is shown to achieve Pearson correlation coefficients of 0.76 and 0.80 in a benchmarking test against experimentally determined changes of the binding free energy caused by mutations in the binding protein or DNA, respectively. Furthermore, three datasets collected from the literature were used as a blind benchmark for SAMPDI-3D, and it is shown that it outperforms all existing state-of-the-art methods. The method is very fast, allowing for genome-scale investigations. AVAILABILITY: It is available as a web server and as stand-alone code at http://compbio.clemson.edu/SAMPDI-3D/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gen Li
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | | | - Yunhui Peng
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| | - Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29634, USA
| |
|
39
|
Tunstall T, Phelan J, Eccleston C, Clark TG, Furnham N. Structural and Genomic Insights Into Pyrazinamide Resistance in Mycobacterium tuberculosis Underlie Differences Between Ancient and Modern Lineages. Front Mol Biosci 2021; 8:619403. [PMID: 34422898 PMCID: PMC8372558 DOI: 10.3389/fmolb.2021.619403] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/20/2020] [Accepted: 04/14/2021] [Indexed: 11/30/2022]
Abstract
Resistance to drugs used to treat tuberculosis disease (TB) remains a public health burden, with missense point mutations in the underlying Mycobacterium tuberculosis bacteria described for nearly all anti-TB drugs. The post-genomics era along with advances in computational and structural biology provide opportunities to understand the interrelationships between the genetic basis and the structural consequences of M. tuberculosis mutations linked to drug resistance. Pyrazinamide (PZA) is a crucial first line antibiotic currently used in TB treatment regimens. The mutational promiscuity exhibited by the pncA gene (target for PZA) necessitates computational approaches to investigate the genetic and structural basis for PZA resistance development. We analysed 424 missense point mutations linked to PZA resistance derived from ∼35K M. tuberculosis clinical isolates sourced globally, which comprised the four main M. tuberculosis lineages (Lineage 1-4). Mutations were annotated to reflect their association with PZA resistance. Genomic measures (minor allele frequency and odds ratio), structural features (surface area, residue depth and hydrophobicity) and biophysical effects (change in stability and ligand affinity) of point mutations on pncA protein stability and ligand affinity were assessed. Missense point mutations within pncA were distributed throughout the gene, with the majority (>80%) of mutations having a destabilising effect on protomer stability and on ligand affinity. Active site residues involved in PZA binding were associated with multiple point mutations, highlighting mutational diversity due to selection pressures at these functionally important sites. There were weak associations between genomic measures and biophysical effect of mutations. However, mutations associated with PZA resistance showed statistically significant differences between structural features (surface area and residue depth), but not hydrophobicity score for mutational sites. Most interestingly, M. tuberculosis lineage 1 (ancient lineage) exhibited a distinct protein stability profile for mutations associated with PZA resistance, compared to modern lineages.
Affiliation(s)
- Tanushree Tunstall
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Jody Phelan
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Charlotte Eccleston
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Taane G. Clark
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom
- Department of Infectious Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Nicholas Furnham
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, London, United Kingdom
| |
|
40
|
Rodrigues CHM, Pires DEV, Ascher DB. mmCSM-PPI: predicting the effects of multiple point mutations on protein-protein interactions. Nucleic Acids Res 2021; 49:W417-W424. [PMID: 33893812 PMCID: PMC8262703 DOI: 10.1093/nar/gkab273] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 01/27/2021] [Revised: 03/18/2021] [Accepted: 04/15/2021] [Indexed: 11/16/2022]
Abstract
Protein-protein interactions play a crucial role in all cellular functions and biological processes and mutations leading to their disruption are enriched in many diseases. While a number of computational methods to assess the effects of variants on protein-protein binding affinity have been proposed, they are in general limited to the analysis of single point mutations and have been shown to perform poorly on independent test sets. Here, we present mmCSM-PPI, a scalable and effective machine learning model for accurately assessing changes in protein-protein binding affinity caused by single and multiple missense mutations. We expanded our well-established graph-based signatures in order to capture physicochemical and geometrical properties of multiple wild-type residue environments and integrated them with substitution scores and dynamics terms from normal mode analysis. mmCSM-PPI was able to achieve a Pearson's correlation of up to 0.75 (RMSE = 1.64 kcal/mol) under 10-fold cross-validation and 0.70 (RMSE = 2.06 kcal/mol) on a non-redundant blind test, outperforming existing methods. Our method is freely available as a user-friendly and easy-to-use web server and API at http://biosig.unimelb.edu.au/mmcsm_ppi.
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
| | - Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Structural Biology and Bioinformatics, Department of Biochemistry and Pharmacology, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Department of Biochemistry, University of Cambridge, Cambridge, UK
| |
|
41
|
Al-Jarf R, de Sá AGC, Pires DEV, Ascher DB. pdCSM-cancer: Using Graph-Based Signatures to Identify Small Molecules with Anticancer Properties. J Chem Inf Model 2021; 61:3314-3322. [PMID: 34213323 PMCID: PMC8317153 DOI: 10.1021/acs.jcim.1c00168] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/14/2022]
Abstract
The development of new, effective, and safe drugs to treat cancer remains a challenging and time-consuming task due to limited hit rates, restraining subsequent development efforts. Despite the impressive progress of quantitative structure–activity relationship and machine learning-based models that have been developed to predict molecule pharmacodynamics and bioactivity, they have had mixed success at identifying compounds with anticancer properties against multiple cell lines. Here, we have developed a novel predictive tool, pdCSM-cancer, which uses a graph-based signature representation of the chemical structure of a small molecule in order to accurately predict molecules likely to be active against one or multiple cancer cell lines. pdCSM-cancer represents the most comprehensive anticancer bioactivity prediction platform developed to date, comprising trained and validated models on experimental data of the growth inhibition concentration (GI50%) effects, including over 18,000 compounds, on 9 tumor types and 74 distinct cancer cell lines. Across 10-fold cross-validation, it achieved Pearson’s correlation coefficients of up to 0.74 and comparable performance of up to 0.67 across independent, non-redundant blind tests. Leveraging the insights from these cell line-specific models, we developed a generic predictive model to identify molecules active in at least 60 cell lines. Our final model achieved an area under the receiver operating characteristic curve (AUC) of up to 0.94 on 10-fold cross-validation and up to 0.94 on independent non-redundant blind tests, outperforming alternative approaches. We believe that our predictive tool will provide a valuable resource for optimizing and enriching screening libraries for the identification of effective and safe anticancer molecules. To provide a simple and integrated platform to rapidly screen for potential biologically active molecules with favorable anticancer properties, we made pdCSM-cancer freely available online at http://biosig.unimelb.edu.au/pdcsm_cancer.
Affiliation(s)
- Raghad Al-Jarf
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia
| | - Alex G C de Sá
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia
| | - Douglas E V Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,School of Computing and Information Systems, University of Melbourne, Parkville 3052, Victoria, Australia
| | - David B Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Parkville 3052, Victoria, Australia.,Systems and Computational Biology, Bio21 Institute, University of Melbourne, Parkville 3052, Victoria, Australia.,Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne 3004, Victoria, Australia.,Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Parkville 3010, Victoria, Australia.,Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, United Kingdom
| |
|
42
|
Mei LC, Hao GF, Yang GF. Computational methods for predicting hotspots at protein-RNA interfaces. WILEY INTERDISCIPLINARY REVIEWS-RNA 2021; 13:e1675. [PMID: 34080311 DOI: 10.1002/wrna.1675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2021] [Revised: 05/13/2021] [Accepted: 05/14/2021] [Indexed: 11/10/2022]
Abstract
Protein-RNA interactions play essential roles in many critical biological events. A comprehensive understanding of the mechanisms underlying these interactions is helpful when studying cellular activities and therapeutic applications. Hotspots are a small portion of residues contributing much toward protein-RNA binding affinity. In pharmaceutical research, the hotspot residues are seen as the best option for designing small molecules to target proteins of therapeutic interest. With the accumulation of experimental data about protein-RNA interactions, computational methods have been produced for hotspot prediction on a large scale. In this review, we first present an overview of the existing databases for protein-RNA binding data. Furthermore, we outline the most widely adopted computational methods for hotspot prediction in protein-RNA interactions. Finally, we discuss the applications of hotspot prediction. This article is categorized under: RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; RNA Methods > RNA Analyses In Vitro and In Silico.
Affiliation(s)
- Long-Can Mei
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China
| | - Ge-Fei Hao
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China.,State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University, Guiyang, China
| | - Guang-Fu Yang
- Key Laboratory of Pesticide and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, China.,International Joint Research Center for Intelligent Biosensor Technology and Health, Central China Normal University, Wuhan, China.,Collaborative Innovation Center of Chemical Science and Engineering, Tianjin, China
| |
|
43
|
Portelli S, Barr L, de Sá AG, Pires DE, Ascher DB. Distinguishing between PTEN clinical phenotypes through mutation analysis. Comput Struct Biotechnol J 2021; 19:3097-3109. [PMID: 34141133 PMCID: PMC8180946 DOI: 10.1016/j.csbj.2021.05.028] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/31/2021] [Revised: 04/29/2021] [Accepted: 05/19/2021] [Indexed: 12/28/2022] Open
Abstract
Phosphatase and tensin homolog on chromosome ten (PTEN) germline mutations are associated with an overarching condition known as PTEN hamartoma tumor syndrome. Clinical phenotypes associated with this syndrome range from macrocephaly and autism spectrum disorder (ASD) to Cowden syndrome, which manifests as multiple noncancerous tumor-like growths (hamartomas), and an increased predisposition to certain cancers. The basis by which mutations might lead to these very diverse phenotypic outcomes, however, is unclear. Here we show that, by considering the molecular consequences of mutations in PTEN on protein structure and function, we can accurately distinguish PTEN mutations exhibiting different phenotypes. Changes in phosphatase activity, protein stability, and intramolecular interactions appeared to be major drivers of clinical phenotype, with cancer-associated variants leading to the most drastic changes, while ASD and non-pathogenic variants were associated with milder and neutral changes, respectively. Importantly, we show via saturation mutagenesis that more than half of variants of unknown significance could be associated with disease phenotypes, while over half of Cowden syndrome mutations likely lead to cancer. These insights can assist in exploring potentially important clinical outcomes delineated by PTEN variation.
Affiliation(s)
- Stephanie Portelli
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Lucy Barr
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
| | - Alex G.C. de Sá
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Melbourne, Victoria, Australia
| | - Douglas E.V. Pires
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
| | - David B. Ascher
- Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Baker Department of Cardiometabolic Health, Melbourne Medical School, University of Melbourne, Melbourne, Victoria, Australia
- Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, United Kingdom
| |
|
44
|
Sequeiros-Borja CE, Surpeta B, Brezovsky J. Recent advances in user-friendly computational tools to engineer protein function. Brief Bioinform 2021; 22:bbaa150. [PMID: 32743637 PMCID: PMC8138880 DOI: 10.1093/bib/bbaa150] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 04/24/2020] [Revised: 06/03/2020] [Accepted: 06/16/2020] [Indexed: 12/14/2022] Open
Abstract
Progress in technology and algorithms throughout the past decade has transformed the field of protein design and engineering. Computational approaches have become well-engrained in the processes of tailoring proteins for various biotechnological applications. Many tools and methods are developed and upgraded each year to satisfy the increasing demands and challenges of protein engineering. To help protein engineers and bioinformaticians navigate this emerging wave of dedicated software, we have critically evaluated recent additions to the toolbox regarding their application for semi-rational and rational protein engineering. These newly developed tools identify and prioritize hotspots and analyze the effects of mutations for a variety of properties, comprising ligand binding, protein-protein and protein-nucleic acid interactions, and electrostatic potential. We also discuss notable progress to target elusive protein dynamics and associated properties like ligand-transport processes and allosteric communication. Finally, we discuss several challenges these tools face and provide our perspectives on the further development of readily applicable methods to guide protein engineering efforts.
Affiliation(s)
- Carlos Eduardo Sequeiros-Borja
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
| | - Bartłomiej Surpeta
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
| | - Jan Brezovsky
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University and the International Institute of Molecular and Cell Biology in Warsaw
| |
|
45
|
Zhang S, Wang L, Zhao L, Li M, Liu M, Li K, Bin Y, Xia J. An improved DNA-binding hot spot residues prediction method by exploring interfacial neighbor properties. BMC Bioinformatics 2021; 22:253. [PMID: 34000983 PMCID: PMC8130120 DOI: 10.1186/s12859-020-03871-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/29/2020] [Accepted: 11/09/2020] [Indexed: 11/29/2022] Open
Abstract
Background: DNA-binding hot spots are dominant and fundamental residues that contribute most of the binding free energy yet account for a small portion of protein–DNA interfaces. As experimental methods for identifying hot spots are time-consuming and costly, high-efficiency computational approaches are emerging as alternative pathways to experimental methods. Results: Herein, we present a new computational method, termed inpPDH, for hot spot prediction. To improve the prediction performance, we extract hybrid features which incorporate traditional features and new interfacial neighbor properties. To remove redundant and irrelevant features, a two-step feature selection strategy is employed. Finally, a subset of 7 optimal features is chosen to construct the predictor using a support vector machine. The results on the benchmark dataset show that the proposed method yields significantly better prediction accuracy than previously published methods. Moreover, a user-friendly web server for inpPDH has been established and is freely available at http://bioinfo.ahu.edu.cn/inpPDH. Conclusions: We have developed an accurate, improved prediction model, inpPDH, for hot spot residues in protein–DNA binding interfaces, given the structure of a protein–DNA complex. Moreover, we identify a comprehensive and useful feature subset, including the proposed interfacial neighbor features, that is particularly effective for identifying hot spot residues. Our results indicate that these features are more effective than the conventional features considered previously, and that the combination of interfacial neighbor features and traditional features may support the creation of a discriminative feature set for efficient prediction of hot spot residues in protein–DNA complexes. Supplementary information: Supplementary information accompanies this paper at 10.1186/s12859-020-03871-1.
Affiliation(s)
- Sijia Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| | - Lihua Wang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Le Zhao
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Menglu Li
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Mengya Liu
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Ke Li
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Yannan Bin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| | - Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| |
|
46
|
Wegrzyn K, Zabrocka E, Bury K, Tomiczek B, Wieczor M, Czub J, Uciechowska U, Moreno-Del Alamo M, Walkow U, Grochowina I, Dutkiewicz R, Bujnicki JM, Giraldo R, Konieczny I. Defining a novel domain that provides an essential contribution to site-specific interaction of Rep protein with DNA. Nucleic Acids Res 2021; 49:3394-3408. [PMID: 33660784 PMCID: PMC8034659 DOI: 10.1093/nar/gkab113] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/21/2020] [Revised: 02/04/2021] [Accepted: 02/10/2021] [Indexed: 12/24/2022] Open
Abstract
An essential feature of replication initiation proteins is their ability to bind to DNA. In this work, we describe a new domain that contributes to a replication initiator sequence-specific interaction with DNA. Applying biochemical assays and structure prediction methods coupled with DNA–protein crosslinking, mass spectrometry, and construction and analysis of mutant proteins, we identified that the replication initiator of the broad host range plasmid RK2, in addition to two winged helix domains, contains a third DNA-binding domain. The phylogenetic analysis revealed that the composition of this unique domain is typical within the described TrfA-like protein family. Both in vitro and in vivo experiments involving the constructed TrfA mutant proteins showed that the newly identified domain is essential for the formation of the protein complex with DNA, contributes to the avidity for interaction with DNA, and the replication activity of the initiator. The analysis of mutant proteins, each containing a single substitution, showed that each of the three domains composing TrfA is essential for the formation of the protein complex with DNA. Furthermore, the new domain, along with the winged helix domains, contributes to the sequence specificity of replication initiator interaction within the plasmid replication origin.
Affiliation(s)
- Katarzyna Wegrzyn
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Elzbieta Zabrocka
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Katarzyna Bury
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Bartlomiej Tomiczek
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Milosz Wieczor
- Department of Physical Chemistry, Gdańsk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland
| | - Jacek Czub
- Department of Physical Chemistry, Gdańsk University of Technology, Narutowicza 11/12, 80-233 Gdańsk, Poland
| | - Urszula Uciechowska
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - María Moreno-Del Alamo
- Department of Cellular and Molecular Biology, Centro de Investigaciones Biológicas - CSIC, E28040 Madrid, Spain
| | - Urszula Walkow
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Igor Grochowina
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Rafal Dutkiewicz
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| | - Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Księcia Trojdena 4, 02-109 Warsaw, Poland.,Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznan, Poland
| | - Rafael Giraldo
- Department of Cellular and Molecular Biology, Centro de Investigaciones Biológicas - CSIC, E28040 Madrid, Spain
| | - Igor Konieczny
- Intercollegiate Faculty of Biotechnology of University of Gdansk and Medical University of Gdansk, University of Gdansk, Abrahama 58, 80-307 Gdansk, Poland
| |
|
47
|
Vedithi SC, Malhotra S, Acebrón-García-de-Eulate M, Matusevicius M, Torres PHM, Blundell TL. Structure-Guided Computational Approaches to Unravel Druggable Proteomic Landscape of Mycobacterium leprae. Front Mol Biosci 2021; 8:663301. [PMID: 34026836 PMCID: PMC8138464 DOI: 10.3389/fmolb.2021.663301] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/02/2021] [Accepted: 04/12/2021] [Indexed: 02/02/2023] Open
Abstract
Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising Dapsone, Rifampicin, and Clofazimine. These drugs exhibit bacteriostatic, bactericidal and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment is not cost-effective, does not favor patient compliance due to its long duration (12 months) and does not protect against the incumbent nerve damage, which is a severe leprosy complication. The chronic infectious peripheral neuropathy associated with the disease is primarily due to the bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. There is a need to discover novel/repurposed drugs that can act as short duration and effective alternatives to the existing treatment regimens, preventing nerve damage and consequent disability associated with the disease. Mycobacterium leprae is an obligate pathogen resulting in experimental intractability to cultivate the bacillus in vitro and limiting drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge related to structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all the three drugs in the multidrug therapy, poses a need for concerted novel drug discovery efforts. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable to unravel druggable targets that are essential for bacterial survival and predilection of human neuronal Schwann cells. Of the 1,614 protein-coding genes in the genome of M. leprae, only 17 protein structures are available in the Protein Data Bank. In this review, we discussed efforts made to model the proteome of M. leprae using a suite of software for protein modeling that has been developed in the Blundell laboratory. 
Precise template selection by employing sequence-structure homology recognition software, multi-template modeling of the monomeric models and accurate quality assessment are the hallmarks of the modeling process. Tools that map interfaces and enable building of homo-oligomers are discussed in the context of interface stability. Other software is described to determine the druggable proteome by using information related to the chokepoint analysis of the metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets and fragment hotspot maps.
Affiliation(s)
- Sundeep Chaitanya Vedithi
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom (*Correspondence: Sundeep Chaitanya Vedithi)
| | - Sony Malhotra
- Rutherford Appleton Laboratory, Science and Technology Facilities Council, Oxon, United Kingdom
| | | | | | - Pedro Henrique Monteiro Torres
- Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
| | - Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
| |
|
48
|
Jiang Y, Liu HF, Liu R. Systematic comparison and prediction of the effects of missense mutations on protein-DNA and protein-RNA interactions. PLoS Comput Biol 2021; 17:e1008951. [PMID: 33872313 PMCID: PMC8084330 DOI: 10.1371/journal.pcbi.1008951] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/19/2021] [Revised: 04/29/2021] [Accepted: 04/08/2021] [Indexed: 12/30/2022] Open
Abstract
The binding affinities of protein-nucleic acid interactions could be altered due to missense mutations occurring in DNA- or RNA-binding proteins, therefore resulting in various diseases. Unfortunately, a systematic comparison and prediction of the effects of mutations on protein-DNA and protein-RNA interactions (these two mutation classes are termed MPDs and MPRs, respectively) is still lacking. Here, we demonstrated that these two classes of mutations could generate similar or different tendencies for binding free energy changes in terms of the properties of mutated residues. We then developed regression algorithms separately for MPDs and MPRs by introducing novel geometric partition-based energy features and interface-based structural features. Through feature selection and ensemble learning, similar computational frameworks that integrated energy- and nonenergy-based models were established to estimate the binding affinity changes resulting from MPDs and MPRs, but the selected features for the final models were different and therefore reflected the specificity of these two mutation classes. Furthermore, the proposed methodology was extended to the identification of mutations that significantly decreased the binding affinities. Extensive validations indicated that our algorithm generally performed better than the state-of-the-art methods on both the regression and classification tasks. The webserver and software are freely available at http://liulab.hzau.edu.cn/PEMPNI and https://github.com/hzau-liulab/PEMPNI. Protein-nucleic acid interactions play important roles in various cellular processes. Missense mutations occurring in DNA- or RNA-binding proteins (termed MPDs and MPRs, respectively) could change the binding affinities of these interactions. 
Previous studies have compared protein-DNA and protein-RNA interactions from multifaceted viewpoints, but less attention has been given to the similarities and specific differences between the effects of MPDs and MPRs and between the methodologies for predicting the affinity changes induced by the two mutation classes. Therefore, we systematically compared their impacts and demonstrated that MPDs and MPRs could have specific preferences for binding affinity changes. These observations motivated us to construct regression models separately for MPDs and MPRs by introducing novel energy and nonenergy descriptors. Although similar frameworks were developed to estimate these two categories of mutation effects, different descriptors were selected in the regression models and further revealed the specificity of mutation classes. The interplay between the energy and nonenergy modules effectively improved prediction performance. Our algorithm can also be adopted to disentangle mutations significantly decreasing binding affinities from other mutations.
Affiliation(s)
- Yao Jiang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
| | - Hui-Fang Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
| | - Rong Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
| |
|
49
|
Blake S, Hemming I, Heng JIT, Agostino M. Structure-Based Approaches to Classify the Functional Impact of ZBTB18 Missense Variants in Health and Disease. ACS Chem Neurosci 2021; 12:979-989. [PMID: 33621064 DOI: 10.1021/acschemneuro.0c00758] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/18/2023] Open
Abstract
The Cys2His2 type zinc finger is a motif found in many eukaryotic transcription factor proteins that facilitates binding to genomic DNA so as to influence cellular gene expression. One such transcription factor is ZBTB18, characterized as a repressor that orchestrates the development of mammalian tissues including skeletal muscle and brain during embryogenesis. In humans, it has been recognized that disease-associated ZBTB18 missense variants mapping to the coding sequence of the zinc finger domain influence sequence-specific DNA binding, disrupt transcriptional regulation, and impair neural circuit formation in the brain. Furthermore, general population ZBTB18 missense variants that influence DNA binding and transcriptional regulation have also been documented within this domain; however, the molecular traits that explain why some variants cause disease while others do not are poorly understood. Here, we have applied five structure-based approaches to evaluate their ability to discriminate between disease-associated and general population ZBTB18 missense variants. We found that thermodynamic integration and Residue Scanning in the Schrodinger Biologics Suite were the best approaches for distinguishing disease-associated variants from general population variants. Our results demonstrate the effectiveness of structure-based approaches for the functional characterization of missense alleles to DNA binding, zinc finger transcription factor protein-coding genes that underlie human health and disease.
Affiliation(s)
- Steven Blake
  - Curtin Health Innovation Research Institute, Curtin University, Bentley, Western Australia 6102, Australia
  - Ralph and Patricia Sarich Neuroscience Research Institute, Nedlands, Western Australia 6009, Australia
  - School of Pharmacy and Biomedical Sciences, Curtin University, Bentley, Western Australia 6845, Australia
- Isabel Hemming
  - Curtin Health Innovation Research Institute, Curtin University, Bentley, Western Australia 6102, Australia
  - Ralph and Patricia Sarich Neuroscience Research Institute, Nedlands, Western Australia 6009, Australia
  - The Faculty of Health and Medical Sciences, Medical School, The University of Western Australia, Crawley, Western Australia 6009, Australia
- Julian Ik-Tsen Heng
  - Curtin Health Innovation Research Institute, Curtin University, Bentley, Western Australia 6102, Australia
  - Ralph and Patricia Sarich Neuroscience Research Institute, Nedlands, Western Australia 6009, Australia
- Mark Agostino
  - Curtin Health Innovation Research Institute, Curtin University, Bentley, Western Australia 6102, Australia
  - School of Pharmacy and Biomedical Sciences, Curtin University, Bentley, Western Australia 6845, Australia
  - Curtin Institute for Computation, Curtin University, Bentley, Western Australia, Australia
|
50
|
Suravajhala R, Gupta S, Kumar N, Suravajhala P. Deciphering LncRNA-protein interactions using docking complexes. J Biomol Struct Dyn 2020; 40:3769-3776. [PMID: 33280525 DOI: 10.1080/07391102.2020.1850354] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 02/07/2023]
Abstract
Deciphering RNA-protein interactions is important for studying principal biological mechanisms, including transcription and translation regulation and gene silencing, among others. Predicting the interaction of an RNA molecule with its target protein could allow us to understand important cellular processes and design novel therapies for various diseases. As non-coding RNAs do not have coding potential, our knowledge about their functions is still limited. Therefore, RNA-binding proteins of non-coding RNAs that regulate functions including cellular maturation, nuclear export and stability may play a very important role. In view of the need for refined methods to understand protein-RNA interactions, we have attempted a docking model to infer binding sites between lncRNA NONHSAT02007 and protein KIF13A for a rare disease phenotype that we are studying in our lab. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Renuka Suravajhala
  - Department of Chemistry, School of Basic Science, Manipal University, Manipal, India
- Sonal Gupta
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research (BISR), Jaipur, India
  - Department of Biotechnology, Amity University Rajasthan, Jaipur, India
- Narayan Kumar
  - Department of Biotechnology and Bioinformatics, NIIT University, Neemrana, India
- Prashanth Suravajhala
  - Department of Biotechnology and Bioinformatics, Birla Institute of Scientific Research (BISR), Jaipur, India
  - Bioclues.org, India
|